Handling Event Repository Loading Issues
This topic provides information about troubleshooting and handling issues when loading event objects into the Event Repository.
You can query the number of event objects that failed to load into the Event Repository.
The Event Loader logs are contained in the mtx_debug.log file. The following example shows Event Loader messages for events that loaded successfully and for the associated MEF files being renamed and removed:
LM_INFO 21450|21463 2018-10-03 15:57:13.495272 [event_loader_1:2:1:1(5100.55831)] | EventLoaderWorkerTask::loadEventFromMefToSingleCollection: successfully loaded 183 events from transaction_1_1_2_1538607406_3.mef.gz into event repository
LM_INFO 21450|21463 2018-10-03 15:57:13.495379 [event_loader_1:2:1:1(5100.55831)] | EventLoaderWorkerTask::loadEventFromMefToSingleCollection: successfully renamed file: ../local_1_2_1/event_store_meta/transaction_1_1_2_1538607406_3.mef.gz to ../local_1_2_1/event_store_processed/transaction_1_1_2_1538607406_3.mef.gz.remove.1538607433
LM_INFO 21450|21467 2018-10-03 15:57:35.867956 [event_loader_1:2:1:1(5100.55831)] | EventLoaderWorkerTask::processEventFiles: successfully removed file: ../local_1_2_1/event_store_processed/transaction_1_1_2_1538607426_5.mef.gz.remove.1538607433
If a MEF file is not in compact MDC format, Event Loader logs an error similar to the following example:
LM_ERROR 3032|3043 2016-02-26 15:39:15.247628 [event_loader_1:2:4:1(4710.36950)] | EventLoaderWorkerTask::loadEventFromMefToSingleCollection: /mnt/mtx/shared_01/event_store_meta/transaction_1_1_2_1456529894_1.mef.gz file not in compact MDC format
A network issue can interrupt access to the shared storage device. If access to the shared disk hangs because of a network issue, after five minutes Event Loader logs the error shown in the following example. Check for network issues and correct them.
LM_ERROR 13555|13557 2016-08-31 15:52:01.663249 [event_loader_1:2:4:1(4751.39532)] | EventLoaderDispatcherTask::warningTriggerCallbackHandler: MtxEventLoader::EventLoaderDispatcherTask abort trigger, Step: EventLoaderDispatcherTask::processEventFiles::scandir, Timeout: 300083 msec
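If you suspect that access to the shared disk is hanging, one check is to time a directory scan of the shared event store from the host that runs Event Loader. The following Python sketch is illustrative only; the directory path is taken from the error examples above and might differ in your deployment.

import os
import time

# Example path from the log messages above; adjust for your deployment.
EVENT_STORE_DIR = "/mnt/mtx/shared_01/event_store_meta"

start = time.monotonic()
try:
    entries = list(os.scandir(EVENT_STORE_DIR))
    elapsed = time.monotonic() - start
    print(f"scandir returned {len(entries)} entries in {elapsed:.2f} seconds")
except OSError as exc:
    elapsed = time.monotonic() - start
    print(f"scandir failed after {elapsed:.2f} seconds: {exc}")

A scan that takes many seconds, or hangs outright, points to a storage or network problem rather than an Event Loader problem.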
When troubleshooting issues with loading events, you can view the log file for the publishing server and run print_blade_stats.py -E for the SNMP statistics.
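In addition to the SNMP statistics, you can scan mtx_debug.log directly for Event Loader errors. The following Python sketch is a rough helper, not a MATRIXX tool; it assumes the log file name and the single-line message layout shown in the examples above.

import re
from collections import Counter

# Example log file name; point this at the mtx_debug.log for your deployment.
LOG_FILE = "mtx_debug.log"

# LM_ERROR lines written by Event Loader tasks, as in the examples above.
error_re = re.compile(r"LM_ERROR .*EventLoader\w*Task::(\w+):")
# MEF file names mentioned anywhere in an error line.
mef_re = re.compile(r"(\S+\.mef\.gz)")

errors_by_step = Counter()
failed_files = set()

with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = error_re.search(line)
        if not match:
            continue
        errors_by_step[match.group(1)] += 1
        failed_files.update(mef_re.findall(line))

print("Event Loader errors by step:", dict(errors_by_step))
print("MEF files mentioned in errors:", sorted(failed_files))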
MongoDB Server Failover Handling
If you configured a MongoDB server replica set for the Event Repository, failover handling is transparent.
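For example, a replica-set-aware client driver discovers the new primary automatically after an election, so no application-level failover logic is needed. The following Python (PyMongo) sketch illustrates this with placeholder host names, replica set name, and database and collection names; it is not part of the MATRIXX Event Loader, which uses the C++ driver (mongocxx).

from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

# Placeholder hosts, replica set name, and namespace; substitute your own values.
client = MongoClient(
    "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/"
    "?replicaSet=rs0&retryWrites=true",
    serverSelectionTimeoutMS=30000,
)
events = client["event_repository"]["events"]

try:
    # Operations are routed to whichever member is currently primary; after a
    # failover, the driver reconnects to the new primary transparently.
    print("Event count:", events.estimated_document_count())
except ConnectionFailure as exc:
    print("No primary reachable yet, retry later:", exc)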
If a MongoDB primary (mongod) fails and has not propagated some event objects to its secondaries at the time its processing is transferred to the new primary, those event objects must be reloaded into the MongoDB database from MATRIXX Engine.
When a primary fails to propagate event objects to its secondaries during such a failover, Event Loader logs an error message and generates an SNMP trap. The following is an example of the Event Loader error message:
LM_ERROR 32022|32034 2016-07-27 12:26:24.223874 [event_loader_1:2:4:1(4750.39117)] | EventLoaderWorkerTask::handleInput: mongocxx::operation_exception.
After this error is logged, you must restart the publishing pod before the Event Loader removes the processed MEF files of the events that were not propagated.
The default delay time that Event Loader waits before removing processed MEF files is one hour. You can increase this delay time if you want more time to respond to such failovers by answering the create_config.info question: EventLoader:How long in seconds do you want to delay before event_loader can remove the processed MEF files?
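Until the delay expires, the processed MEF files remain in the event_store_processed directory with a .remove.<timestamp> suffix, as shown in the log examples earlier in this topic. If you want to confirm which processed MEF files are still available for reloading before you restart the publishing pod, a sketch such as the following can list them; the directory path is an example and depends on your deployment.

import glob
import os

# Example path from the log messages above; adjust for your deployment.
PROCESSED_DIR = "../local_1_2_1/event_store_processed"

pending = sorted(glob.glob(os.path.join(PROCESSED_DIR, "*.mef.gz.remove.*")))
for path in pending:
    print(path)
print(f"{len(pending)} processed MEF files are still pending removal")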
When you restart the publishing server, Event Loader reloads the processed MEF files into MongoDB at start-up. As a result, the event objects that were not propagated during the MongoDB failover are loaded into the Event Repository by the new primary.