Analyzing the Standby Cluster Initialization Phase
When its HA state is INIT, a cluster is initializing its databases, either from a checkpoint or transaction log file (when the cluster is becoming the standby cluster) or by replaying the in-memory databases (when the cluster is becoming the active cluster). During this time, messages might be written to the mtx_debug.log file. Some of these messages indicate errors and others do not. They are described in the following subsections.
Cannot Initialize Databases
An LM_CRITI message similar to the following indicates that the databases could not be initialized. In this example, the InitDatabaseResultText field reports that the replay was aborted before all objects were replayed:
LM_CRITI 29584|29769 2016-01-14 15:47:57.236605 [transaction_server_2:1:1:1(4700.36371)] | TXN1-TransactionManager:transaction_manager_task::TransactionManagerTask::handleInitDatabaseMsg: Could not initialize databases: result=14, message =DataContainer: size=1131, bufSize=1430, bufMaxSize=32768, this=0x7f78dc4c2a48
descName=MtxInitDatabaseMsg(363,4700,3), descriptorPtr=0x1921ba0, version=1, flags=0, fields=8
RDM id=6:-:2:4928, mtxBufPtr=0x7f78dc4c2a00, dataPtr=0x7f78dc4c2a8a, baseContainerPtr=0x7f78dc4c2ac1
idx name type L A M P offset maxSz value
0 FromEcbsmiId UINT64 0 0 0 1 0 8 36046457925533697 ([2:1:1:1:0:1])
1 InitDatabaseOp UINT32 0 0 0 1 8 4 2
2 InitDatabaseResult UINT32 0 0 0 1 12 4 14
3 InitDatabaseResultDetail UINT32 0 0 0 1 16 4 0
4 InitDatabaseResultText STRING 0 0 0 1 530 0 (size=180, data=TXN1-CheckpointManager:checkpoint_manager_task::CheckpointManagerTask::waitForReplayToComplete: Replay to [2:1:1:1:0:1] was aborted with 5281658 objects remaining to be completed.)
5 InitDatabaseResultFieldKey FIELD_KEY 0 0 0 1 20 35 0.0:0.0(Unknown)
6 InitDatabaseResultData BLOB 0 0 0 0 0 0
7 MaxObjectIdList OBJECT_ID 1 0 0 0 0 8
---BaseDataContainer---
DataContainer: size=828, bufSize=1430, bufMaxSize=32768, this=0x7f78dc4c2ac1
descName=MtxWorkOrderMsg(140,4700,3), descriptorPtr=0x1ae2fb0, version=1, flags=0, fields=25
RDM id=6:-:2:4928, mtxBufPtr=0x7f78dc4c2a00, dataPtr=0x7f78dc4c2b58, baseContainerPtr=0x7f78dc4c2bc0
idx name type L A M P offset maxSz value
0 ReplayTimeArray UINT64 0 1 0 0 0 8
1 TxnId STRUCT 0 0 0 0 0 0
2 TxnResult UINT32 0 0 0 0 0 4
3 TxnParticipantSet UINT64 0 0 0 0 4 8
4 TxnInQueueId UINT32 0 0 0 0 12 4
5 TxnToEcbsmiId UINT64 0 0 0 0 16 8
6 ApplicationMsgRdmId UINT64 0 0 0 0 24 8
7 TxnConditionList STRUCT 1 0 0 0 0 0
8 TxnActionList STRUCT 1 0 0 0 0 0
9 ResendCount UINT32 0 0 0 0 32 4
10 LogFileTime UINT32 0 0 0 0 36 4
11 LogFileSequenceId UINT32 0 0 0 0 40 4
12 LogBufferId UINT32 0 0 0 0 44 4
13 TotalTxnCountInLogBuffer UINT32 0 0 0 0 48 4
14 TotalTxnCountInLogFile UINT32 0 0 0 0 52 4
15 ReplayEcbsmiId UINT64 0 0 0 0 56 8
16 ReplayContextId UINT32 0 0 0 0 64 4
17 SourceLogEcbsmiId UINT64 0 0 0 0 68 8
18 SourceLogFileTime UINT32 0 0 0 0 76 4
19 SourceLogFileSequenceId UINT32 0 0 0 0 80 4
20 ResendTargetParticipantSet UINT64 0 0 0 0 84 8
21 GlobalTxnCounter UINT64 0 0 0 1 92 8 193604888
22 TxnAuditList STRUCT 1 0 0 0 0 0
23 Flags UINT32 0 0 0 0 100 4
---BaseDataContainer---
DataContainer: size=573, bufSize=1430, bufMaxSize=32768, this=0x7f78dc4c2bc0
descName=MtxMsg(93,4700,3), descriptorPtr=0x1a493e0, version=1, flags=0, fields=19
RDM id=6:-:2:4928, mtxBufPtr=0x7f78dc4c2a00, dataPtr=0x7f78dc4c2c39, baseContainerPtr=0
idx name type L A M P offset maxSz value
0 ReceiveTime DATETIME 0 0 0 1 0 12 2016-01-14T23:47:57.236015Z
1 GatewaySocketId INT32 0 0 0 1 12 4 268697600
2 Op UINT32 0 0 0 0 16 4
3 TimeArray UINT64 0 1 0 1 580 8 {maxElements=39:1452815277236015, 1452815277236015, 1452815277236366, , , , , , , 1452815277236368, 1452815277236444, , , , , , , , , , , , , , , , , , , , , , , , , , , , }
4 ChrgInQueueId UINT32 0 0 0 1 20 4 1
5 MtxParticipantTimeInfoList STRUCT 1 0 0 0 0 0
6 TxnMsgRdmId UINT64 0 0 0 0 24 8
7 Result UINT32 0 0 0 0 32 4
8 ResultDetail UINT32 0 0 0 0 36 4
9 ResultText STRING 0 0 0 0 0 0
10 ResultFieldKey FIELD_KEY 0 0 0 0 40 35
11 ResultData BLOB 0 0 0 0 0 0
12 ProxySocketId INT32 0 0 0 0 75 4
13 HopByHopId UINT32 0 0 0 1 79 4 308
14 EndToEndId UINT32 0 0 0 1 83 4 308
15 AfterTxnMsgRdmIdList UINT64 1 0 0 0 0 8
16 OriginalResult UINT32 0 0 0 0 87 4
17 DiamResult UINT32 0 0 0 0 91 4
18 TraceFlags UINT32 0 0 0 0 95 4
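When scanning a large mtx_debug.log file for this failure, it can help to pull out the few values that matter: the log level, the originating process, the result code, and the number of objects that were still waiting to be replayed. The following Python sketch does that. It is not a product utility. The header layout it expects is inferred only from the sample messages in this section, and the function names and the field names proc_id, thread_id, source, and tag are hypothetical labels chosen for illustration.

```python
import re
from typing import Optional

# Header layout inferred from the samples in this section (an assumption):
#   LEVEL id|id YYYY-MM-DD HH:MM:SS.ffffff [source(tag)] | message
HEADER_RE = re.compile(
    r"^(?P<level>LM_[A-Z]+)\s+"
    r"(?P<proc_id>\d+)\|(?P<thread_id>\d+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s*"
    r"\[(?P<source>[^\](]+)\((?P<tag>[^)]*)\)\]\s*\|\s*"
    r"(?P<message>.*)$"
)

# Phrases copied from the sample message shown above.
RESULT_RE = re.compile(r"Could not initialize databases: result=(\d+)")
REMAINING_RE = re.compile(r"was aborted with (\d+) objects remaining")


def parse_header(line: str) -> Optional[dict]:
    """Split the first line of a log entry into its header fields."""
    match = HEADER_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize_init_failure(entry_text: str) -> Optional[dict]:
    """Return the result code and remaining object count, if present.

    entry_text is the full text of one log entry, including the
    field dump that follows the header line.
    """
    result = RESULT_RE.search(entry_text)
    if result is None:
        return None  # not a database initialization failure
    remaining = REMAINING_RE.search(entry_text)
    return {
        "result": int(result.group(1)),
        "objects_remaining": int(remaining.group(1)) if remaining else None,
    }
```

Passing the example entry above to summarize_init_failure yields a result of 14 and an objects_remaining value of 5281658. Entries that do not contain the "Could not initialize databases" text return None, so the function can be applied to every entry in the file.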
Failure to Join the Topology
If a server is added while the cluster is still synchronizing its databases, the server can fail to join the topology. LM_ERROR messages similar to the following are written to the log:
LM_ERROR 23803|43133 2015-06-01 19:30:24.735193 [transaction_server_2:1:2:1(4603.32039)] |
TXN2-TransactionManager:transaction_manager_task::TransactionManagerTask::waitUntilAllTransactionsCompleted:
failed to wait for all transactions that have old topology without [id: 8, ECBSMI: [2:1:8:1:0:1]] to complete
LM_ERROR 8224|27799 2015-06-01 19:45:22.006419 [transaction_server_2:1:8:1(4603.32039)] | FsmTxnSvrStateBase::handleKeepAliveEvent: timed out in database synchronization
LM_ERROR 30699|30707 2015-06-01 19:45:22.006877 [cluster_manager_2:1:8:1(4603.32039)] | FsmNodeStateBase::transitionOnFatalError: invalid or fatal event 'TransactionServiceFailure' in state SYNC @state=SYNC
In such cases, wait until the cluster is synchronized and then try to add the server again.
Pending Transactions
If a processing server on the cluster that is transitioning to standby receives a parallel balance transaction to replay before it has received the checkpoint transaction for the same balance set object, the parallel balance transaction is held in a pending state for a short time. When the processing server receives the checkpoint transaction for that balance set object, the checkpoint transaction provides the absolute balance to which the pending difference is applied. When this occurs, LM_INFO messages similar to the following are written to the log. These messages do not indicate runtime issues; they indicate that the standby cluster is catching up to the active cluster:
LM_INFO 63530|16816 2015-06-01 19:17:28.793198 [transaction_server_2:1:1:1(4603.32039)] |
TXN1-TransactionManager:transaction_manager_task::TransactionManagerTask::printPendingTxnSummary:
number of pending transactions per blade={blade 5=2}
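To confirm that the standby cluster really is catching up, the pending counts can be tracked across successive summary messages. The following Python sketch extracts the per-blade counts from one message. The function name is hypothetical, and the pattern is inferred from the single sample above. In particular, how multiple blades are separated inside the braces is an assumption, so the sketch simply collects every "blade N=M" pair it finds.

```python
import re
from typing import Dict

# Patterns inferred from the one sample message above (an assumption).
SUMMARY_RE = re.compile(
    r"number of pending transactions per blade=\{(?P<body>[^}]*)\}"
)
BLADE_RE = re.compile(r"blade\s+(\d+)\s*=\s*(\d+)")


def pending_per_blade(entry_text: str) -> Dict[int, int]:
    """Map each blade ID to its pending transaction count.

    Returns an empty dict if the text is not a pending summary.
    """
    match = SUMMARY_RE.search(entry_text)
    if match is None:
        return {}
    return {
        int(blade): int(count)
        for blade, count in BLADE_RE.findall(match.group("body"))
    }


sample = "number of pending transactions per blade={blade 5=2}"
print(pending_per_blade(sample))  # {5: 2}
```

Counts that fall toward zero over successive messages are consistent with the standby cluster catching up, as described above.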