Redundant Logging

To keep transactions fully durable and auditable even when a processing server or the publishing server fails, every event received by a processing server is also logged by a second processing server to its own local SSD.

Both processing servers write out their buffer caches at the same time, so the two logs contain the same data. If one processing server encounters an issue and is removed from the cluster, its transaction logs are therefore not lost. The logs are used to create checkpoints and event files for downstream systems, and they are archived so they can be audited.
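
The mirrored logging described above can be illustrated with a minimal sketch. It is not the actual implementation: the `TransactionLog` class, the `log_redundantly` function, the file names, and the newline-delimited record format are all assumptions made for this example, and two local files stand in for the SSDs of two separate processing servers.

```python
import os
import tempfile


class TransactionLog:
    """Append-only log with an in-memory buffer (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.buffer = []

    def append(self, event: bytes):
        # Events are held in the buffer cache until the next flush.
        self.buffer.append(event)

    def flush(self):
        # Write buffered events to the file and force them to stable storage.
        with open(self.path, "ab") as f:
            for event in self.buffer:
                f.write(event + b"\n")
            f.flush()
            os.fsync(f.fileno())
        self.buffer.clear()


def log_redundantly(events, primary, secondary):
    # Every event is buffered on both servers before anything is written.
    for event in events:
        primary.append(event)
        secondary.append(event)
    # Both buffers are written at the same point, so the logs stay identical.
    primary.flush()
    secondary.flush()


def read_log(path):
    with open(path, "rb") as f:
        return f.read().splitlines()


def demo():
    # Two files in a temporary directory stand in for two servers' SSDs.
    with tempfile.TemporaryDirectory() as d:
        primary = TransactionLog(os.path.join(d, "primary.log"))
        secondary = TransactionLog(os.path.join(d, "secondary.log"))
        log_redundantly([b"order-1", b"order-2"], primary, secondary)
        return read_log(primary.path), read_log(secondary.path)
```

Calling `demo()` returns the contents of both logs. Because each event is buffered on both sides and both buffers are written at the same point, either log alone is enough to recover the full event sequence.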

A transaction log is removed from a processing blade's local SSD only after the publishing cluster in the last engine in the chain has replayed every transaction in that log. This way, if one processing server fails, another can assume responsibility for publishing the failed blade's transaction log. In addition, the standby cluster is kept in sync with the active cluster and can take over for it if an engine fails.
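
The removal rule above can be sketched as a small bookkeeping structure. This is an illustration under stated assumptions, not the product's mechanism: the `LogRetention` class, its method names, and the idea of tracking replay progress as a per-log transaction count are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LogRetention:
    """Tracks which transaction logs may be deleted (hypothetical helper)."""

    # log_id -> number of transactions contained in the log
    sizes: dict = field(default_factory=dict)
    # log_id -> number of transactions replayed by the last publishing cluster
    replayed: dict = field(default_factory=dict)

    def register(self, log_id, transaction_count):
        self.sizes[log_id] = transaction_count
        self.replayed[log_id] = 0

    def acknowledge(self, log_id, replayed_count):
        # Replay progress only moves forward; stale acknowledgements are ignored.
        self.replayed[log_id] = max(self.replayed[log_id], replayed_count)

    def removable(self, log_id):
        # A log may be removed only once every transaction has been replayed.
        return self.replayed[log_id] >= self.sizes[log_id]

    def purge(self):
        # Drop fully replayed logs and report which ones were removed.
        done = [log_id for log_id in self.sizes if self.removable(log_id)]
        for log_id in done:
            del self.sizes[log_id]
            del self.replayed[log_id]
        return done
```

In this sketch a partially replayed log is never returned by `purge()`, so a surviving server would still hold the complete log if it had to take over publishing for a failed blade.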