Checkpoint
The checkpoint entity forms the backbone of the coordination of information between nodes in the cluster. Abstractly, it is a "dictionary", "map", or database table data structure -- that is, a user provides an arbitrary data "key" and gets back an arbitrary data "value". However, a Checkpoint differs from these structures because it exists in every process that is interested in it. A checkpoint is fully replicated to all interested nodes -- it is not a "distributed" dictionary where each node holds only part of the data.
Uses
The primary use for checkpoint is synchronization of state between redundant programs. The "active" software writes a checkpoint, and the "standby" software reads it. This is more useful than a simple message interface because the checkpoint abstraction simultaneously presents the total state and the incremental state changes. Total state access is required when a "cold" standby is first started, when the standby fails and is restarted, and when the active fails over if the "warm" standby has not been actively tracking state changes. Incremental state changes (deltas) are required so a "hot" standby can continually update its program state in response to the active software's actions.
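The difference between total state and incremental state changes can be sketched as follows. This is only an illustration, not the SAFplus API: ToyCheckpoint, ToyStandby, and their methods are hypothetical names, and replication between nodes is left out entirely.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a checkpoint: a key/value table that also
// reports each write to subscribers. Not the SAFplus Checkpoint class.
class ToyCheckpoint {
public:
    using Listener = std::function<void(const std::string&, const std::string&)>;

    void write(const std::string& key, const std::string& value) {
        table_[key] = value;
        for (const auto& listener : listeners_) listener(key, value);  // delta
    }

    // Total state: everything written so far.
    const std::map<std::string, std::string>& snapshot() const { return table_; }

    void subscribe(Listener listener) { listeners_.push_back(std::move(listener)); }

private:
    std::map<std::string, std::string> table_;
    std::vector<Listener> listeners_;
};

// A standby that first copies the total state (the "cold" start case) and
// then applies each delta as it arrives (the "hot" standby case).
// The standby must outlive any further writes to the checkpoint.
class ToyStandby {
public:
    void attach(ToyCheckpoint& ckpt) {
        state_ = ckpt.snapshot();  // total state access
        ckpt.subscribe([this](const std::string& key, const std::string& value) {
            state_[key] = value;   // incremental state change
        });
    }

    const std::map<std::string, std::string>& state() const { return state_; }

private:
    std::map<std::string, std::string> state_;
};
```

A message-only interface would give the standby the deltas but not the snapshot, so a freshly started standby would have no way to catch up.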
Checkpoint is also used whenever commonly-used data needs to be distributed throughout the cluster. Checkpoints can be single-writer, multiple-reader (more efficient), or multiple-writer, multiple-reader.
A Checkpoint is inappropriate when the data is not often used. Although a checkpoint may be written to disk for recovery during a catastrophic failure event, the entire checkpoint data set is stored in RAM. Therefore a traditional replicated database is more appropriate for a large and/or rarely used data set.
Major Features
Most major features can be selected at object creation time to optimize for speed or for utility.
Replicated: Replicated efficiently to multiple nodes
Nested: Checkpoint values can be unique identifiers that automatically resolve-and-lookup in another Checkpoint
Persistent: Checkpoints can automatically store themselves to disk (Persistence)
Notifiable: You can subscribe to get notified of changes to Checkpoints (Event)
Shared memory: For efficiency, a single service per node can maintain coherency (with the rest of the cluster) of a checkpoint. To do this, the checkpoint is stored in shared memory.
Transactional: Checkpoint operations can be wrapped in a cluster-wide Transaction
Partial record updates: A partial record update occurs when a checkpoint entry (key, value) already exists and an application writes only a subset of the value, for example 100 bytes at offsets 10000100 to 10000200. Partial record updates become compelling when the value of the checkpoint entry is very long, which is why the example uses such a large offset.
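A partial record update can be sketched as an in-place overwrite of part of an existing value. The helper below is hypothetical (writePartial is not a SAFplus function), uses a plain std::map in place of a checkpoint, and assumes the patch must fit inside the existing value.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical helper, not a SAFplus API: overwrite part of an existing
// value in place instead of rewriting the whole record.
void writePartial(std::map<std::string, Bytes>& table, const std::string& key,
                  std::size_t offset, const Bytes& patch) {
    auto it = table.find(key);
    if (it == table.end()) throw std::out_of_range("no such checkpoint entry");

    Bytes& value = it->second;
    // Assumption: a partial update never grows the value.
    if (offset > value.size() || patch.size() > value.size() - offset)
        throw std::out_of_range("patch does not fit inside the existing value");

    std::copy(patch.begin(), patch.end(),
              value.begin() + static_cast<std::ptrdiff_t>(offset));
}
```

Only the patched bytes would need to be sent to the other replicas, which is where the saving comes from when the value is very long.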
Design
In this document the term "Checkpoint" will be used to refer to the entire replicated checkpoint abstraction. The term "local replica" will be used to refer to a particular copy of the checkpoint data. "Shared replica" refers to a process that only accesses the checkpoint via shared memory.
Process Access
A Checkpoint can be located in process private memory or in shared memory based on an option when the Checkpoint is created.
A Checkpoint that is used by multiple processes on the same node should be located in shared memory for efficiency.
Checkpoint Creation
A checkpoint is always identified by a Handle. At the API layer, a string name can be used instead. In that case, the name is registered with the Name service, with the checkpoint's Handle as the value.
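The name-to-Handle mapping might look roughly like this. ToyHandle, ToyNameService, and openByName are hypothetical stand-ins, since the real Handle type and Name service interface are not shown in this document. Reusing an already registered Handle on a second open is also an assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins; the real Handle and Name service are not shown here.
struct ToyHandle {
    std::uint64_t id;
    bool operator==(const ToyHandle& other) const { return id == other.id; }
};

class ToyNameService {
public:
    void set(const std::string& name, ToyHandle handle) { names_[name] = handle; }

    // Returns true and fills 'out' if the name is registered.
    bool get(const std::string& name, ToyHandle& out) const {
        auto it = names_.find(name);
        if (it == names_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::string, ToyHandle> names_;
};

// Open by name: reuse the registered Handle if there is one (assumption),
// otherwise allocate a Handle and register the name with it as the value.
ToyHandle openByName(ToyNameService& ns, const std::string& name,
                     std::uint64_t& nextId) {
    ToyHandle handle{0};
    if (ns.get(name, handle)) return handle;
    handle.id = nextId++;
    ns.set(name, handle);
    return handle;
}
```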
Checkpoint Retention Timer and Deletion
Introduction
When all processes have closed a checkpoint, the system waits N (configurable) seconds and then deletes it from memory. This is the checkpoint "retention time".
Each checkpoint is given a retentionDuration argument when it is opened. When the last close call on the checkpoint is performed, a timer is started with retentionDuration; when the timer expires, the checkpoint's data is deleted from memory.
For example, suppose processes A and B have a checkpoint open that is configured for a 5 minute retention time. A exits. The checkpoint stays "alive". B exits. Now no process has the checkpoint open, but the system does NOT delete the checkpoint immediately. Three minutes after the last user left, process C opens the checkpoint. It opens the original checkpoint data because it was retained for that time. Now process C closes the checkpoint. Again no process has it open, so the timer starts again. After 5 minutes the data is deleted from shared memory.
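The open/close bookkeeping behind the retention time can be modelled in a few lines. RetentionModel is a hypothetical class written for this illustration; time is passed in explicitly, in seconds, so the A/B/C scenario is easy to trace by hand.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of retention bookkeeping for one checkpoint.
class RetentionModel {
public:
    explicit RetentionModel(std::uint64_t retentionSeconds)
        : retention_(retentionSeconds) {}

    // Any open cancels a running retention timer.
    void open() {
        ++openCount_;
        timerRunning_ = false;
    }

    void close(std::uint64_t now) {
        assert(openCount_ > 0);
        if (--openCount_ == 0) {  // last close starts the retention timer
            timerRunning_ = true;
            timerStart_ = now;
        }
    }

    // True once the timer has run for the full retention duration.
    bool shouldDelete(std::uint64_t now) const {
        return timerRunning_ && now - timerStart_ >= retention_;
    }

private:
    std::uint64_t retention_;
    std::uint64_t timerStart_ = 0;
    unsigned openCount_ = 0;
    bool timerRunning_ = false;
};
```

With a 300 second retention, closing the last user at t=1000, reopening at t=1180, and closing again at t=2000 means the data becomes eligible for deletion at t=2300, not at t=1300.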
For persistent checkpoints, we should have another field "persistentRetentionTime" that configures how long the data is retained on disk.
The purpose of these retention times is to clean up unused resources, but not so quickly that a failure and restart will cause the data to be deleted.
Implementation (retention timer is a separate process)
Each checkpoint is stored in a file in shared memory, so we iterate over these files to detect whether any process is opening or closing a checkpoint. To do this, we add a "lastUsed" parameter to each checkpoint header. lastUsed stores a time and is updated whenever one of the following operations is invoked: init checkpoint, read checkpoint, or write checkpoint. A local copy of lastUsed for each checkpoint is saved as soon as its retention timer starts. When the retention timer expires, we compare the local copy with the value read from the shared memory header: if they differ, the checkpoint is being used, so the retention timer is restarted; otherwise the checkpoint is deleted.
The implementation periodically rescans checkpoint shared memory to pick up newly added checkpoints and updated values. To achieve this, we need a dedicated process (named ckptretention) that monitors all created checkpoints as described above. This process should be started along with the other SAFplus servers (safplus_amf, safplus_log, ...).
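The lastUsed comparison can be sketched as below. CkptHeader, RetentionTimer, and TimerAction are hypothetical names, the header holds only the one field the text describes, and reading the header from shared memory is replaced by a plain struct.

```cpp
#include <cstdint>

// Hypothetical header layout: only the field described in the text.
struct CkptHeader {
    std::uint64_t lastUsed;  // updated on init, read and write
};

enum class TimerAction { Restart, Delete };

class RetentionTimer {
public:
    // Remember lastUsed as it was when the timer started.
    void start(const CkptHeader& header) { lastUsedAtStart_ = header.lastUsed; }

    // On expiry, a changed lastUsed means some process used the checkpoint
    // while the timer was running, so the timer is restarted.
    TimerAction onExpiry(const CkptHeader& header) {
        if (header.lastUsed != lastUsedAtStart_) {
            start(header);
            return TimerAction::Restart;
        }
        return TimerAction::Delete;
    }

private:
    std::uint64_t lastUsedAtStart_ = 0;
};
```

Comparing for inequality rather than "newer than" keeps the check independent of how the time value is encoded.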
clCustomization.hxx
namespace SAFplusI
{
  enum {
    /* Interval, in seconds, at which the program refreshes its view of the checkpoints: it picks up changed parameters such as the last used time and any newly added checkpoints */
    CkptUpdateDuration = 60,
    /* Default retention duration, in seconds, that the retention timer uses to decide when checkpoint data is deleted from memory */
    CkptRetentionDurationDefault = 28800,
  };
  //...
}
clCkptIpi
clCkptApi.hxx
Add an argument named "retentionDuration" to each Checkpoint constructor, like this:
Checkpoint(const Handle& handle, uint_t flags, uint64_t retentionDuration, uint_t size=0, uint_t rows=0);
Checkpoint(uint_t flags, uint64_t retentionDuration, uint_t size=0, uint_t rows=0);
Checkpoint(); // The default constructor, no argument supplied, retentionDuration will use the default value from clCustomization.hxx
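One way the default constructor could pick up the default retention duration is a delegating constructor. The class below is a hypothetical stand-in (RetainedCheckpoint in a sketch namespace), not the real Checkpoint class. It omits the Handle overload, uses unsigned in place of uint_t, and the flags value of 0 passed by the default constructor is an assumption.

```cpp
#include <cstdint>

namespace sketch {

// Value copied from the clCustomization.hxx listing.
enum { CkptRetentionDurationDefault = 28800 };

// Hypothetical stand-in for the Checkpoint class; only construction is modelled.
class RetainedCheckpoint {
public:
    RetainedCheckpoint(unsigned flags, std::uint64_t retentionDuration,
                       unsigned size = 0, unsigned rows = 0)
        : flags_(flags), retentionDuration_(retentionDuration),
          size_(size), rows_(rows) {}

    // Default constructor delegates and supplies the configured default.
    // Passing flags = 0 here is an assumption.
    RetainedCheckpoint() : RetainedCheckpoint(0, CkptRetentionDurationDefault) {}

    unsigned flags() const { return flags_; }
    std::uint64_t retentionDuration() const { return retentionDuration_; }
    unsigned size() const { return size_; }
    unsigned rows() const { return rows_; }

private:
    unsigned flags_;
    std::uint64_t retentionDuration_;
    unsigned size_;
    unsigned rows_;
};

}  // namespace sketch
```

Because retentionDuration has no default value in the two explicit constructors, callers that pass flags must also choose a retention duration deliberately.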
===== clckpt