Checkpoint
The checkpoint entity forms the backbone of the coordination of information between nodes in the cluster. Abstractly, it is a "dictionary", "map", or database table data structure -- that is, a user provides an arbitrary data "key" and receives an arbitrary data "value" in return. However, a Checkpoint differs from these structures because it exists in every process that is interested in it. A checkpoint is fully replicated to all nodes that are interested in it -- it is not a "distributed" dictionary where every node holds only partial data.
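To make the abstraction concrete, here is a minimal single-process sketch of the key/value semantics. The class and method names (Checkpoint, write, read) are illustrative assumptions, not the actual API, and the replication described above is not modeled here.

// Hypothetical, single-process sketch of the Checkpoint key/value semantics.
// The class name and methods are illustrative only; replication is not modeled.
#include <iostream>
#include <map>
#include <optional>
#include <string>

class Checkpoint {                       // stand-in for the replicated entity
public:
    void write(const std::string& key, const std::string& value) {
        table_[key] = value;             // in the real system this change would
                                         // also be propagated to every replica
    }
    std::optional<std::string> read(const std::string& key) const {
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, std::string> table_;  // arbitrary key -> arbitrary value
};

int main() {
    Checkpoint ckpt;
    ckpt.write("interface/eth0/state", "up");   // arbitrary application data
    if (auto v = ckpt.read("interface/eth0/state"))
        std::cout << "eth0 is " << *v << "\n";
}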
Uses
The primary use for a checkpoint is synchronization of state between redundant programs. The "active" software writes a checkpoint, and the "standby" software reads it. This is more useful than a simple message interface because the checkpoint abstraction simultaneously presents the total state and the incremental state changes. Access to the total state is required when a "cold" standby is first started, when the standby fails and is restarted, and at failover when a "warm" standby has not been actively tracking state changes. Incremental state changes (deltas) are required so a "hot" standby can continually update its program state in response to the active software's actions.
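The sketch below illustrates the two access patterns with an in-memory stand-in for a checkpoint. The names (Table, Delta, fullSync, applyDelta) are assumptions made for this example, and real internode replication is not modeled: a cold or warm standby first copies the complete data set, and a hot standby then applies each change record as it arrives.

// Hypothetical sketch of total-state sync followed by incremental deltas.
// Names (Delta, applyDelta, fullSync) are illustrative, not a real API.
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Table = std::map<std::string, std::string>;

struct Delta { std::string key, value; };        // one incremental change record

// Cold/warm standby start: copy the entire data set from the active's replica.
void fullSync(const Table& active, Table& standby) { standby = active; }

// Hot standby: apply each change as the active reports it.
void applyDelta(Table& standby, const Delta& d) { standby[d.key] = d.value; }

int main() {
    Table active = {{"session/42", "established"}, {"session/43", "setup"}};
    Table standby;

    fullSync(active, standby);                    // total state, used at startup
    std::vector<Delta> changes = {{"session/43", "established"},
                                  {"session/44", "setup"}};
    for (const auto& d : changes) applyDelta(standby, d);   // incremental state

    std::cout << "standby tracks " << standby.size() << " sessions\n";
}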
Checkpoint is also used whenever commonly used data needs to be distributed throughout the cluster.
A Checkpoint is inappropriate when the data is rarely used. Although a checkpoint may be written to disk for recovery after a catastrophic failure, the entire checkpoint data set is stored in RAM. A traditional replicated database is therefore more appropriate for a large or rarely used data set.
Major Features
Most major features can be selected at object creation time to optimize for speed or for utility.
Replicated: Data is efficiently replicated to multiple nodes
Nested: Checkpoint values can be unique identifiers that are automatically resolved and looked up in another Checkpoint
Persistent: Checkpoints can automatically store themselves to disk (Persistence)
Notifiable: You can subscribe to be notified of changes to a Checkpoint (Event); see the sketch after this list
Shared memory: For efficiency, a single service per node can maintain a checkpoint's coherency with the rest of the cluster. To do this, the checkpoint is stored in shared memory
Transactional: Checkpoint operations can be wrapped in a cluster-wide Transaction
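The sketch below illustrates the "Notifiable" feature in a single process: a subscriber registers a callback and is invoked whenever the checkpoint is written. The interface shown (subscribe, write) is a hypothetical stand-in, not the actual Event or Checkpoint API.

// Hypothetical sketch of the "Notifiable" feature: a subscriber callback is
// invoked whenever the checkpoint changes. The names are illustrative only.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

class Checkpoint {
public:
    using Subscriber = std::function<void(const std::string&, const std::string&)>;

    void subscribe(Subscriber s) { subscribers_.push_back(std::move(s)); }

    void write(const std::string& key, const std::string& value) {
        table_[key] = value;
        for (auto& s : subscribers_) s(key, value);   // deliver the change event
    }
private:
    std::map<std::string, std::string> table_;
    std::vector<Subscriber> subscribers_;
};

int main() {
    Checkpoint ckpt;
    ckpt.subscribe([](const std::string& k, const std::string& v) {
        std::cout << "changed: " << k << " -> " << v << "\n";   // standby reacts here
    });
    ckpt.write("alarm/fan1", "raised");
}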
Design
In this document the term "Checkpoint" will be used to refer to the entire replicated checkpoint abstraction. The term "local replica" will be used to refer to a particular copy of the checkpoint data.
Process Access
A Checkpoint can be located in process private memory or in shared memory based on an option when the Checkpoint is created.
A Checkpoint that is used by multiple processes on the same node should be located in shared memory for efficiency.
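A minimal sketch of this creation-time choice, assuming a hypothetical placement option; the enum and the create() call are illustrative stand-ins, not the actual API:

// Hypothetical sketch of the creation-time placement option. The enum and the
// create() signature are assumptions for illustration, not a real API.
#include <iostream>
#include <string>

enum class Placement { ProcessPrivate, SharedMemory };

struct CheckpointHandle { std::string name; Placement placement; };

// Stub standing in for the real creation call.
CheckpointHandle create(const std::string& name, Placement p) { return {name, p}; }

int main() {
    // Used by several processes on this node -> place the replica in shared memory.
    CheckpointHandle shared = create("routingTable", Placement::SharedMemory);

    // Used only inside this process -> private memory avoids shared-memory overhead.
    CheckpointHandle priv = create("scratchState", Placement::ProcessPrivate);

    std::cout << shared.name << " and " << priv.name << " created\n";
}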
Checkpoint Creation
A checkpoint is always identified by a Handle. At the API layer, a string name can be used instead; in that case the name will be registered with the Name service using the checkpoint's Handle as the value.
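The sketch below illustrates the identification scheme with an in-process stand-in for the Name service: creating a checkpoint by string name registers the name with the checkpoint's Handle as the value, and other callers can resolve that name back to the Handle. All types and function names here are assumptions made for illustration.

// Hypothetical sketch of checkpoint identification: creating by string name
// registers name -> Handle in the Name service; other processes can resolve
// the name or use the Handle directly. All names/types here are assumptions.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

struct Handle { std::uint64_t id; };                 // stand-in for the real Handle

std::map<std::string, Handle> nameService;           // stand-in for the Name service
std::uint64_t nextId = 1;

Handle createCheckpoint(const std::string& name) {
    Handle h{nextId++};                              // checkpoint is identified by its Handle
    nameService[name] = h;                           // string name registered with Handle as value
    return h;
}

Handle openCheckpoint(const std::string& name) {
    return nameService.at(name);                     // resolve name -> Handle, then open by Handle
}

int main() {
    Handle h = createCheckpoint("callProcessing/sessions");
    Handle same = openCheckpoint("callProcessing/sessions");
    std::cout << (h.id == same.id ? "resolved to same checkpoint\n" : "mismatch\n");
}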
Internode Replication and Communication
Discovery
The implementation of the checkpoint will register a new group with the Group service. The checkpoint is identified by a well-known Handle or by the ClusterUniqueId returned by the Group registration. It may also be identified by a string entry in the Name service, and all APIs that take a checkpoint will accept either a ClusterUniqueId or a string name.
The Group service shall be used to identify the process responsible for updating the local replica on each node (the "node primary replica") and the process that is allowed to write to the checkpoint (the "master replica").
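The following sketch models the discovery step in a single process: the checkpoint's group membership is used to pick the "master replica" (the writer) and each node's "node primary replica" (the process that keeps the local copy current). The Group, Member, and selection functions are hypothetical stand-ins; the real election policy belongs to the Group service.

// Hypothetical sketch of the discovery step: the checkpoint registers a group,
// and group membership determines the "master replica" (the writer) and each
// node's "node primary replica" (the process maintaining the local copy).
// Every name and type here is an assumption used only for illustration.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct Member { std::uint64_t processId; std::string node; bool canWrite; };

struct Group {
    std::vector<Member> members;                       // filled in by the Group service
    void add(Member m) { members.push_back(m); }
};

// The writer role: first member announcing write capability (the election policy
// is up to the real Group service; "first wins" is just a placeholder here).
const Member* masterReplica(const Group& g) {
    for (const auto& m : g.members) if (m.canWrite) return &m;
    return nullptr;
}

// The per-node role: the member on this node that keeps the local replica current.
const Member* nodePrimaryReplica(const Group& g, const std::string& node) {
    for (const auto& m : g.members) if (m.node == node) return &m;
    return nullptr;
}

int main() {
    Group g;                                           // registered with the Group service
    g.add({101, "nodeA", true});                       // active writer
    g.add({202, "nodeB", false});                      // reader on another node
    if (const Member* m = masterReplica(g))
        std::cout << "master replica: process " << m->processId << "\n";
    if (const Member* p = nodePrimaryReplica(g, "nodeB"))
        std::cout << "nodeB primary replica: process " << p->processId << "\n";
}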