## page was renamed from Dictionary
= Checkpoint =

The checkpoint entity forms the backbone of the coordination of information between nodes in the cluster. Abstractly, it is a "dictionary", "map", or database table data structure -- that is, a user provides an arbitrary data "key" that is mapped to an arbitrary data "value". However, a Checkpoint differs from these structures because it exists in all processes that are interested in it. A checkpoint is fully replicated to all nodes that are interested in it -- it is not a "distributed" dictionary where every node has only partial data.

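As a concrete illustration of these semantics, here is a minimal C++ usage sketch. The `Checkpoint` class and its methods are hypothetical names assumed for illustration, not the actual API:

{{{
// Hypothetical usage sketch -- the Checkpoint class and its methods are
// illustrative assumptions, not the real interface.
#include <string>

void example()
{
    Checkpoint cp("netConfig");           // open (or create) the local replica

    cp.write("gateway", "10.0.0.1");      // the change is replicated to every
                                          // interested node in the cluster

    std::string gw = cp.read("gateway");  // reads are served from the local
                                          // replica; no network round trip
}
}}}
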
== Uses ==

The primary use for checkpoint is synchronization of state between redundant programs. The "active" software writes a checkpoint, and the "standby" software reads it. This is more useful than a simple message interface because the checkpoint abstraction simultaneously presents the total state and the incremental state changes. Total state access is required when a "cold" standby is first started, when the standby fails and is restarted, and when the active fails over to a "warm" standby that has not been actively tracking state changes. Incremental state changes (deltas) are required so a "hot" standby can continually update its program state in response to the active software's actions.

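The division of labor might look like the following C++ sketch. It assumes a hypothetical `Checkpoint` that is iterable (total state) and supports change subscriptions (deltas); `serializedSessionState`, `restoreFromState`, `Key`, and `Value` are placeholder application-side names:

{{{
// Hypothetical active/standby sketch -- all names are illustrative.
void activeSide(Checkpoint& cp)
{
    // The active writes its running state.  Each write is a delta that is
    // replicated to every standby's local replica.
    cp.write("session:42", serializedSessionState);
}

void standbySide(Checkpoint& cp)
{
    // Cold or warm start: walk the total state once to rebuild program state.
    for (const auto& kv : cp)
        restoreFromState(kv.key, kv.value);

    // Hot tracking: thereafter, apply each incremental change as it arrives.
    cp.subscribe([](const Key& k, const Value& v) { restoreFromState(k, v); });
}
}}}
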
Checkpoint is also used whenever commonly-used data needs to be distributed throughout the cluster.

A Checkpoint is inappropriate when the data is not often used. Although a checkpoint may be written to disk for recovery during a catastrophic failure event, the entire checkpoint data set is stored in RAM. Therefore a traditional replicated database is more appropriate for a large and/or rarely used data set.

== Major Features ==

Most major features can be selected at object creation time to optimize for speed or for utility; a creation sketch follows the list below.

 * '''Replicated:'''
 Replicated efficiently to multiple nodes
 * '''Nested:'''
 Checkpoint values can be unique identifiers that are automatically resolved and looked up in another Checkpoint
 * '''Persistent:'''
 Checkpoints can automatically store themselves to disk ([[Persistence]])
 * '''Notifiable:'''
 You can subscribe to get notified of changes to Checkpoints ([[Event]])
 * '''Shared memory:'''
 For efficiency, a single service per node can keep a checkpoint coherent with the rest of the cluster on behalf of every local process. To do this, the checkpoint is stored in shared memory
 * '''Transactional:'''
 Checkpoint operations can be wrapped in a cluster-wide [[Transaction]]


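For example, the selection might be expressed as creation-time flags. The flag names below are assumptions for illustration only:

{{{
// Hypothetical creation-time options -- the flag names are illustrative.
void createExample()
{
    Checkpoint cp("netConfig",
                  Checkpoint::REPLICATED      // full copy on every
                                              // interested node
                | Checkpoint::SHARED_MEMORY   // one coherent replica per
                                              // node, shared by all local
                                              // processes
                | Checkpoint::PERSISTENT      // also spooled to disk for
                                              // recovery
                | Checkpoint::NOTIFIABLE);    // allow change subscriptions
}
}}}
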
== Design ==

In this document the term "Checkpoint" will be used to refer to the entire replicated checkpoint abstraction. The term "local replica" will be used to refer to a particular copy of the checkpoint data.

=== Process Access ===

A Checkpoint can be located in process private memory or in shared memory based on an option when the Checkpoint is created.

A Checkpoint that is used by multiple processes on the same node should be located in shared memory for efficiency.

=== Checkpoint Creation ===

A checkpoint is always identified by a [[Handle]]. At the API layer, a string name can be used instead; in that case the name will be registered with the [[Name]] service using the checkpoint's [[Handle]] as the value.

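In sketch form, the two identification paths might look like this. The calls are hypothetical; only the name-to-[[Handle]] indirection is the point:

{{{
// Hypothetical sketch of the two ways to identify a checkpoint.
void openExamples(Handle wellKnown)
{
    // 1. Directly by Handle -- no name lookup is involved.
    Checkpoint a(wellKnown);

    // 2. By string name at the API layer.  Creating by name registers the
    //    name with the Name service, storing the checkpoint's Handle as the
    //    value; later opens by name resolve through that binding.
    Checkpoint b("netConfig");
}
}}}
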
=== Internode Replication and Communication ===


==== Discovery ====

The implementation of the checkpoint will register a new group with the [[Group]] service. The checkpoint is identified by a well-known [[Handle]] or by the [[ClusterUniqueId]] returned by the [[Group]] registration. It may also be identified by a string entry in the [[Name]] service, and all APIs that need a checkpoint will accept either a [[ClusterUniqueId]] or a string name.

The group service shall be used to identify the process responsible for updating the local replica on the node -- the "node primary replica" -- and the process that can write to the checkpoint -- the "master replica".

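In outline, attaching a replica might interact with the [[Group]] service as follows. This is a sketch under assumed names, not the actual implementation:

{{{
// Hypothetical discovery sketch -- all names are illustrative.
#include <string>

void attachReplica(const std::string& name)
{
    Handle h = Name::lookup(name);   // alternatively, a well-known Handle or
                                     // the ClusterUniqueId from the Group
                                     // registration is passed in directly

    Group g = Group::join(h);        // one group per checkpoint

    if (g.isNodeLeader())            // per-node election: the "node primary
        actAsNodePrimaryReplica();   // replica" updates the node's local copy

    if (g.isClusterLeader())         // cluster-wide election: the "master
        actAsMasterReplica();        // replica" is the process that can write
                                     // to the checkpoint
}
}}}
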
==== Replication ====
 
