Introduction

Feature List

Availability Management

Clusters are defined through a YANG XML model. This model describes the nodes in the cluster, the applications running on these nodes, and their redundancy relationships. Applications are monitored for failure and automatically restarted when they fail, and active and standby roles are assigned. The 1+1 and M+N redundancy models are supported as plugins, and more models can be plugged in as needed.
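As an illustration of how a pluggable redundancy model might assign roles, here is a minimal Python sketch. It does not use the SAFplus API: the names m_plus_n, REDUNDANCY_MODELS and the "unassigned" role are hypothetical, and the policy (list order sets priority) is an assumption made for the example.

```python
from typing import Callable, Dict, List, Set

# A role assigner maps (ordered instances, healthy set) to {instance: role}.
RoleAssigner = Callable[[List[str], Set[str]], Dict[str, str]]


def m_plus_n(m: int, n: int) -> RoleAssigner:
    """Build an assigner giving up to m active and then up to n standby roles."""
    def assign(instances: List[str], healthy: Set[str]) -> Dict[str, str]:
        roles: Dict[str, str] = {}
        # Only healthy instances are eligible; list order sets priority (assumption).
        available = [name for name in instances if name in healthy]
        for position, name in enumerate(available):
            if position < m:
                roles[name] = "active"
            elif position < m + n:
                roles[name] = "standby"
            else:
                roles[name] = "unassigned"
        return roles
    return assign


# Hypothetical plugin registry keyed by model name.
REDUNDANCY_MODELS: Dict[str, RoleAssigner] = {
    "1+1": m_plus_n(1, 1),
    "2+1": m_plus_n(2, 1),
}

instances = ["app0", "app1", "app2"]
one_plus_one = REDUNDANCY_MODELS["1+1"]
before = one_plus_one(instances, {"app0", "app1", "app2"})
# app0 has failed and is awaiting restart, so the standby is promoted.
after = one_plus_one(instances, {"app1", "app2"})
print(before)
print(after)
```

In this sketch 1+1 is simply the M+N model with M = 1 and N = 1; a real plugin would also have to handle restart and recovery of the failed instance.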

Groups

Applications or objects (entities) can easily become members of one or many user-defined groups. Groups allow entities to discover peers for scalability or high availability functions. Groups automatically elect two specially designated members: "active" and "standby" (the application determines what these designations actually mean -- if anything). Members can indicate their capability to become "active" or "standby" and can specify a "credential" value -- the highest "credential" wins, allowing application programmers to guide the election process (if desired). Election results can be permanent or can be superseded by the admittance of a higher-credential member into the cluster (implementing optional fail-back semantics).
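The credential-based election can be pictured with the following Python sketch. It is not the SAFplus group API: Member, elect and the field names are hypothetical, and the tie-breaking rule (the earliest member in the list wins) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Member:
    name: str
    credential: int
    can_be_active: bool = True
    can_be_standby: bool = True


def elect(members: List[Member]) -> Tuple[Optional[Member], Optional[Member]]:
    """Return (active, standby); the highest credential wins among capable members."""
    active = max((m for m in members if m.can_be_active),
                 key=lambda m: m.credential, default=None)
    # The standby is the best remaining member that is willing to be standby.
    standby = max((m for m in members if m.can_be_standby and m is not active),
                  key=lambda m: m.credential, default=None)
    return active, standby


members = [Member("a", 10), Member("b", 20), Member("c", 5, can_be_active=False)]
active, standby = elect(members)
print(active.name, standby.name)

# Fail-back semantics: a higher-credential member joins and a re-election runs.
active2, standby2 = elect(members + [Member("d", 30)])
print(active2.name, standby2.name)
```

A "permanent" election would simply skip the second call to elect while the current active member is still present.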

Entities can look at the member list of any group -- even ones they do not belong to. Groups can therefore be used to discover service providers. For example, the "load balancer" application can find all "web server" applications running in the cluster. Applications can also send messages to the "group" using different sending modes: the message can be directed to the active entity, the standby entity, all entities (broadcast), or an arbitrary entity (round-robin load balancing).
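The four sending modes can be sketched as a target-selection step. This is a conceptual Python illustration only; SendMode, Group and targets are hypothetical names, not SAFplus identifiers.

```python
from enum import Enum


class SendMode(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    BROADCAST = "broadcast"
    ROUND_ROBIN = "round_robin"


class Group:
    def __init__(self, members, active=None, standby=None):
        self.members = list(members)
        self.active = active
        self.standby = standby
        self._cursor = 0  # position of the next round-robin target

    def targets(self, mode):
        """Return the list of members a message sent in this mode would reach."""
        if mode is SendMode.ACTIVE:
            return [self.active] if self.active is not None else []
        if mode is SendMode.STANDBY:
            return [self.standby] if self.standby is not None else []
        if mode is SendMode.BROADCAST:
            return list(self.members)
        if not self.members:
            return []
        # Round robin: pick one member, then advance the cursor.
        target = self.members[self._cursor % len(self.members)]
        self._cursor += 1
        return [target]


web_servers = Group(["web1", "web2", "web3"], active="web1", standby="web2")
picks = [web_servers.targets(SendMode.ROUND_ROBIN)[0] for _ in range(4)]
print(picks)
```

A "load balancer" entity would not need to be a member of the group to use it this way; it only needs the member list.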

Groups make it easy to implement complex scale-out or high availability strategies, with no special logic inside the application.

Checkpoint

The checkpoint entity forms the backbone of the coordination of state information between nodes in the cluster. Abstractly, it is an in-RAM "hash table", "dictionary", or "map" (there are many names for this concept) -- that is, a user writes and reads an arbitrary data "key" that maps to an arbitrary data "value". However, a checkpoint differs from the traditional "hash table" structure because it exists in every process that is interested in it. Unlike cloud-based "distributed hash tables", a checkpoint is fully replicated to all interested nodes; it is not a "distributed" dictionary in which every node holds only partial data. This means that checkpoints are fully redundant and offer very fast look-up.

Checkpoints are primarily used to replicate program state to redundant nodes. When a standby process becomes active, it can read the latest program state from the checkpoint (warm standby) or a standby process can opt to continually receive state change updates (hot standby). In the latter case, the standby can resume service more quickly because it does not need to update its internal state with checkpointed data when becoming active.
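The difference between warm and hot standby can be illustrated with a toy, in-process model of a fully replicated table. This Python sketch is not the SAFplus checkpoint API: the Checkpoint class, its peers and listeners attributes, and the direct method call that stands in for network replication are all assumptions.

```python
class Checkpoint:
    """One process's complete local copy of a replicated key/value table."""

    def __init__(self):
        self.data = {}
        self.peers = []      # other replicas of the same checkpoint
        self.listeners = []  # callbacks used by a hot standby

    def write(self, key, value):
        self._apply(key, value)
        for peer in self.peers:  # stand-in for replication over the network
            peer._apply(key, value)

    def _apply(self, key, value):
        self.data[key] = value
        for callback in self.listeners:
            callback(key, value)

    def read(self, key):
        return self.data[key]  # served from the local copy


active_ckpt, standby_ckpt = Checkpoint(), Checkpoint()
active_ckpt.peers.append(standby_ckpt)

# Hot standby: apply every change to internal state as it arrives.
hot_state = {}
standby_ckpt.listeners.append(lambda key, value: hot_state.update({key: value}))

active_ckpt.write("session:42", "logged-in")
active_ckpt.write("session:42", "checkout")

# Warm standby: load the latest state only at the moment of failover.
warm_state = dict(standby_ckpt.data)
print(hot_state, warm_state)
```

Both approaches end with the same state; the hot standby has simply paid the cost of updating it before the failover rather than during it.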

Messaging

SAFplus provides an efficient and high-performance messaging mechanism. The underlying transport is implemented via a plugin architecture allowing new transport protocols to be defined by the user. SAFplus provides out-of-the-box support for TIPC, UDP, TCP and SCTP protocols. Optional layers can be instantiated on top of these transports to provide reliability, traffic shaping, segmentation and reassembly, and bandwidth vs. latency performance optimization.
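To show how an optional layer can sit on top of a pluggable transport, here is a small Python sketch of segmentation and reassembly. Nothing here is SAFplus code: LoopbackTransport, Segmenter, the 8-byte size limit and the 2-byte fragment header are hypothetical, and the sketch assumes in-order, loss-free delivery.

```python
class LoopbackTransport:
    """Stand-in transport plugin with a deliberately small message size limit."""
    max_size = 8

    def __init__(self):
        self.queue = []

    def send(self, payload: bytes) -> None:
        if len(payload) > self.max_size:
            raise ValueError("payload exceeds transport limit")
        self.queue.append(payload)

    def receive(self) -> bytes:
        return self.queue.pop(0)


class Segmenter:
    """Layer that splits messages into fragments with an (index, total) header."""
    HEADER = 2

    def __init__(self, transport):
        self.transport = transport

    def send(self, payload: bytes) -> None:
        chunk = self.transport.max_size - self.HEADER
        pieces = [payload[i:i + chunk] for i in range(0, len(payload), chunk)] or [b""]
        if len(pieces) > 255:
            raise ValueError("message too large for a one-byte fragment count")
        for index, piece in enumerate(pieces):
            self.transport.send(bytes([index, len(pieces)]) + piece)

    def receive(self) -> bytes:
        first = self.transport.receive()
        total = first[1]
        parts = [first[self.HEADER:]]
        for _ in range(total - 1):
            parts.append(self.transport.receive()[self.HEADER:])
        return b"".join(parts)


transport = LoopbackTransport()
layer = Segmenter(transport)
message = b"hello, cluster world"
layer.send(message)
fragment_count = len(transport.queue)
received = layer.receive()
print(fragment_count, received)
```

Because Segmenter exposes the same send and receive methods as the transport, further layers (for example reliability or traffic shaping) could wrap it in the same way.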

Remote Procedure Calls: Integrated with Google Protobuf

SAFplus messaging is integrated with the Google Protobuf serialization system to provide a high-performance, endian-aware, object-oriented remote procedure call (RPC) facility. RPCs can be defined in either the Protobuf language or the YANG data modelling language (RFC 6020).

Fault

The fault service presents a cluster-wide, consistent view of the availability of any entity. Any entity in the cluster can query the status of, and report faults about, any other entity. When the fault service receives a fault report, it passes the report to fault analysis modules that look at the fault in the context of other incoming faults and determine whether any entities should be marked as failed. For example, analysis of a "cannot communicate with entity" fault may conclude that the fault lies in the reporter, not the reported. SAFplus provides an Availability Management Framework (AMF) fault module, and systems programmers can create their own fault analysis modules, which are loaded as plugins into the fault service.
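The idea of analysing a fault in the context of other reports can be sketched as follows. This Python example is illustrative only: FaultService, comm_fault_analyzer, the "cannot_communicate" fault kind and the threshold policy are hypothetical and do not describe the AMF fault module.

```python
from collections import defaultdict


def comm_fault_analyzer(reports, threshold=2):
    """Hypothetical policy: an entity accused by several reporters has failed;
    a reporter that accuses several entities nobody else accuses is itself suspect."""
    accusers = defaultdict(set)  # reported entity -> reporters
    accused = defaultdict(set)   # reporter -> reported entities
    for reporter, reported, kind in reports:
        if kind == "cannot_communicate":
            accusers[reported].add(reporter)
            accused[reporter].add(reported)
    failed = {entity for entity, who in accusers.items() if len(who) >= threshold}
    failed |= {reporter for reporter, whom in accused.items()
               if len(whom) >= threshold and not (whom & failed)}
    return failed


class FaultService:
    def __init__(self, analyzers):
        self.analyzers = analyzers  # pluggable analysis functions
        self.reports = []
        self.failed = set()

    def report(self, reporter, reported, kind):
        self.reports.append((reporter, reported, kind))
        for analyze in self.analyzers:
            self.failed |= analyze(self.reports)

    def is_failed(self, entity):
        return entity in self.failed


# nodeA alone cannot reach two peers: the fault is attributed to nodeA.
isolated = FaultService([comm_fault_analyzer])
isolated.report("nodeA", "nodeB", "cannot_communicate")
isolated.report("nodeA", "nodeC", "cannot_communicate")

# Two independent reporters cannot reach nodeB: nodeB is marked failed.
corroborated = FaultService([comm_fault_analyzer])
corroborated.report("nodeA", "nodeB", "cannot_communicate")
corroborated.report("nodeC", "nodeB", "cannot_communicate")
print(isolated.failed, corroborated.failed)
```

In this sketch an entity stays marked as failed once an analyzer has flagged it; clearing a fault after recovery is left out.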

YANG Based Configuration

Redundancy models are configured through YANG files that can be easily generated using a GUI cluster design tool. This configuration is accessible through NETCONF/CLI and through an API. It can be modified dynamically without restarting the cluster.

Advanced CLI

A single CLI allows you to connect to multiple elements in your network simultaneously. Configuration is presented as a YANG hierarchy, which is intuitive to explore and navigate. Advanced functions allow you to compare data in any sub-directory of the YANG hierarchy across multiple elements, and to save, restore, and apply changes to any sub-directory. Advanced graphical functions allow you to plot statistical data directly in the CLI. You can drag and drop, or cut and paste, any CLI output onto a separate window to create a live display of the information that refreshes automatically as the data changes.

Logging

Logs generated by applications are written to shared memory in an efficient, non-blocking manner. This minimizes the impact of logging on the performance of your application, and it preserves the last logs written before an application crash even if the application crashed before the logs were flushed to disk. The SAFplus logging server (often embedded within the SAFplus AMF) reads logs from shared memory, filters them by "stream" and SYSLOG (RFC 5424) severity level, and outputs them to one or more destinations.

A log "stream" is a cluster-wide, application-defined portal for log messages. Logs originating anywhere in the cluster and sent to a particular stream will be received by every subscriber of that stream. Log streams are configured via the SAFplus management interface, via a configuration XML file, or manually via management APIs in any application.

Logging makes it easy for applications to send logs to a variety of destinations, with no special logic inside the application.
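Filtering by stream and severity can be pictured with a short Python sketch. The severity numbers follow RFC 5424 (0 is most severe, 7 is least); everything else (LogRouter, subscribe, publish, and the use of plain lists as destinations) is a hypothetical stand-in rather than the SAFplus logging API.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severity levels as numbered in RFC 5424."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


class LogRouter:
    def __init__(self):
        self.subscribers = {}  # stream name -> list of (max_severity, sink)

    def subscribe(self, stream, sink, max_severity=Severity.DEBUG):
        self.subscribers.setdefault(stream, []).append((max_severity, sink))

    def publish(self, stream, severity, message):
        # Every subscriber of the stream receives the record if it passes the filter.
        for max_severity, sink in self.subscribers.get(stream, []):
            if severity <= max_severity:  # a lower number means more severe
                sink.append((stream, severity, message))


router = LogRouter()
everything, errors_only = [], []
router.subscribe("app", everything)
router.subscribe("app", errors_only, max_severity=Severity.ERROR)

router.publish("app", Severity.DEBUG, "cache warmed")
router.publish("app", Severity.CRITICAL, "disk full")
router.publish("sys", Severity.ERROR, "no subscriber on this stream")
print(len(everything), len(errors_only))
```

The publishing application only names a stream and a severity; where the record ends up is decided entirely by the subscriptions.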

Name Service

The Name Service allows users to associate a string with a piece of arbitrary data (often a Handle). Cluster-wide, node-wide, and process-only data can all be associated with a name. This allows name service users to associate shared memory pointers and local pointers with a particular name as well. For example, these pointers could reference objects that represent the local instantiation of an entity.
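A name that carries data at more than one scope might be modelled as below. This is a conceptual Python sketch, not the SAFplus Name Service interface: Scope, NameService, set and get are hypothetical, an integer stands in for a Handle, and a dictionary stands in for a local object.

```python
from enum import Enum


class Scope(Enum):
    CLUSTER = "cluster"
    NODE = "node"
    PROCESS = "process"


class NameService:
    def __init__(self):
        self._entries = {}  # (name, scope) -> data

    def set(self, name, data, scope=Scope.CLUSTER):
        self._entries[(name, scope)] = data

    def get(self, name, scope=Scope.CLUSTER):
        return self._entries[(name, scope)]


names = NameService()
# Cluster-wide data: an integer standing in for a Handle.
names.set("web-server-0", 0x1234, Scope.CLUSTER)
# Process-only data: a local object representing this process's view of the entity.
local_object = {"requests_served": 0}
names.set("web-server-0", local_object, Scope.PROCESS)

print(names.get("web-server-0"), names.get("web-server-0", Scope.PROCESS))
```

In a real deployment only the cluster-wide entry would be visible to other nodes; the process-scoped entry is meaningful only inside the process that registered it.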

SAFplus: Product Brief (last edited 2017-01-06 01:18:21 by vk)