SAFplus Group service

The Group service consists of a client library and an active-standby server process running on the system controller nodes. It tracks membership in arbitrarily defined sets (groups) and allows an "active" and a "standby" member to be elected. Clients can read the list of members of a group and get its current active and standby. Notifications are posted for any group membership or role change. Failures of nodes or applications can cause automatic removal of the affected members, and automatic re-election if the active or standby is among them.

A group is identified by a well-known HandleT, or a dynamically created one. A group can be named via the Name service and all APIs that require a group will also accept a string name.

Well Known Group: Cluster Node Membership

The Cluster Node Membership group controls admission into the SAFplus cluster and elects Active and Standby system controllers.

Well Known Group: Cluster Component Membership

All components are in this group. No Active/Standby is elected.

Active entity responsibilities

The Active entity has the following additional responsibilities:

1. Entity admission: If the active entity registers an admission filter, it will be called before new entities are allowed into the group.
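As an illustration of the admission step, here is a minimal sketch of a group that consults a filter before admitting a new entity. The names `AdmissionFilter`, `AdmissionSketch`, `setAdmissionFilter` and `tryAdmit` are hypothetical and are not part of the SAFplus API; entity identifiers are modelled as plain integers rather than `SAFplus::Handle`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <utility>

// Hypothetical stand-in for SAFplus::Handle (assumption: a plain integer).
using EntityId = std::uint64_t;

// Hypothetical filter signature: return true to admit the candidate.
using AdmissionFilter = std::function<bool(EntityId candidate)>;

class AdmissionSketch
{
public:
  // The active entity installs a filter. With no filter, everyone is admitted.
  void setAdmissionFilter(AdmissionFilter f) { filter = std::move(f); }

  // Called when a new entity asks to join. Returns true if it was admitted.
  bool tryAdmit(EntityId candidate)
  {
    assert(candidate != 0);  // assumption: 0 is reserved to mean "default"
    if (filter && !filter(candidate)) return false;  // rejected by the active
    members.insert(candidate);
    return true;
  }

  bool isMember(EntityId id) const { return members.count(id) != 0; }

private:
  AdmissionFilter filter;
  std::set<EntityId> members;
};
```

In this sketch, entities admitted before a filter is installed remain members; whether the real service re-checks existing members is not specified here.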

Failover

If the group detects failure of the active, the standby becomes active and an election is run for a new standby.
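The failover rule can be sketched as a small state update. This is only an illustration: `Roles`, `NO_ENTITY` and `handleFailure` are hypothetical names, and identifiers are modelled as integers with 0 meaning "no entity".

```cpp
#include <cassert>
#include <cstdint>

using EntityId = std::uint64_t;       // hypothetical stand-in for SAFplus::Handle
const EntityId NO_ENTITY = 0;         // assumption: 0 means "role is vacant"

struct Roles
{
  EntityId active;
  EntityId standby;
};

// Apply a member failure to the current role assignment.
// Returns true if a role is now vacant and an election should be run.
bool handleFailure(Roles& r, EntityId failed)
{
  assert(failed != NO_ENTITY);
  if (failed == r.active)
    {
      r.active  = r.standby;   // the standby is promoted to active
      r.standby = NO_ENTITY;   // the standby role must be re-elected
      return true;
    }
  if (failed == r.standby)
    {
      r.standby = NO_ENTITY;   // only the standby role is vacant
      return true;
    }
  return false;                // an ordinary member failed; roles unchanged
}
```

If the active fails while no standby exists, this sketch leaves both roles vacant, so the caller would have to elect both.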

Elections

If the group has enabled elections, a re-election occurs when the entity designated active or standby fails.

A "bully" election shall be used. It occurs as follows: Any group member can call for an election at any time for the Active role, the Standby role, or both. Every entity in the group shall respond by sending an election message containing its credentials to every other entity. The credential shall be shifted and ORed with the entity's unique ID so that the resulting value is unique. The member with the highest credentials shall win the election. This member shall send a message "claiming" the role. Receipt of this message ends the election.
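The comparison each member performs once it has received everyone's credentials can be sketched as follows. Several details are assumptions made for the example: IDs and credentials are taken to fit in 32 bits each, the shift width `ID_BITS` is 32, and the standby is taken to be the runner-up. The names `Candidate`, `electionKey` and `resolveElection` are hypothetical. The sketch ignores the message exchange and the ACCEPT_ACTIVE/ACCEPT_STANDBY capability bits.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using MemberId = std::uint32_t;   // assumption: the unique ID fits in 32 bits
const unsigned ID_BITS = 32;      // assumption: shift width

struct Candidate
{
  MemberId      id;
  std::uint32_t credential;       // value supplied by the group's policy
};

// Shift the credential and OR in the unique ID so that no two keys are equal.
std::uint64_t electionKey(const Candidate& c)
{
  assert(c.id != 0);              // assumption: 0 is not a valid member ID
  return (static_cast<std::uint64_t>(c.credential) << ID_BITS) | c.id;
}

// Returns (active, standby). 0 means the role could not be filled.
std::pair<MemberId, MemberId> resolveElection(const std::vector<Candidate>& members)
{
  const Candidate* best = nullptr;
  const Candidate* second = nullptr;
  for (const Candidate& c : members)
    {
      if (!best || electionKey(c) > electionKey(*best))
        {
          second = best;          // the previous leader becomes runner-up
          best = &c;
        }
      else if (!second || electionKey(c) > electionKey(*second))
        {
          second = &c;
        }
    }
  return { best ? best->id : 0, second ? second->id : 0 };
}
```

Because every member evaluates the same keys, each one reaches the same result independently; the winner then sends the "claim" message.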

Groups can modify the election algorithm by choosing how the credential number is calculated. Some examples:

return 0: No credentials means the highest ID gets elected. This will cause the group's Active/Standby to "fail-back" to a restarted node...

return 2 if master, 1 if standby, else 0: This will cause both the Active and Standby roles to be "sticky" -- that is, the current master/standby is preferred over a newly joined entity.
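The two example policies can be written as small credential functions. The names `Role`, `failBackCredential`, `stickyCredential` and `electionKey` are hypothetical, and the 32-bit shift is an assumption made so the effect of each policy can be compared.

```cpp
#include <cassert>
#include <cstdint>

enum Role { ROLE_NONE, ROLE_STANDBY, ROLE_MASTER };

// "return 0": the credential carries no information, so only the ID decides.
std::uint64_t failBackCredential(Role) { return 0; }

// "return 2 if master, 1 if standby, else 0": current role holders outrank
// any newly joined entity regardless of its ID.
std::uint64_t stickyCredential(Role current)
{
  switch (current)
    {
    case ROLE_MASTER:  return 2;
    case ROLE_STANDBY: return 1;
    default:           return 0;
    }
}

// Assumed combination step: credential in the high bits, unique ID in the low bits.
std::uint64_t electionKey(std::uint64_t credential, std::uint32_t id)
{
  assert(credential <= 0xFFFFFFFFu);  // must fit above the 32-bit ID
  return (credential << 32) | id;
}
```

With the first policy a newly joined entity with ID 9 outranks a current master with ID 3; with the second policy the current master keeps the role.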

Synergy

  1. A group can be named via the Name service.

  2. The Group service interacts with the messaging service, allowing several modes:
    • broadcast messages to all members of the group
    • send to master
    • local round robin
    The default mode is defined by the group.
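One way to picture the three modes is as a function that selects the destination set for a message. This is a sketch only: `SendMode`, `GroupView` and `pickTargets` are hypothetical names, and "local round robin" is assumed here to mean rotating through the group members on the sender's own node.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using EntityId = std::uint64_t;   // hypothetical stand-in for SAFplus::Handle

enum class SendMode { BROADCAST, SEND_TO_MASTER, LOCAL_ROUND_ROBIN };

struct GroupView
{
  std::vector<EntityId> members;       // every member of the group
  std::vector<EntityId> localMembers;  // assumption: members on this node
  EntityId              master;        // current active entity
  std::size_t           cursor;        // next round-robin position
};

// Choose the destinations for one message under the given mode.
std::vector<EntityId> pickTargets(GroupView& g, SendMode mode)
{
  switch (mode)
    {
    case SendMode::BROADCAST:
      return g.members;
    case SendMode::SEND_TO_MASTER:
      return { g.master };
    case SendMode::LOCAL_ROUND_ROBIN:
      {
        assert(!g.localMembers.empty());
        EntityId next = g.localMembers[g.cursor % g.localMembers.size()];
        ++g.cursor;                    // advance so the next send rotates
        return { next };
      }
    }
  return {};
}
```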

Implementation Issues

Implementation

The Group service shall be a library that can be linked with SAFplus services or applications.

The AMF server shall use the Group service and register the "Cluster Node Membership" group. All AMF processes will be in the Cluster Node Membership group and it will be used to elect the cluster "active" and "standby" (which are the same as the active and standby AMF servers).

    #include <cstdint>
    #include <string>
    #include <utility>

    typedef SAFplus::Handle EntityIdentifier;

    class Group
      {
        public:

        // Capability bits. The current role is reported in the same bitmap.
        enum
          {
            ACCEPT_STANDBY = 1,  // Can this entity become standby?
            ACCEPT_ACTIVE  = 2,  // Can this entity become active?
            IS_ACTIVE      = 4,  // This entity is currently the active
            IS_STANDBY     = 8   // This entity is currently the standby
          };

        Group(SAFplus::Handle groupHandle) { init(groupHandle); }
        Group(); // Deferred initialization; call init() later
        void init(SAFplus::Handle groupHandle);

        // Named group uses the name service to resolve the name to a handle
        Group(std::string name);

        // Register a member of the group.  This is separate from the constructor
        // so someone can iterate through members of the group without being a
        // member.  Caller owns data when registerEntity returns.
        // Called registerEntity here because "register" is a reserved word in C++.
        void registerEntity(EntityIdentifier me, uint64_t credentials, const void* data, int dataLength, uint capabilities);

        // If me=0 (default), use the identifier passed to the last call to registerEntity.
        void deregister(EntityIdentifier me=0);

        // If me=0 (default), use the identifier passed to the last call to registerEntity.
        void setCapabilities(uint capabilities, EntityIdentifier me=0);
        // This also returns the current active/standby state of the entity since
        // that is part of the capabilities bitmap.
        uint getCapabilities(EntityIdentifier id);

        // Returns the data the entity supplied when it registered.
        SAFplus::buffer& getData(EntityIdentifier id);

        // Calls for an election
        std::pair<EntityIdentifier,EntityIdentifier> elect();

        // std template like iterator
        class Iterator
          {
          // See SAFplus::Checkpoint for an example
          };

        const Iterator begin(void) const;
        const Iterator end(void) const;

        bool isMember(EntityIdentifier id);

        // Call w.wake when someone enters/leaves the group or an active or
        // standby assignment or transition occurs.  Pass what happened into
        // the wakeable.
        void setNotification(SAFplus::Wakeable& w);

        EntityIdentifier getActive(void) const;
        EntityIdentifier getStandby(void) const;

      };
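Because the role flags share a bitmap with the capability flags, a caller can test the value returned by getCapabilities with ordinary bit operations. The sketch below reuses the enum values from the listing. The helper names `isActive`, `isStandby`, `canBecomeActive` and `markActive` are hypothetical and only illustrate the bit manipulation.

```cpp
#include <cassert>

// Same values as the enum in class Group.
enum
{
  ACCEPT_STANDBY = 1,
  ACCEPT_ACTIVE  = 2,
  IS_ACTIVE      = 4,
  IS_STANDBY     = 8
};

inline bool isActive(unsigned caps)        { return (caps & IS_ACTIVE) != 0; }
inline bool isStandby(unsigned caps)       { return (caps & IS_STANDBY) != 0; }
inline bool canBecomeActive(unsigned caps) { return (caps & ACCEPT_ACTIVE) != 0; }

// Return the bitmap an entity would have after being promoted to active:
// IS_ACTIVE is set, IS_STANDBY is cleared, and the ACCEPT_* bits are kept.
inline unsigned markActive(unsigned caps)
{
  assert(canBecomeActive(caps));  // only entities that accept the role
  return (caps | IS_ACTIVE) & ~static_cast<unsigned>(IS_STANDBY);
}
```

For example, a standby that accepts both roles has the bitmap `ACCEPT_ACTIVE | ACCEPT_STANDBY | IS_STANDBY`; after `markActive` it reports active and no longer standby.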

SAFplus: Group (last edited 2014-07-25 21:54:45 by AndrewStone)