Differences between revisions 32 and 33
Revision 32 as of 2015-03-18 06:33:10
Size: 7036
Editor: HungTa
Comment:
Revision 33 as of 2015-03-19 04:20:26
Size: 7237
Editor: HungTa
Comment:

SAFplus 7 Feature Discussion

SAFplus Messaging

Advanced Socket Layer

The socket abstraction (MsgSocket class) presents a simple read/write API for multiple scatter-gather messages, consisting of receive, send, and flush APIs. At the message transport layer, an instance of the socket object communicates directly with the Linux-level socket via system calls. However, additional socket objects can be created that implement further functionality, such as traffic shaping, message segmentation and reassembly, and reliable messaging. These message sockets can modify the messages sent and received and then call a lower-level MsgSocket instance.

Functionality can therefore be selected at socket construction time by instantiation of a stack or tree of MsgSocket-derived classes, and a single implementation of an abstract concept (such as traffic shaping) can be applied to any message transport type.

After socket construction, application code is unaware that it is accessing layered sockets rather than a direct socket implementation, making it easy to change, add, or remove functionality as application requirements change.
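The layering idea can be sketched in a few lines of C++. This is only an illustration, not the actual SAFplus code: the MsgSocket interface below is a simplified guess (string messages instead of scatter-gather buffers), LoopbackSocket is a hypothetical in-memory stand-in for a real transport, and the one-byte 'M'/'L' fragment marker is an invented wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical, simplified interface. The real MsgSocket API is not shown on this page.
class MsgSocket {
public:
    virtual ~MsgSocket() = default;
    virtual void send(const std::string& msg) = 0;
    virtual bool receive(std::string& out) = 0;  // false when no complete message is pending
    virtual void flush() = 0;
};

// Stand-in for a transport-level socket: an in-memory queue instead of system calls.
class LoopbackSocket : public MsgSocket {
    std::deque<std::string> queue_;
public:
    void send(const std::string& msg) override { queue_.push_back(msg); }
    bool receive(std::string& out) override {
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop_front();
        return true;
    }
    void flush() override {}
    std::size_t pending() const { return queue_.size(); }
};

// A layer that splits each message into fragments of at most maxPayload bytes
// and reassembles them on receive. Each fragment starts with 'M' (more
// fragments follow) or 'L' (last fragment). This marker is an assumption.
class SegmentingSocket : public MsgSocket {
    MsgSocket& lower_;
    std::size_t maxPayload_;
    std::string partial_;  // fragments received so far for the current message
public:
    SegmentingSocket(MsgSocket& lower, std::size_t maxPayload)
        : lower_(lower), maxPayload_(maxPayload) { assert(maxPayload > 0); }

    void send(const std::string& msg) override {
        std::size_t pos = 0;
        do {
            const std::string chunk = msg.substr(pos, maxPayload_);
            pos += chunk.size();
            const bool more = pos < msg.size();
            lower_.send(std::string(1, more ? 'M' : 'L') + chunk);
        } while (pos < msg.size());
    }

    bool receive(std::string& out) override {
        std::string frag;
        while (lower_.receive(frag)) {
            if (frag.empty()) continue;  // ignore fragments without a marker
            partial_ += frag.substr(1);
            if (frag[0] == 'L') {
                out = partial_;
                partial_.clear();
                return true;
            }
        }
        return false;  // message incomplete; keep partial_ for the next call
    }

    void flush() override { lower_.flush(); }
};
```

Application code only holds a MsgSocket reference, so it does not matter whether that reference points at the LoopbackSocket directly or at a SegmentingSocket stacked on top of it.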

MsgSocketLayering.svg

The diagram above shows 3 applications with different socket stacks. Application 1 is using UDP transport, traffic shaping and allows large messages (segmentation and reassembly). Application 2 does not need large messages; it is just using traffic shaping. Application 3 shows a complex configuration. Its node supports both UDP and TIPC transports, so the first layer above the transport ("splitter") uses the destination nodeId to determine which transport to route the message to. Large messages are supported so the "Segmentation And Reassembly" object is next. After that, the application may need both reliable and unreliable messages, so a "Joiner" object is added that is capable of providing 2 independent MsgSocket interfaces and combining them into one (by prefixing a "message type" character to the front of every message, for example). Finally, a "reliable message" MsgSocket object provides reliable messaging. The application code sees 2 MsgSocket interfaces; one for unreliable and one for reliable messages.

Note that the receiving application is expected to have the same socket stack in cases where the MsgSocket objects modify message content.
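As a rough illustration of the "Joiner" idea and of why both ends need matching stacks, the sketch below prefixes a one-character message type on send and strips it on receive. All names here (Wire, JoinedChannel, Demuxed, demux) and the 'R'/'U' tag values are hypothetical; the shared lower layer is modeled as an in-memory queue.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>

// The shared lower layer, modeled as a simple in-memory queue (assumption).
using Wire = std::deque<std::string>;

// One of the two interfaces exposed by the hypothetical Joiner. Every message
// sent through it is prefixed with the channel's message type character.
class JoinedChannel {
    Wire& wire_;
    char tag_;
public:
    JoinedChannel(Wire& wire, char tag) : wire_(wire), tag_(tag) {
        assert(tag == 'R' || tag == 'U');  // 'R' = reliable, 'U' = unreliable
    }
    void send(const std::string& msg) { wire_.push_back(std::string(1, tag_) + msg); }
};

// Messages sorted by type on the receiving side.
struct Demuxed {
    std::deque<std::string> reliable;
    std::deque<std::string> unreliable;
};

// Receiver side: strip the prefix and route each message to the matching queue.
// A peer without the same Joiner layer would see the prefix as payload.
void demux(Wire& wire, Demuxed& out) {
    while (!wire.empty()) {
        const std::string m = wire.front();
        wire.pop_front();
        if (m.empty()) throw std::runtime_error("missing message type prefix");
        if (m[0] == 'R') out.reliable.push_back(m.substr(1));
        else if (m[0] == 'U') out.unreliable.push_back(m.substr(1));
        else throw std::runtime_error("unknown message type prefix");
    }
}
```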

SCTP transport plugin

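SCTP stands for Stream Control Transmission Protocol. SCTP is a message-oriented transport protocol that combines features of TCP and UDP. Its main features are:

  • Data transfer is reliable
  • Multi-stream support
  • Multi-homing support
  • Ordered delivery is not strictly enforced: messages are ordered within a stream by default, but unordered delivery can be requested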

Because SCTP is connection-oriented, the server and the client open their sockets in different ways:

  • Server:
    • open a socket
    • set sctp socket options
    • bind the opened socket with IP address and port
    • listen for connections
    • accept connections (optional)
    • receive/send from/to clients
  • Client:
    • open a socket
    • set sctp socket options
    • connect to the listening server socket
    • send/receive to/from server socket

However, to implement the peer-to-peer model using SCTP, the model has to be refined:

  • Each client supports its own listen socket, derived from the "port" supplied to the messaging transport, so that another client can connect directly to its peer. For example, assume there are 4 nodes: A(s1), B(s2), C(s3), D(s4), where s1, s2, s3, s4 are the listen sockets opened (on the specified port) on each node. If A wants to send messages to B, C and D, it has to open 3 client sockets (on the same port as the servers, of course) that connect to s2, s3, s4 respectively. After that, A can send messages to s2, s3, s4 and receive messages from them. Communication sockets are opened on an as-needed basis. That is, the socket from A to C is not opened until a message is sent from A to C or from C to A. The communication socket is then left open and reused for the duration of the application.
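The "open on first use, then reuse" behaviour could look roughly like the sketch below. It deliberately leaves out the real SCTP calls: PeerConnections, SocketHandle and the injected connector function are hypothetical names, and a real plugin would create and connect an SCTP socket inside the connector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical handle type. A real implementation would hold an SCTP socket descriptor.
using SocketHandle = int;

class PeerConnections {
    std::function<SocketHandle(int)> connect_;  // opens a client socket to a node's listen socket
    std::map<int, SocketHandle> open_;          // nodeId -> communication socket
public:
    explicit PeerConnections(std::function<SocketHandle(int)> connector)
        : connect_(std::move(connector)) { assert(connect_); }

    // Returns the communication socket for nodeId, connecting only on first use.
    SocketHandle socketFor(int nodeId) {
        auto it = open_.find(nodeId);
        if (it == open_.end())
            it = open_.emplace(nodeId, connect_(nodeId)).first;
        return it->second;
    }

    // Called when the peer connected to our listen socket first, so that the
    // same socket is reused for traffic in the other direction. If a socket
    // for nodeId already exists, the existing one is kept.
    void adopt(int nodeId, SocketHandle accepted) { open_.emplace(nodeId, accepted); }

    std::size_t openCount() const { return open_.size(); }
};
```

In the A/B/C/D example, node A would call socketFor for B, C and D only when it first has a message for each of them, and would call adopt when, say, C connects to s1 before A has sent anything to C.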

* Node IP addresses

  • The initial implementation uses the last 8 bits (D) of the A.B.C.D IP address as the node ID (see UDP example). The code that transforms node IDs into IP addresses should be isolated in a single function call so that the node-ID-to-IP-address translation algorithm can be changed easily for "cloud mode" applications (see below).
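A minimal sketch of such isolated translation functions, assuming IPv4 addresses held as 32-bit integers in host byte order. The function names are made up for illustration and are not part of SAFplus.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Node ID = last 8 bits (D) of the A.B.C.D address.
inline std::uint8_t nodeIdFromAddress(std::uint32_t ipv4) {
    return static_cast<std::uint8_t>(ipv4 & 0xFFu);
}

// Inverse direction: keep the upper 24 bits of the network address and place
// the node ID in the last octet. Swapping in a different algorithm (for
// example a table lookup) only requires changing this one function.
inline std::uint32_t addressFromNodeId(std::uint32_t network, std::uint8_t nodeId) {
    return (network & 0xFFFFFF00u) | nodeId;
}

// Formats a host-byte-order IPv4 address as dotted-quad text.
inline std::string toDottedQuad(std::uint32_t ipv4) {
    return std::to_string((ipv4 >> 24) & 0xFFu) + "." +
           std::to_string((ipv4 >> 16) & 0xFFu) + "." +
           std::to_string((ipv4 >> 8) & 0xFFu) + "." +
           std::to_string(ipv4 & 0xFFu);
}
```

With the network 169.254.26.0 and node ID 10, addressFromNodeId yields the address that toDottedQuad prints as 169.254.26.10.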

Network Address Mapping

There are two techniques to convert node IDs into network addresses: "direct mapping" and "cloud mode".

  • The direct mapping technique is useful in a private backplane network. In this case, the network address is algorithmically derived from the node ID and some statically configured environment variable data. For example, for IP addresses the following environment variable is defined:

SAFPLUS_BACKPLANE_NETWORK=169.254.26/9

Given a node ID (say 10), the system can construct an IP address: 169.254.26.10

  • [Hung] Do you mean that the constructed IP address 169.254.26.10 is the virtual one and we'll use it throughout SAFplus?

    [Hung] Can we retrieve the IP address of any node by this method? Let's say we're on node 1 (NodeID:1) and we want to retrieve the IP address of node 10 (NodeID:10). How can we do that?

  • Cloud mode assumes that network addresses are arbitrary and contain more bits than the node ID. Therefore it is impossible to algorithmically transform a node ID into a network address.

In this mode, every node is configured via an environment variable with the addresses of "well known" nodes in the cluster. Best practice is to list at least 2:

SAFPLUS_CLOUD_PEERS=54.23.54.13,mycluster.mycompany.com

Note that domain names are allowed. The new node will contact one of these nodes to request:

  1. A node ID (the new node can supply a preferred ID and indicate whether it MUST have this ID -- i.e., return an error if the ID is already taken -- or whether it is merely a preference)
  2. The mapping between node IDs and addresses.

Upon each new registration, the cloud mode server will notify all other nodes of the existence of this node.

This is a generic mapping service, so it can be implemented once for all transports. However, the service should use the underlying transport (perhaps via a special port/socket), so that part may be tricky :-).
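To make the registration rules concrete, here is a small in-memory sketch of what a well-known node might keep. It ignores the transport entirely. CloudNodeRegistry, its method names, the "0 means no preference" convention and the "lowest free ID" fallback are all assumptions made for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class CloudNodeRegistry {
public:
    using Listener = std::function<void(int nodeId, const std::string& address)>;

    // Stands in for "notify all other nodes": every subscriber is told about
    // each new registration.
    void subscribe(Listener l) { listeners_.push_back(std::move(l)); }

    // preferredId <= 0 means "no preference" (assumption). If the preferred ID
    // is taken, either fail (mustHave) or fall back to the lowest free ID.
    int registerNode(const std::string& address, int preferredId = 0, bool mustHave = false) {
        int id = preferredId;
        if (id <= 0 || nodes_.count(id)) {
            if (id > 0 && mustHave) throw std::runtime_error("node ID already taken");
            id = 1;
            while (nodes_.count(id)) ++id;
        }
        nodes_[id] = address;
        for (const auto& l : listeners_) l(id, address);
        return id;
    }

    // The node ID to address mapping handed back to a newly registered node.
    const std::map<int, std::string>& mapping() const { return nodes_; }

private:
    std::map<int, std::string> nodes_;
    std::vector<Listener> listeners_;
};
```

A new node would send its address and preferred ID to one of the SAFPLUS_CLOUD_PEERS entries, which would call something like registerNode and reply with the assigned ID plus the contents of mapping().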

SAFplus Management

Management Access APIs

We need a C++ API to issue management object sets and gets from any application. This will allow the management data to be manipulated generically, so we will not need to support a lot of application-specific APIs like log stream creation or Dynamic HA operations.

Applications can also use these APIs to implement a new northbound protocol (like SNMP and NETCONF) if they need to.

Proposal:

void mgtSet(const string& pathSpec, const string& value, Transaction& t);

string mgtGet(const string& pathSpec);

void mgtCreate(const string& pathSpec);
void mgtDelete(const string& pathSpec);

All APIs raise an exception if they cannot complete.

For example:

mgtSet("/Log/stream[\"mystream\"]/filename", "/var/log/myapplog")
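To get a feel for how the proposed calls behave together, here is a toy in-memory implementation. It is only a sketch: the Transaction struct is an empty placeholder (its semantics are not defined in the proposal), the flat path-to-value map replaces the real management tree, and the rule that set/get on a path that was never created throws is an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using std::string;

struct Transaction {};  // placeholder; the proposal does not define its contents

// Toy backing store: one string value per path (assumption, not the real tree).
static std::map<string, string>& mgtStore() {
    static std::map<string, string> store;
    return store;
}

void mgtCreate(const string& pathSpec) {
    if (!mgtStore().emplace(pathSpec, "").second)
        throw std::runtime_error("object already exists: " + pathSpec);
}

void mgtDelete(const string& pathSpec) {
    if (mgtStore().erase(pathSpec) == 0)
        throw std::runtime_error("no such object: " + pathSpec);
}

void mgtSet(const string& pathSpec, const string& value, Transaction& t) {
    (void)t;  // the transaction is accepted but unused in this sketch
    auto it = mgtStore().find(pathSpec);
    if (it == mgtStore().end())
        throw std::runtime_error("no such object: " + pathSpec);
    it->second = value;
}

string mgtGet(const string& pathSpec) {
    auto it = mgtStore().find(pathSpec);
    if (it == mgtStore().end())
        throw std::runtime_error("no such object: " + pathSpec);
    return it->second;
}
```

Every failure surfaces as an exception, matching the "All APIs raise an exception if they cannot complete" rule, so callers do not need to check return codes.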

SAFplus: Feature Discussion (last edited 2015-07-24 14:01:34 by AndrewStone)