Diff for "Log Replication And Distribution"

Differences between revisions 6 and 9 (spanning 3 versions)

Log Replication and Distribution

Two additional features are needed in the log system:

1. Replication: Logs are stored on both system controllers, regardless of their origin.

2. Distribution: Other applications (anywhere on the cluster) can receive every log as it is written.

These features require higher-level SAFplus services which may not exist or may not be initialized. Therefore these features need to be optional, and they must be carefully constructed so that they do not cause undefined symbols when a subset of SAFplus is deployed.

Initialization

To implement these features we will need to use the name and group messaging library. However, we also want log to work WITHOUT group (for situations where group has not been initialized yet and for cases where a subset of SAFplus is being utilized).

Therefore, I want you to create a new library under log7 to implement these features. I have already made a directory called "rep" under log7 for it. Please call the library libclLogRep.so.

There needs to be a separate initialization function in safplusInitialize, but not a separate identifier bit. Since logRep is essentially "glue" between log and messaging, logRep should be automatically initialized if log, Group and Name are initialized.

You will need to add "hooks" from the log to logRep so that log can call logRep IF it is initialized, but in such a way that the linker does not automatically pull in logRep whenever log exists. One way to do this is to define a structure with the needed functions in the log service (in C++ you'd use a class with pure virtual functions). Create a global pointer to that structure but set it to NULL. Of course, the functions are only called if the pointer is not NULL.

When/if logRep is initialized, it sets the global pointer to non-null -- pointing to its implementations.
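The hook arrangement described above could look roughly like the following. This is only a sketch: the names LogRepHooks, logRepHooks, logReplicate, LogRepImpl and logRepInitialize are made up for illustration and are not existing SAFplus symbols, and the real hook will probably need more functions (stream creation, configuration change, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical hook interface, declared in the log service.
class LogRepHooks
{
public:
  virtual ~LogRepHooks() {}
  // Called by log for each record written to a replicated stream.
  virtual void logWritten(const std::string& streamName, const char* text, std::size_t length) = 0;
};

// Defined in the log library. It stays NULL until logRep is initialized, so
// the log library never references a logRep symbol at link time.
LogRepHooks* logRepHooks = nullptr;

// Log-side call site: forward the record only if logRep has installed itself.
// Returns true if the record was handed to logRep.
inline bool logReplicate(const std::string& streamName, const std::string& text)
{
  if (logRepHooks == nullptr) return false;  // logRep not present or not initialized
  logRepHooks->logWritten(streamName, text.data(), text.size());
  return true;
}

// ---- Everything below would live in libclLogRep.so ----
class LogRepImpl : public LogRepHooks
{
public:
  std::vector<std::string> pending;  // stand-in for the real batching and sending

  void logWritten(const std::string& streamName, const char* text, std::size_t length) override
  {
    assert(text != nullptr || length == 0);
    pending.push_back(streamName + ": " + std::string(text, length));
  }
};

// Hypothetical logRep initializer, to be called from safplusInitialize once
// log, Group and Name are all initialized.
inline LogRepImpl& logRepInitialize()
{
  static LogRepImpl impl;
  logRepHooks = &impl;  // from now on log calls reach logRep
  return impl;
}
```

Because log only ever touches the pointer and the abstract class, an application that links log without libclLogRep.so still builds, and logReplicate() simply returns false.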

Configuration

Configuration is defined in the SAFplusLog.yang file.

Log streams have a configuration enum called "replication" with the following values:

NONE: No replication
SYSTEM_CONTROLLERS: replicate to the system controllers
APPLICATIONS: replicate to interested applications.
ANY: replicate to system controllers and interested applications.

Note that this can actually be a bit-field. Also, even if SYSTEM_CONTROLLERS is set, the log server does not need to refuse applications that want notifications. So the actual implementation semantics are that SYSTEM_CONTROLLERS and ANY behave exactly the same (this is easier to implement; see the proposed implementation below).

        leaf replicate {
          type enumeration {
            enum NONE { description "No replication of this log stream"; value 0; }
            enum SYSTEM_CONTROLLERS { description "Replicate to the system controllers"; value 1; }
            enum APPLICATIONS { description "Replicate to interested applications"; value 2; }
            enum ANY { description "Replicate to both the system controllers and interested applications"; value 3; }
          }
        }
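Since the values 1 and 2 are distinct bits and ANY is their OR, the enum can be tested as a bit-field on the C++ side. A minimal sketch, assuming a hand-written enum that mirrors the YANG values (the names Replicate, hasFlag, needsGroup and spoolToControllers are illustrative, not generated code):

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the values of the "replicate" leaf in SAFplusLog.yang.
enum class Replicate : std::uint8_t
{
  NONE = 0,
  SYSTEM_CONTROLLERS = 1,
  APPLICATIONS = 2,
  ANY = 3  // SYSTEM_CONTROLLERS | APPLICATIONS
};

static_assert(static_cast<int>(Replicate::ANY) ==
              (static_cast<int>(Replicate::SYSTEM_CONTROLLERS) | static_cast<int>(Replicate::APPLICATIONS)),
              "ANY must be the OR of the two single-bit values");

// True if the given single-bit flag is set in value.
inline bool hasFlag(Replicate value, Replicate flag)
{
  assert(flag != Replicate::NONE);  // NONE has no bit to test
  return (static_cast<std::uint8_t>(value) & static_cast<std::uint8_t>(flag)) != 0;
}

// A replication group is needed whenever replication is not NONE.
inline bool needsGroup(Replicate value) { return value != Replicate::NONE; }

// The system controllers' log spooler is added for SYSTEM_CONTROLLERS and ANY.
inline bool spoolToControllers(Replicate value)
{
  return hasFlag(value, Replicate::SYSTEM_CONTROLLERS);
}
```

Note there is deliberately no check that refuses applications: once the group exists, any application may join it, which is why SYSTEM_CONTROLLERS and ANY end up behaving the same.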

Log stream replication behavior is defined in the ServerConfig object:

maximumRecordsInPacket: this defines the largest packet allowed.
processingInterval: this defines the longest time logs can be delayed for aggregation before sending.

Other configuration?

Operation

Log Stream Creation

When a log stream is created, register the stream name and the stream's handle with the Name service. This lets applications look up the stream by name. This should be done (if Group and LogRep are initialized) regardless of the setting in the replication field, since this lookup allows applications to issue logs by stream name, not just by handle.

Questions:

The Stream object doesn't include a "stream handle". How can we determine it?
The stream handle should be registered with the Name service only once. However, assume there are several nodes (system controllers and payloads). When each node starts, its log server starts, so the stream handle will be registered multiple times. How do we avoid this?

If the "replication" enum is not NONE:

Use a well-known sub-handle of the log stream to create a group. The members in this group will be all entities that are interested in receiving the logs from this stream.

Questions:

This is the same situation as question 2 above: if there are multiple nodes, each log server will try to create the group, but the group must not be created multiple times.

If the "replication" enum is SYSTEM_CONTROLLERS or ANY, then add a handle (maybe well-known?) to this group that will resolve to the system controller's log spooler object (see below).

Questions: What is the purpose of this handle for the log spooler object?

Note that any application can join this group by using the Name service to look up the log stream by name, and then adding itself using the Group service. Applications can even join the group BEFORE the log stream is created if the log stream's handle is "well-known".

If the "replication" field is changed at any time during operation, the log stream must be updated dynamically to reflect the changed state.

Issuing Logs

When a log is written, the log service should call the appropriate LogRep virtual function with the text of the log, provided that the LogRep hook exists (the global pointer is not NULL) and the stream's "replication" field is not NONE.

Inside this virtual function, multiple logs will be serialized into a single message for efficiency. Do not do this by concatenating them into a single text string. Use a format that lets the receiver split the logs apart again and that identifies information like endianness and message version. For example:

 Header:
  int16 IdAndEndian;  // This is a well-known number
  int8  version;
  int8  extra;
  Handle streamHandle;  // Because one receiver could be subscribed to multiple streams.
  int16 numLogs;

 Body: (repeated sets of length, value)
  int16 logLength;
  char  log[logLength];
  int16 logLength;
  char  log2[logLength];
  .
  .
  .

The number of logs serialized per message should vary, depending on the number of logs being issued. A high rate of logging should result in a large number of logs per message. A low rate of logging should result in a single log per message. You can use a leaky bucket algorithm to structure this.

The maximum number of logs per message and the maximum time to delay before issuing a log are defined in configuration (maximumRecordsInPacket and processingInterval in the ServerConfig object).
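As a starting point, the two limits can be enforced with a simple count-and-time batcher. Note that this is a simplification, not a full leaky bucket: it flushes when the packet is full or when the oldest queued log has waited for the processing interval. The class name LogBatcher and its methods are hypothetical, and time is passed in explicitly (in milliseconds) so the logic can be tested without a clock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

class LogBatcher
{
public:
  LogBatcher(std::size_t maximumRecordsInPacket, std::uint64_t processingIntervalMs)
    : maxRecords(maximumRecordsInPacket), intervalMs(processingIntervalMs), firstQueuedMs(0)
  {
    assert(maxRecords > 0);
  }

  // Queue one log at time nowMs. Returns the batch to send if the packet is
  // now full, otherwise an empty vector.
  std::vector<std::string> add(const std::string& log, std::uint64_t nowMs)
  {
    if (pending.empty()) firstQueuedMs = nowMs;  // start the delay timer
    pending.push_back(log);
    if (pending.size() >= maxRecords) return take();
    return std::vector<std::string>();
  }

  // Call periodically. Returns the pending logs once the oldest one has been
  // delayed for processingInterval, otherwise an empty vector.
  std::vector<std::string> poll(std::uint64_t nowMs)
  {
    if (!pending.empty() && nowMs - firstQueuedMs >= intervalMs) return take();
    return std::vector<std::string>();
  }

private:
  // Hand the queued logs to the caller and leave the queue empty.
  std::vector<std::string> take()
  {
    std::vector<std::string> out;
    out.swap(pending);
    return out;
  }

  std::size_t maxRecords;
  std::uint64_t intervalMs;
  std::uint64_t firstQueuedMs;
  std::vector<std::string> pending;
};
```

With this structure a burst of logs fills packets up to the maximum immediately, while an isolated log goes out alone after at most one processing interval.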

What other configuration fields are needed?

Once the log is formatted, actually issuing the message is simple. Use the Group message send API:

send(data, length, GroupMessageSendMode::SEND_BROADCAST);

Receiving Logs

Please look at the Object Message (clObjectMessager.hxx) infrastructure to discover how to receive log messages.

Log Spooler Object

The log spooler object exists on the system controllers (and any other node if an application explicitly instantiates one) inside the log server.

Questions: As I understand it, the log spooler object receives logs from other nodes. If so, and it does not exist on the other nodes, how do those nodes receive logs?

It receives logs using the Group infrastructure just like any application would.

When it receives logs, it writes them to disk using the exact SAME code as in log. Do not copy the logic that writes logs to disk, call a function in the log service. But we do not want to write these logs to shared memory, etc, so you may need to create a special API for issuing logs directly and refactor the log service a little bit to isolate the logic that writes logs to disk.

Testing and Examples

An example application should be created that creates a replicated log stream and writes to it in a loop with varying rates. The logs written should be different but follow an algorithm so that the receiver can validate that they are correct. For example, a pseudo-random number generator with a known seed could be used to create a stream of data that is verifiable.

So the log could be "test log 8 [qieddiek]" where 8 is a "randomly" generated length from 1 to 1024 (say), and "qieddiek" is a "randomly" generated string of that length.

In another thread or process (use a command line parameter to change operational behavior), the application should register to receive logs from that stream, print them to the screen, and validate them by comparison with the same pseudo-random number generator and seed.
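The generator and the matching validator could be sketched as below. The class names are hypothetical. The sketch uses std::mt19937 with plain modulo arithmetic (instead of std::uniform_int_distribution) because the engine's output sequence is fixed by the C++ standard, so sender and receiver agree even when built with different standard libraries. It assumes logs arrive in order and without loss.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>

// Produces logs of the form "test log <length> [<payload>]".
class TestLogGenerator
{
public:
  explicit TestLogGenerator(std::uint32_t seed, unsigned maxLength = 1024)
    : rng(seed), maxLen(maxLength)
  {
    assert(maxLength > 0);
  }

  std::string next()
  {
    const unsigned length = static_cast<unsigned>(rng() % maxLen) + 1;  // 1..maxLen
    std::string payload;
    payload.reserve(length);
    for (unsigned i = 0; i < length; ++i)
      payload.push_back(static_cast<char>('a' + rng() % 26));           // lowercase letters
    return "test log " + std::to_string(length) + " [" + payload + "]";
  }

private:
  std::mt19937 rng;
  unsigned maxLen;
};

// Receiver side: regenerates the same sequence from the same seed and
// compares each received log against it.
class TestLogValidator
{
public:
  explicit TestLogValidator(std::uint32_t seed, unsigned maxLength = 1024)
    : expected(seed, maxLength) {}

  // Returns true if the received log is the next one in the sequence.
  // The expected sequence advances on every call, matched or not.
  bool check(const std::string& received) { return received == expected.next(); }

private:
  TestLogGenerator expected;
};
```

The sender writes generator.next() to the replicated stream in a loop, and the receiver calls validator.check() on each log it extracts from a received message, reporting the first mismatch.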

Please write the receiver portion clearly and with descriptive comments so that it can be used as an example for other applications that want to receive logs.

SAFplus: Log Replication And Distribution (last edited 2014-10-07 20:30:14 by AndrewStone)

-  ⇤ ← Revision 6 as of 2014-10-02 08:50:17 → 
  Size: 8383
  Editor: HungTa
  Comment:
+   ← Revision 9 as of 2014-10-02 09:05:20 → ⇥
  Size: 8676
  Editor: HungTa
  Comment:
