Constants/Tweaks
- Constants that the customer may want to tweak should go in the clCustomization.hxx or clCustomization.cxx files
- This keeps all tweaks in one place so they can be easily identified and changed.
Object Methods
All APIs that require a SAFplus::Handle ClusterUniqueId shall also accept a string name. This name shall be looked up in the Name service and resolved to a SAFplus::Handle.
All APIs that modify the object or global state shall accept an optional Transaction parameter. The call shall validate and reserve the state change but not execute until the Transaction is committed.
All APIs that block shall accept a Wakeable object to offer both threaded and callback semantics, as described in detail in the next section.
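The name-based overload rule above can be sketched as follows. This is a minimal illustration, not SAFplus code: Handle, NameRegistry, nameService and pingEntity are hypothetical stand-ins invented for the example, and the real Name service interface may look different.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for SAFplus::Handle: just a numeric id.
struct Handle { unsigned long id; };

// Hypothetical stand-in for the Name service: a name -> Handle table.
class NameRegistry {
  std::map<std::string, Handle> table;
public:
  void set(const std::string& name, Handle h) {
    assert(!name.empty());  // An empty name can never be looked up
    table[name] = h;
  }
  Handle getHandle(const std::string& name) const {
    std::map<std::string, Handle>::const_iterator it = table.find(name);
    if (it == table.end()) throw std::runtime_error("unknown name: " + name);
    return it->second;
  }
};

NameRegistry nameService;  // Illustrative global registry

// Handle-based API (hypothetical): does the real work.
unsigned long pingEntity(Handle target) { return target.id; }

// Name-based overload: resolve the name to a Handle, then forward.
unsigned long pingEntity(const std::string& name) {
  return pingEntity(nameService.getHandle(name));
}
```

Keeping the name-based overload a thin forwarder means the lookup logic lives in one place and both entry points behave identically once the Handle is known.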
Synchronous, asynchronous and transactional programming
Most APIs will be synchronous. APIs that would block for a significant time (such as those that send a message and wait for a reply) should be offered in both synchronous and asynchronous flavors. This shall be implemented as follows:
- APIs shall accept an optional parameter as the last in the parameter list. This parameter is an object that can be used in a variety of ways to implement callback, blocking or transactional semantics.
- If this parameter is not supplied or is NULL, the API is being called synchronously.
Example:
Version HelloExchange(int node, Version myVersion, Wakeable& wake); // Returns other node's version, raises exception on error
Application code can call this function in the following ways:
- Synchronous
    HelloExchange(1,"1.0.0");
In the above case, since "wake" is not specified, the HelloExchange function internally creates a semaphore and blocks. It is essentially shorthand for the equally valid explicit form in which the caller creates the semaphore, passes it as "wake", and then blocks on it.
- Asynchronous
Simultaneous Synchronous
This technique preserves the readability of synchronous code but is actually asynchronous.
In this case, the HelloExchange function is run "maxNodes" times asynchronously and then blocks on the sem.take() function until every HelloExchange call completes.
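A minimal sketch of this pattern follows. The Wakeable, ThreadSemaphore and HelloExchange shown here are simplified mock-ups built on the C++ standard library, not the SAFplus implementations; replyThreads, repliesSeen and helloAllNodes are hypothetical names used only to simulate replies arriving from other nodes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Assumed shape of the Wakeable interface: a single give() method.
struct Wakeable {
  virtual void give(int amt) = 0;
  virtual ~Wakeable() {}
};

// Stand-in counting semaphore (not the SAFplus ThreadSemaphore).
class ThreadSemaphore : public Wakeable {
  std::mutex mutex;
  std::condition_variable changed;
  int count = 0;
public:
  void give(int amt) override {
    std::lock_guard<std::mutex> lock(mutex);
    count += amt;
    changed.notify_all();
  }
  void take(int amt) {
    std::unique_lock<std::mutex> lock(mutex);
    changed.wait(lock, [&] { return count >= amt; });
    count -= amt;
  }
};

typedef std::string Version;  // Assumption: a version is a plain string here

std::vector<std::thread> replyThreads;  // Mock network: one thread per pending reply
std::atomic<int> repliesSeen(0);

// Mock asynchronous HelloExchange: returns at once, wakes the caller later.
Version HelloExchange(int node, Version myVersion, Wakeable& wake) {
  (void)node; (void)myVersion;  // Unused by the mock
  replyThreads.emplace_back([&wake] {
    ++repliesSeen;   // Pretend the peer's reply has arrived
    wake.give(1);
  });
  return Version();  // No result is available yet in asynchronous mode
}

// Reads like synchronous code, but all exchanges are in flight at once.
int helloAllNodes(int maxNodes) {
  assert(maxNodes > 0);
  repliesSeen = 0;
  ThreadSemaphore sem;
  for (int node = 0; node < maxNodes; node++) {
    HelloExchange(node, "1.0.0", sem);
  }
  sem.take(maxNodes);  // Blocks until the sem is "given" maxNodes times
  int seen = repliesSeen;
  for (std::thread& t : replyThreads) t.join();  // Mock cleanup before sem is destroyed
  replyThreads.clear();
  return seen;
}
```

The caller pays roughly the latency of the slowest exchange instead of the sum of all of them, while the control flow stays in one function.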
Returning values
Return values can be handled in several ways. First, they can be specified as members of a Wakeable derivation:

    class HelloExchangeResult : public Wakeable
    {
    public:
      Version version;        // Filled in when the reply arrives
      ThreadSemaphore* sem;   // Shared semaphore that counts completed calls
      void give(int amt) { sem->give(amt); }
    };

    ThreadSemaphore sem;
    HelloExchangeResult result[maxNodes];
    for (int node = 0; node < maxNodes; node++)
      {
      result[node].sem = &sem;
      HelloExchange(node,"1.0.0",result[node]);
      }
    sem.take(maxNodes); // Blocks until the sem is "given" maxNodes times
    for (int node = 0; node < maxNodes; node++)
      {
      // Assumes Version can be printed as a C string
      printf("Node %d version %s\n", node, result[node].version);
      }
In this case, a class is defined that contains both the return value and a reference to the "wakeable" object. The HelloExchange function is run "maxNodes" times asynchronously and this thread blocks on the sem.take(). When the response is received, the return value is placed into the passed HelloExchangeResult object and the semaphore is given. When the semaphore is given "maxNodes" times, this thread wakes up and the results are processed.
Note that this solution does not process the results when they are received. It is possible to do so using a "ThreadCondition" object instead of a ThreadSemaphore since that object can wake the thread each time the semaphore is given.
Classic Asynchronous
The classic asynchronous technique uses function callbacks to continue processing. This technique is STRONGLY DISCOURAGED because it is:
- bug prone
- difficult to debug because execution context is not carried in a thread but by an object on a list
- difficult for others to understand because conceptual operations are broken across multiple functions instead of being encapsulated.
- difficult to maintain.
    class HelloExchangeResult : public Wakeable
    {
    public:
      int node;
      Version version;
      // Called when the reply arrives; processing continues here, not in the caller.
      // Assumes Version can be printed as a C string.
      void give(int amt) { printf("Node %d version %s\n", node, version); }
    };

    ...
    HelloExchangeResult* result = new HelloExchangeResult[maxNodes];
    for (int node = 0; node < maxNodes; node++)
      {
      result[node].node = node;
      HelloExchange(node,"1.0.0",result[node]);
      }
    return;
- Recipes
To block with a timeout, ThreadConditions can be used instead of ThreadSemaphores. Another timeout solution is to place the semaphore on a timer queue before calling the API. For multi-process, use ProcessSemaphores.
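The first recipe can be sketched with a condition variable and a timed wait. This is an illustration under assumptions: TimedWakeable, timedTake and waitForReply are hypothetical names, and std::condition_variable stands in for the SAFplus ThreadCondition, whose actual interface is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Stand-in wakeable with a timed wait, built on std::condition_variable.
class TimedWakeable {
  std::mutex mutex;
  std::condition_variable changed;
  int count = 0;
public:
  void give(int amt) {
    std::lock_guard<std::mutex> lock(mutex);
    count += amt;
    changed.notify_all();
  }
  // Returns true if "amt" units were taken, false if the timeout expired first.
  bool timedTake(int amt, std::chrono::milliseconds timeout) {
    assert(amt > 0);
    std::unique_lock<std::mutex> lock(mutex);
    if (!changed.wait_for(lock, timeout, [&] { return count >= amt; })) {
      return false;  // Timed out; nothing is consumed
    }
    count -= amt;
    return true;
  }
};

// Starts a mock peer that replies after "replyDelay", then waits at most
// "timeout" for that reply.
bool waitForReply(std::chrono::milliseconds replyDelay,
                  std::chrono::milliseconds timeout) {
  TimedWakeable wake;
  std::thread peer([&wake, replyDelay] {
    std::this_thread::sleep_for(replyDelay);
    wake.give(1);
  });
  bool gotReply = wake.timedTake(1, timeout);
  peer.join();  // The mock peer must finish before "wake" goes out of scope
  return gotReply;
}
```

Note that a timeout does not cancel the outstanding operation: the wakeable must stay alive until the late reply has been delivered, which is why the sketch joins the peer thread before returning.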
Configuration
All SAFplus components manage their configuration via the SAFplus::Mgt objects and backing database. SAFplus will convert XML configuration files from and to the database (automatically on startup if the DB does not exist) so no direct use of XML is necessary.
Statistics
Statistics are reported through the same SAFplus::Mgt objects, but statistics are not stored in the database.
Fault
If a component takes a fault, it may assert. If a component detects a fault, timeout or other error in another component, it must report that error to the Fault Manager component and then retry. It may not assume that the other component is faulted, failed or dead until it receives a notification from the Fault Manager announcing the component's failure.
The Fault Manager is authoritative:
If a component is working fine and receives an announcement reporting its own failure, it must quit. If a component thinks that another component is working but receives a fault manager notification of its death, it must behave as though the component has failed.
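The report-then-retry rule can be sketched as below. Everything here is hypothetical: the FaultManager class, its report-count threshold policy, and sendWithRetry are invented for illustration, and the real Fault Manager announces failures by notification rather than by being polled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the Fault Manager. It is the only authority on
// whether a component has failed.
class FaultManager {
  std::map<std::string, int> reports;  // Reports received per component
  std::set<std::string> failed;        // Components announced as failed
  int threshold;                       // Assumed policy: fail after this many reports
public:
  explicit FaultManager(int reportsBeforeFailure) : threshold(reportsBeforeFailure) {
    assert(reportsBeforeFailure > 0);
  }
  void report(const std::string& component) {
    if (++reports[component] >= threshold) failed.insert(component);
  }
  bool hasFailed(const std::string& component) const {
    return failed.count(component) != 0;
  }
  int reportCount(const std::string& component) const {
    std::map<std::string, int>::const_iterator it = reports.find(component);
    return it == reports.end() ? 0 : it->second;
  }
};

// Keeps retrying "trySend" against "peer". Each error is reported, and the
// caller gives up only once the Fault Manager says the peer has failed.
bool sendWithRetry(FaultManager& fm, const std::string& peer,
                   const std::function<bool()>& trySend) {
  while (!fm.hasFailed(peer)) {
    if (trySend()) return true;  // Success: the peer is clearly alive
    fm.report(peer);             // Report the error, then retry
  }
  return false;                  // Failure announced by the Fault Manager
}
```

The caller never decides on its own that the peer is dead; it only stops when the Fault Manager's view changes.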
Allocation/Free
Reduce use of new/delete or malloc/free. These operations are inefficient, and often turn an O(1) operation into O(log(n)), where n is the number of memory fragments in the program.
Use "intrusive" data structures wherever possible, since other data structures must malloc metadata.
One common design pattern is to reuse objects. Define the object with an intrusive list, and add "Object* allocObject()" and "void freeObject(Object*)" helper functions. The "freeObject" function puts the object onto a "free list". The allocObject removes an object from the free list (if one exists). If no object exists on the list, it creates a new object using new or malloc. This technique has a few advantages:
- Performance: reduced use of malloc/free or new/delete
- Memory corruption: Use-after-free bugs corrupt another instance of Object rather than a random block of memory. This causes the bug to be located nearer to its source. Also, you have the option to add debugging code to freeObject(). This code could record the file/line of the caller to freeObject, or it could clear the object being freed to force a seg fault in the case of use-after-free.
- allocObject can return the most recently used object, increasing data locality (reducing cache misses).
- Memory leaks: allocObject and freeObject can keep call counts, which should on average be equal. Also, after some time, allocObject should never need to call "new" or "malloc".
Note, you can have allocObject/freeObject delete some of the objects on the list if they are not used for a significant period of time to reduce memory use.
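The object-reuse pattern described above can be sketched as follows. The Object layout, the global free list and the counters are illustrative choices, and the sketch is single-threaded: a real implementation shared between threads would need locking around the list.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pooled type; "next" is the intrusive free-list link.
struct Object {
  Object* next = nullptr;
  int payload = 0;
};

static Object* freeList = nullptr;   // Head of the free list, most recently freed first
static std::size_t allocCalls = 0;   // Call counters for leak detection
static std::size_t freeCalls = 0;
static std::size_t heapAllocs = 0;   // How often allocObject fell back to "new"

Object* allocObject() {
  ++allocCalls;
  if (freeList) {
    Object* obj = freeList;          // Reuse the most recently freed object
    freeList = obj->next;
    obj->next = nullptr;
    return obj;
  }
  ++heapAllocs;
  return new Object();               // Free list is empty: allocate a fresh object
}

void freeObject(Object* obj) {
  assert(obj != nullptr);
  ++freeCalls;
  obj->payload = 0;                  // Debugging code could poison the object here
  obj->next = freeList;              // Push onto the free list instead of deleting
  freeList = obj;
}
```

Because the free list is a stack, the object handed out next is the one freed most recently, which is the data-locality benefit mentioned in the list above. In steady state heapAllocs stops growing and allocCalls tracks freeCalls.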