Revision 13 as of 2016-08-10 02:34:05

Running the Availability Management Framework (AMF) Example

The AMF provides redundancy, failure monitoring and restart for applications. To implement this, the AMF requires a description of the cluster, which is essentially a description of all the programs that should be run, whether they "understand" SAF communications, and their redundancy relationships. This is called the "cluster model" or just "model".

An example model and a simple redundant "Hello World" application are included with SAFplus, located in the source code under "examples/eval/basic" (and on download.openclovis.com). This model specifies applications that are "SAF-aware" -- that is, the applications can communicate with the AMF to receive work assignments. Another example model, provided at "examples/eval/exampleNonSafApp", handles "non-SAF-aware" applications. Non-SAF-aware applications do not communicate with the AMF -- they may be legacy apps, standard Linux services (e.g. Apache), or applications whose source code you do not have access to.

The "Basic" Example: Compiling and Packaging

This example presumes that you have already installed SAFplus, or built it from source.

First, download and detar the example from http://download.openclovis.com/files/examples/SAFplusBasicEval.tgz, or cd to <safplus>/examples/eval/basic if you are using SAFplus source from GitHub. Next, build:

make V=1

The program "basicApp" will be created. If you installed SAFplus (via a package manager, for example), your application binary will be created in this directory. If you are using SAFplus uninstalled from source, your binaries will be located in the SAFplus target directory (<safplus>/target/<architecture>/bin). It is possible to run the application directly out of this build tree, but this tutorial shows the full packaging technique. So next, build your executables and SAFplus into a single package:

make image

This executes commands that use the safplus_packager program to produce a tarball. Congratulations! You have successfully compiled and packaged a SAFplus application. In the next section, you will install and run it.
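The two build steps above can be chained so that packaging only happens after a successful compile. This is a minimal sketch; the function name build_and_package is hypothetical, and the only commands it runs are the "make V=1" and "make image" shown above.

```shell
# Hypothetical helper: run both build steps in the example directory,
# stopping at the first failure. Runs in a subshell so the caller's
# working directory is left unchanged.
build_and_package() {
    src_dir=$1
    [ -n "$src_dir" ] || return 2
    (
        cd "$src_dir" || exit 1
        make V=1 || exit 1      # compile basicApp
        make image || exit 1    # package the executables and SAFplus into a tarball
    )
}

# Example (path is a placeholder taken from the text above):
#   build_and_package "<safplus>/examples/eval/basic"
```

A non-zero return value means the directory was missing or one of the make steps failed.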

The "Basic" Example: Running

Copy this tarball to your (binary compatible) target computer and detar it. In this eval, we will use the build machine as the target, so simply detar it from its current location into a directory under your home directory.

cd ~
tar xvfz basic/evalbasic.tgz

Next set up paths and environment variables, so that the SAFplus commands and applications can find the needed shared libraries:

cd evalBasic/bin
source setup

You should open this script and familiarize yourself with what it is doing, since you will need to modify it for your own applications. In addition to the paths, it sets up the essential SAFplus environment variables that control your backplane network interface, your chosen backplane messaging plugin, and your chosen database plugin.

Note that this setup script also attempts to increase kernel network buffer space. However, these commands will only succeed if you run them as root. Increasing the kernel networking buffers is optional for this simple sample, but is highly recommended for any network-centric application.
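To see whether the buffer increase can take effect, you can check your user id and the current kernel limits. The sketch below assumes the relevant settings are the standard Linux sysctls net.core.rmem_max and net.core.wmem_max; the setup script may adjust different ones, so treat those names as an assumption and read the script to confirm. The function names are hypothetical.

```shell
# Returns 0 when the given uid (default: the current user) is root.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# Print the current maximum socket buffer sizes, if the files are readable.
# ASSUMPTION: these are the limits the setup script tries to raise.
report_net_buffers() {
    for f in /proc/sys/net/core/rmem_max /proc/sys/net/core/wmem_max; do
        if [ -r "$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$f")"
        fi
    done
    return 0
}

# Example:
#   is_root || echo "not root: kernel buffer changes will be skipped"
#   report_net_buffers
```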

Next, we need to load a cluster model. The cluster model defines which applications run on which nodes in the cluster and their high availability relationships. Two example models are provided in this sample.

Please choose which option you want (this eval will use the single node version), cd to the bin directory, and "install" the model into the SAFplus database:

cd evalBasic/bin
./safplus_db -x SAFplusAmf1Node1SG1Comp.xml safplusAmf

If you get any "missing file" errors in this step, you did not set up the environment properly.
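Since the setup script exists so that SAFplus commands can find their shared libraries, one way to narrow down a "missing file" error is to list the libraries the dynamic loader cannot resolve. This is a sketch using the standard ldd tool; the function names are hypothetical, and the error may have other causes that this check does not cover.

```shell
# Keep only the lines of ldd output that report an unresolved library.
filter_unresolved() {
    grep 'not found' || true
}

# List the shared libraries that cannot be resolved for the given binary.
# Prints nothing when everything resolves (or when ldd is unavailable).
missing_libs() {
    ldd "$1" 2>/dev/null | filter_unresolved
}

# Example, run from the bin directory after "source setup":
#   missing_libs ./safplus_db
```

If this prints any lines, re-run "source setup" in the same shell and try again.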

This command reads the model XML and writes it to the safplusAmf database. Since the database back-end is a plugin, you might see different results in this step. For example, if SQLite is your chosen database, a file called safplusAmf.db will be created in the current directory. However, by default this example uses the SAFplus checkpointing service as its database. We can see the table by running:

./safplus_name

You should see an entry named "safplus.mgt.safplusAmf.db". The AMF will read this database "file" to access the cluster model.
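If you want to script this check, you can search the command's output for the entry name. This sketch makes no assumption about the output format of safplus_name beyond the entry name appearing somewhere in it; the function name is hypothetical.

```shell
# Reads text on stdin and succeeds if the AMF model table is listed.
has_amf_model() {
    grep -q 'safplus\.mgt\.safplusAmf\.db'
}

# Example:
#   ./safplus_name | has_amf_model && echo "model loaded"
```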

Note: To clean up all safplus shared memory segments (including this checkpoint table), run ./safplus_cleanup.

Finally, run the SAFplus AMF to start up the model:

./safplus_amf

Or you may want to start it up as a daemon:

nohup ./safplus_amf &
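When running detached, it can be convenient to send the output to a file of your choosing instead of the default nohup.out, and to remember the process id. The following is a generic sketch, not a SAFplus facility; the function name and the log file name amf.log are hypothetical.

```shell
# Start a command detached from the terminal, send its stdout and stderr
# to the given log file, and print the process id of the background job.
start_detached() {
    log_file=$1
    shift
    nohup "$@" >"$log_file" 2>&1 &
    echo $!
}

# Example:
#   amf_pid=$(start_detached amf.log ./safplus_amf)
#   tail -f amf.log
```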

After a few seconds you will see 2 instances of the program "basicApp" running:

ps -efwww | grep basic

user       7092  7071  0 21:53 pts/2    00:00:00 ./basicApp c1
user       7093  7071  0 21:53 pts/2    00:00:00 ./basicApp c0
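To check the instance count without the grep process itself cluttering the result, you can count only the lines whose command is basicApp. This sketch assumes the "ps -ef" column layout shown above, where the command is the eighth field; the function name is hypothetical.

```shell
# Reads "ps -ef" style output on stdin and prints how many processes
# have basicApp as their command (8th column).
count_basic_apps() {
    awk '$8 ~ /(^|\/)basicApp$/ { n++ } END { print n + 0 }'
}

# Example:
#   ps -efwww | count_basic_apps
```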

And depending on your logging level, you may see these applications outputting their status onto the console:

Tue Aug  9 21:53:23.355 2016 [main.cxx:371] (node0.7093.7093 : c0.APP.MAIN:00019 : INFO) Basic HA app: Standby.
Tue Aug  9 21:53:23.362 2016 [main.cxx:370] (node0.7092.7092 : c1.APP.MAIN:00019 : INFO) Basic HA app: Active.  Hello World!

Go ahead and kill one of these processes to see it restart automatically, and watch the logs to see the active/standby roles fail over.
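To trigger a failover rather than just a restart, kill the instance that is currently active. In the sample output above, the number after the node name in the log prefix (for example "node0.7092.7092") matches the process id shown by ps, so this sketch assumes that field is the pid. That is an inference from the sample, not a documented format. The function name and the file amf.log are hypothetical.

```shell
# Reads log lines on stdin and prints the pid of the most recent process
# that reported "Basic HA app: Active".
# ASSUMPTION: the log prefix has the form (node.pid.tid : ...).
active_pid() {
    sed -n 's/.*(\([^.]*\)\.\([0-9][0-9]*\)\.[0-9][0-9]* : .*Basic HA app: Active.*/\2/p' | tail -n 1
}

# Example, assuming the console output was captured in amf.log:
#   kill "$(active_pid < amf.log)"
```

After the kill, the former standby should report "Active" in the log and a new basicApp process should appear.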

This concludes the basic evaluation. Congratulations, you've built and deployed your first highly available application!

Troubleshooting

Fri Apr  8 14:07:07.388 2016 [nPlusmAmfPolicy.cxx:510] (.0.7671 : AMF.N+M.AUDIT:00996 : INFO) Service Instance [si] should be fully assigned but is [unassigned]. Current active assignments [0], targeting [1]
Fri Apr  8 14:07:07.388 2016 [nPlusmAmfPolicy.cxx:527] (.0.7671 : AMF.N+M.AUDIT:00997 : INFO) Service Instance [si] cannot be assigned 0th active.  No available service units.
Fri Apr  8 14:07:07.388 2016 [customAmfPolicy.cxx:36] (.0.7671 : AMF.POL.CUSTOM:00998 : INFO) Active audit
Fri Apr  8 14:07:07.388 2016 [customAmfPolicy.cxx:47] (.0.7671 : AMF.CUSTOM.AUDIT:00999 : INFO) Auditing service group sg0

If you see audit messages like these, the node names in the model probably do not match the names of the nodes that are running. Did you forget to set a node name?

SAFplus AMF cannot spawn your application. It is possible that the binary does not exist, is not executable, or is not of this machine's architecture. "cat" the error.#### file to see the specific problem.
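The three causes listed above can be checked by hand before reading the error file. This is a sketch using standard tools (test, file, uname); the function name is hypothetical, and comparing the architecture is left to the reader because the output of "file" varies between systems.

```shell
# Check that a binary exists and is executable, then print what "file"
# reports about it next to this machine's architecture for comparison.
# Returns 1 if the file is missing, 2 if it is not executable.
check_binary() {
    bin=$1
    if [ ! -e "$bin" ]; then
        echo "missing: $bin"
        return 1
    fi
    if [ ! -x "$bin" ]; then
        echo "not executable: $bin"
        return 2
    fi
    if command -v file >/dev/null 2>&1; then
        file "$bin" || true
    fi
    echo "machine architecture: $(uname -m)"
    return 0
}

# Example:
#   check_binary ./basicApp
```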