Monitoring the super computer with Nagios – Part 2: Adding More Hosts

Nagios stores all of its host configuration files in

/etc/nagios3/conf.d

In this folder we are interested in the config file

localhost_nagios2.cfg

We need to create one of these for each of the servers we want to monitor, so copy this file once for each server, each time replacing localhost in the file name with the name of the server to monitor.
In this instance, I have 6 servers

sudo cp localhost_nagios2.cfg raspi-blue_nagios2.cfg
sudo cp localhost_nagios2.cfg raspi-green_nagios2.cfg
sudo cp localhost_nagios2.cfg raspi-white_nagios2.cfg
sudo cp localhost_nagios2.cfg raspi-yellow_nagios2.cfg
sudo cp localhost_nagios2.cfg raspi-purple_nagios2.cfg
sudo cp localhost_nagios2.cfg raspi-red_nagios2.cfg
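
If you'd rather not type six cp commands, the same copies can be made with a quick shell loop (purely a convenience; the colour names are just the ones I'm using for my nodes):

for colour in blue green white yellow purple red; do
    sudo cp localhost_nagios2.cfg raspi-${colour}_nagios2.cfg
done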

For each file, open it up in your editor and replace every instance of localhost with the name of the server

define host{
        use                     generic-host
        host_name               REMOTESERVER    ; Change to the remote server's hostname (must match the hostgroup members below)
        alias                   REMOTESERVER    ; Change to the remote server's hostname
        address                 5.6.7.8         ; Change to the remote server's IP address
        }

define service{
        use                     generic-service
        host_name               REMOTESERVER    ; Change to the remote server's hostname
        service_description     Disk Space 1
        check_command           check_nrpe!check_sda1
        }

define service{
        use                     generic-service
        host_name               REMOTESERVER    ; Change to the remote server's hostname
        service_description     Current Users
        check_command           check_nrpe!check_users
        }

define service{
        use                     generic-service
        host_name               REMOTESERVER    ; Change to the remote server's hostname
        service_description     Total Processes
        check_command           check_nrpe!check_total_procs
        }

define service{
        use                     generic-service
        host_name               REMOTESERVER    ; Change to the remote server's hostname
        service_description     Current Load
        check_command           check_nrpe!check_load
        }
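
A quick aside: these service checks all go via check_nrpe, so each Pi being monitored needs to be running the NRPE daemon with the commands check_sda1, check_users, check_total_procs and check_load defined in its nrpe.cfg. If any of that isn't in place yet on a node, the Debian packages involved are roughly these:

sudo apt-get install nagios-nrpe-server nagios-plugins     # on each Pi being monitored
sudo apt-get install nagios-nrpe-plugin                    # on the Nagios server itself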

Next we tell Nagios which groups of services run on each server. This is done in the file hostgroups_nagios2.cfg: add the name of each server to the members list of every hostgroup whose services you want monitored on it.

# Some generic hostgroup definitions

# A simple wildcard hostgroup
define hostgroup {
        hostgroup_name  all
        alias           All Servers
        members         *
        }

# A list of your Debian GNU/Linux servers
define hostgroup {
        hostgroup_name  debian-servers
        alias           Debian GNU/Linux Servers
        members         localhost, raspi-blue, raspi-green, raspi-white, raspi-yellow, raspi-purple, raspi-red
        }

# A list of your web servers
define hostgroup {
        hostgroup_name  http-servers
        alias           HTTP servers
        members         localhost, raspi-blue, raspi-green, raspi-white, raspi-yellow, raspi-purple, raspi-red
        }

# A list of your ssh-accessible servers
define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost, raspi-blue, raspi-green, raspi-white, raspi-yellow, raspi-purple, raspi-red
        }
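
Before restarting, it's worth getting Nagios to validate the new configuration, which catches typos in host names and missing files early (the path below assumes the stock Debian nagios3 package):

sudo nagios3 -v /etc/nagios3/nagios.cfg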

Now we restart Nagios and we'll see each of the servers appear in the web interface

sudo /etc/init.d/nagios3 restart

This information was cribbed from Will Bradley's website, which is an excellent source of information on configuring Nagios

Why oh Why do I use RS

So a friend recommended some new cases for the Pi which are available at RS. Great, I thought: I can buy 8 black bases and 8 different coloured tops so that I can start to refer to the Pis in the grid by their colour rather than by number.

I'm always dubious of RS given they messed up early deliveries of Pis, but I placed an order for 9 of each and a few days later they duly arrived in an unpadded bag…

I refuse to use the word packed for this shipment; instead, 16 rather nice looking but flimsy bits of plastic were thrown into the bag with no thought to order or safety.

3 arrived broken, 2 bases and a top. Well, it is RS after all, what more did I expect? Rather than go through the pain of sending them back and having them chuck another load at me in the post, I whipped out the super glue and repaired them.

How does that company manage to exist…

Configuring Eclipse to run MPI4PY

If Eclipse is your IDE of choice then you can configure it to run MPI4PY tools from within the editor itself.

The primary app we use for running parallel programs is mpiexec, and this needs to be run as an external application from Eclipse. This is accessed from the top-level Run menu: click External Tools, and then External Tools Configurations

Run -> External Tools -> External Tools Configurations...

This will bring up the External Tools Configurations screen

Select Program in the list on the left, and then click the new configuration icon, which is the left-most icon, the square with the + on it

In Location: we put the full path to the app we want to run. You can get this from the command line by typing ‘which mpiexec’ and using the result returned in that field
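
For example, a typical result looks something like this (the path depends on how MPI was installed, so use whatever which reports on your machine rather than copying this one):

which mpiexec
/usr/bin/mpiexec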

In Working Directory: use the Eclipse Variable workspace_loc with the addition of the path to where the Python files are stored

${workspace_loc:/Parallel-O-Gram}

Finally in Arguments: use the following information. You can edit this to change the number of processes as and when required

-n 4 -f machinefile python ${resource_name}
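
To put that in context, when you run a Python file with this configuration Eclipse effectively executes something like the line below from the working directory, with ${resource_name} expanded to the file currently selected in the editor (the script name here is only an illustration). Note that machinefile is resolved relative to that working directory, so keep it in the project folder:

mpiexec -n 4 -f machinefile python my_parallel_script.py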


Parallel Programming with Python – The Basics

Before you begin follow the instructions in Parallel Programming in Python to learn how to install the required Python libraries

And I mean the real basics, this is more of a blog about me learning how to do parallel programming in Python than a detailed tutorial. I’ll probably come back and revisit it as I learn more and more.

Most of this text has been taken from the very good book on parallel programming called Parallel Programming with MPI, available at Amazon.

However this book is almost exclusively C and Fortran based and we are new age Python programmers, so as I’ve worked through the examples I’ve converted them to Python and added my own explanations to some of the topics and problems I’ve come across

I searched high and wide for some tutorials on the web about MPI4PY and they all seem to be the same rehash of an academic presentation, so they all demonstrate pretty much the same concepts in the same way using the same language. To be honest I really struggled with what I was reading, and it took me a while to get my head around parallel programming with MPI. There are a few basic constructs that you need to grasp before trying to use the MPI libraries in anger

The Basics

MPI can be huge and daunting when you first start with it, especially trying to understand the way the APIs work in one of two very distinct ways ( see Basic Mindset below for more details ), but the reason for MPI is in its title: this is a message passing API that allows copies of the same program, running on multiple processes or processors, to communicate in real time with each other.

Rather than get into the nasty details of asynchronous programming ( which come later ), there are two basic ways MPI works:

  • Point To Point – Allows one process to send a message to another process, each process being identified by a unique number
  • Multicast – Allows one process to distribute data ( i.e. load ) to multiple other processes and then recombine the data back into a single piece. The process with id 0 sends the data, and all other processes with unique numbers > 0 receive it.

Executing an MPI Program

Hopefully you followed the instructions for installing MPI on your Pi, ran through a couple of the C examples that came with the install, then went on to install MPI4PY and downloaded and ran one of the examples. To recap, we first need to identify all the machines in our network; we do this by creating a file with the IP addresses of all the machines we communicate with. Typically this is called machinefile and for now can look like this

127.0.0.1
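
Later, when the rest of the nodes are up, the machinefile simply lists one address per line. The addresses below are only placeholders; swap in whatever IPs your own Pis have:

192.168.1.101
192.168.1.102
192.168.1.103
192.168.1.104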

To run parallel programmes written in Python we still use the mpiexec command, but tell it to run the Python interpreter,

mpiexec -n 2 -f machinefile python program.py

Where -n is the number of processes to create with our program, -f points to the file containing the IP addresses of the machines to use in our network, and program.py is the name of the Python programme we want to execute.

Basic Send and Receive Example

This is an example of a simple programme which, if run as two processes via mpiexec, will send a message from one of its instances to the other.

from mpi4py import MPI			# Import the MPI library

comm = MPI.COMM_WORLD			# Initialise MPI library ( more on this later )
rank = comm.Get_rank()			# Get the unique number of this process, known as its rank
size = comm.Get_size()			# Get the number of processes available to this task
name = MPI.Get_processor_name()		# Gets the name of the processor, usually its network name

print
print "Rank : ", rank
print "Size : ", size
print "Name : ", name

if rank == 0:				# If the first process, create the data and send it
        data = [1, 2, 3, 4, 5]
        print "Sending..."
        comm.send(data, dest=1)
        print "Sent {0}".format(data)
else:					# Otherwise I am the receiver, so receive it
        print "Receiving..."
        data = comm.recv(source=0)
        print "Recv {0}".format(data)

print "Data on rank {0} : {1}", rank, data

If we run this, then something like the following will be displayed

Rank :  0
Size :  2
Name :  fender
Sending...
Sent [1, 2, 3, 4, 5]
Data on rank 0 : [1, 2, 3, 4, 5]

Rank :  1
Size :  2
Name :  fender
Receiving...
Recv [1, 2, 3, 4, 5]
Data on rank 1 : [1, 2, 3, 4, 5]

Note, as with all parallel programming, you cannot determine the order in which the processes will produce output, so for all these examples you'll get something that looks similar but the lines may be in a slightly different order

Basic Mindset

MPI is a unique way of coding. Generally, with procedural and object-oriented programming you expect every function call to behave in one way and one way only; you have a defined set of parameters defined as the API, the function normally has an explanation of what it does, and more often than not the function returns a value. If you call the API with the right parameters you expect it always to behave the same. However, for a lot of MPI functions this is not the case: for many, the behaviour depends on the context of the caller and whether the function takes data or returns data.

A good example is the method scatter, which scatters the elements of a list to all available processes

result = comm.scatter(sendBuffer, recvBuffer=None, root=0)

Basic parallel programs tend to have sequential sections and then parallel sections. When the MPI subsystem runs your program, it sends a copy of the program to every process you tell it about ( more on this later ) and executes them all at the same time ( this is parallel programming after all ). The first program running is often called the root and is generally given the unique id number 0; every other programme is then given the number 1, 2, 3 etc. in sequence to signify its uniqueness and the fact that it is not root ( i.e. not 0 )

At some point in your code you will have created a list, which you then pass to scatter

if rank == 0:
	data = [1, 2, 3]
else:
	data = None
data = comm.scatter(data, root=0)

This says: scatter the elements of data to all processes, and remember that they all came from root=0 ( the first process ). Now you have to imagine all the processes running this in parallel, each one hitting the above lines at approximately the same time.

The process with rank = 0 has data set to an array of 3 numbers; all the other processes have no array. When each process makes the call, the library hides the magic and distributes the array across the processes, so that each call returns a different number:

Process with rank = 0 gets 1
Process with rank = 1 gets 2, even though the data it passed in was None
Process with rank = 2 gets 3, likewise

It's quite a mind change to visualise the same code running at the same time, and how, depending upon the value of rank, different behaviour can happen: explicitly, as in our if statement that sets the data only for rank = 0, or implicitly, as within the MPI calls

Weird or what !!!!

Broadcast Example

Point to point doesn't always have to be one process to another; it can be one process to a number of processes. For this we use the broadcast API call

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()

print "Rank : ", rank
print "Size : ", size
print "Name : ", name

if rank == 0:
        data = [1, 2, 3, 4, 5]
else:
        data = None

# bcast sends the same data to all processes
# For rank 0, data is not None and therefore is the one that is issued and return
# For rank > 0, data passed is None, but the value returned is the data sent from rank 0
data = comm.bcast(data, root=0)
print "Data on rank {0} : {1}".format(rank, data)

Run this example and you’ll see something like the following which shows that the array defined by process with rank = 0 has been passed to all other processes

Rank :  0
Size :  3
Name :  fender
Rank :  1
Size :  3
Name :  fender
Rank :  2
Size :  3
Name :  fender
Data on rank 0 : [1, 2, 3, 4, 5]
Data on rank 1 : [1, 2, 3, 4, 5]
Data on rank 2 : [1, 2, 3, 4, 5]

This is useful if you need multiple processes to work on the same data.

Scatter Example

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()

print "Rank :{0}, Size :{1}, Name : {2}".format(rank, size, name)

if rank == 0:
        data = [1,2,3]
else:
        data = None

# Scatter distributes the values of an array out to the waiting processes
# If this process is rank 0, then this sends the data to all processes,
# If it's rank 1 or above, it waits for data from scatter
# Each call to scatter returns with one element from the array
data = comm.scatter(data, root=0)
print "Data on rank {0} : {1}".format(rank, data)

# Then we add one to each value
data = data + 1

# Gather collects the single values from every process back into an array
# A call to gather on rank 0 takes the value to be gathered and returns the assembled array
# A call to gather on rank 1 or above takes the value to be gathered but returns nothing
data = comm.gather(data,root=0)
if rank==0:
    print "Data on rank {0} : {1}".format(rank, data)
# At this point, all other processes will have completed gather but it will have returned them nothing,
# only the process with rank 0 gets a return value

If we run this example then we see something like the following. Note that rank 0 appears twice in the output: once with the single element it received from scatter, and again with the full array it got back from gather

Rank :0, Size :3, Name : fender
Data on rank 0 : 1
Data on rank 0 : [2, 3, 4]
Rank :1, Size :3, Name : fender
Data on rank 1 : 2
Rank :2, Size :3, Name : fender
Data on rank 2 : 3

Scatter Example 2

In this example we show that we don't just pass single values; we can pass objects. In this instance the objects are further arrays of values, but they can be any valid Python object

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()

print "Rank :{0}, Size :{1}, Name : {2}".format(rank, size, name)

if rank == 0:
        data = [[1,2,3], [4,5,6], [7,8,9]]
else:
        data = None

# Scatter distributes the values of an array out to the waiting processes
# If this process is rank 0, then this sends the data to all processes,
# If it's rank 1 or above, it waits for data from scatter
# Each call to scatter returns with one element from the array
data = comm.scatter(data, root=0)
# Then we do something with the data
for i in range(len(data)):
    data[i] = data[i] + 1

# Gather collects the single values from every process back into an array
# A call to gather on rank 0 takes the value to be gathered and returns the assembled array
# A call to gather on rank 1 or above takes the value to be gathered but returns nothing
data = comm.gather(data,root=0)
if rank==0:
    print "Data on rank {0} : {1}".format(rank, data)

Running this program gives us output similar to this

Rank :0, Size :3, Name : fender
Rank :1, Size :3, Name : fender
Rank :2, Size :3, Name : fender
Data on rank 0 : [[2, 3, 4], [5, 6, 7], [8, 9, 10]]

Each array has been manipulated and returned to the original sender

Pi Back !!!

I posted earlier that I was suffering problems with SD cards in Pi Down

Well, over the weekend I received 4 16GB SanDisk Extreme Class 10 SD cards from Amazon and set about taking an image of each current SD card using the instructions here and copying the image onto the new card

At the same time I ran raspi-config and switched off all the Turbo modes, and I'm happy to say that all nodes are up and running, working a treat, and now running the MPI libraries and their Python variant MPI4PY

Pi Down !!!!

Over the past week of beaverish activity I've seen a couple of my Pis fail, and even after a decent cool down and reboot they have failed to boot properly, losing all network connectivity.

I've eventually tracked the problem down to what looks to be shoddy SD cards. I purchased 4 Kingston 16GB Class 4 cards from Amazon at about £6 a pop and am now regretting scrimping on this important part.

With all the compiling and configuration it looks like they are failing one by one; however, having purchased a 16GB SanDisk Extreme Class 10 earlier, I've popped this into one of the Pis and it's been running like a dream.

Looks like I’m going to need a few more of these Extremes to keep Colossus running

Parallel Programming in Python

Now that we have MPI installed and working we can start using it with real programming languages, and the first language of choice for the Pi is of course Python.

Following on with the instructions from Southampton Uni, there is some information at the bottom about the use of mpi4py, one of a number of Python bindings for MPI.

MPI4PY can be installed through apt-get

sudo apt-get install python-mpi4py
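
A quick way to confirm the module installed correctly is to import it from the command line; it should print the version of the MPI standard the bindings were built against (just a sanity check, nothing more):

python -c "from mpi4py import MPI; print MPI.Get_version()"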

Once installed we can pull down some demo code

mkdir mpi4py
cd mpi4py
wget http://mpi4py.googlecode.com/files/mpi4py-1.3.tar.gz
tar xfz mpi4py-1.3.tar.gz
cd mpi4py-1.3/demo

And once it's installed we can check it works by running one of the basic examples

mpirun.openmpi -np 2 -machinefile /home/pi/mpi_testing/machinefile python helloworld.py

This produces the following output; in this instance the two processes ran on pislave1 and pislave2

Hello, World! I am process 0 of 2 on pislave1.
Hello, World! I am process 1 of 2 on pislave2.

The above instructions can now be repeated on pimaster and pislaves 2, 3 and 4

Do not order your Pi from RS

Why? Because basically they are pants. The Register is reporting that customers who ordered in June are still waiting for their Pis and that the best way is to cancel your order and re-order; my advice is to cancel your order and place a new one with Farnell, who I've never had a problem with.

Such a shame that a UK company lets down another UK company by just not being able to fulfil its most basic capability as a component supplier: supplying components.

Pi Ram doubled to 512M

Various news boards are reporting that Farnell are now taking orders for Pis with 512MB of RAM rather than 256MB

This is great news; I can't wait to see the performance increase for the X Windows system now we can allocate more memory to it

The Register also reports just how crap RS Components are, with customers who ordered in June still waiting for orders. My view is bin RS and go to Farnell, who have shipped just about all my Pis ( 7 so far ) in under 48 hours.