Sunday, 21 August 2016

Simple Website in Node.js for you Raspberry Pi 3

Now that I have a cluster of Raspberry Pi's my possibilities are endless. In this article I will show you how to create a simple website using Node.js with Express, Stylus and Pug

Node.js is is a platform built on Chrome's JavaScript runtime for easily building fast and scalable network applications, Express is a fast web framework for Node.js, Stylus is an innovative style-sheet language that compiles down to CSS and Pug is a succinct language for writing HTML templates.

To ease the pain of working with a Raspberry Pi, I will show you how to remote desktop it first for ease of use so you can code your website on the Pi itself without having to use the console.

The following operations have to be applied to every member of the cluster so Remote desktop is available to every node. The idea is to access each node via remote desktop using a laptop with windows 10.

Configure Remote desktop on the Raspberry Pi

The first thing to do is to update our version of Raspbian and ensure that all the packages are upgraded. This can be done with the following commands:

sudo apt-get update

Next, run the following command to upgrade any packages installed on your system that need upgrading:

sudo apt-get dist-upgrade

You'll have to perform this operation in all your cluster nodes. Now we are ready to install the packages we need: xrdp and samba. xrdp is an open source remote desktop protocol(rdp) server and Samba is the standard Windows interoperability suite of programs for Linux and Unix.

Run the following commands so we can remote desktop the Raspberry Pi's:

sudo apt-get -y install xrdp

The '-y' option will automatically answer yes to the default continue [Y/n] question.

Next step is to install the samba package so we will be able to access the Raspberry Pi's by its host name from Windows rather than by it’s IP address which changes as the node receives its IP address via DHCP:

sudo apt-get -y install samba

After the installation is successful, you should be able to ping the Raspberry Pi from your windows machine and perform the remote desktop:

If everything is configured correctly you should see the following screens:

To copy files from my Win10 machine to one of the nodes I use WinSCP. You can also create a shared folder on the Pi that's visible on your Win10 machine using Samba.

Now it's time to configure the rest.

Install Node.js

By default there is a pre-installed version of Node.js on the Raspberry Pi's. If you type node -v you'll see that the version of node.js is v0.10.29. We need to upgrade this version to a more recent one (v6.3.1 by the time I published this article).

Now our Raspberry Pi is ready for action. Let's see what are the next steps to create our Website.

To make things easier for you, I've created a project on Github that contains a sample website that you can use to start with. Installing the dependencies required through npm is a bit cumbersome so using a sample project makes things a bit easier.

Here is a screenshot of the site once it is up and running:

I'm using the site as a Raspberry Pi status monitor where there is a bit of javascript that pings each node on the grid. Then I use knockoutjs to bind the results to the page.

Once you've downloaded the repository, you only need to run the following commands to install the dependencies and run the website:
You can follow the instructions on my Github project:
The site in action:

Here is the list of installed packages for your reference:


Sunday, 14 August 2016

Raspberry Pi 3 Cluster Test

The following article describes a simple test that was executed on a 4 node (1 controller + 3 workers), Raspberry Pi cluster. The purpose is to obtain reproducible measures of MPI performance that can be useful to MPI developers.  If you haven't read my article about building a Raspberry Pi 3 cluster for parallel programming, you can find it here. The test is a matrix multiplication where each node will perform the calculations of a slice of it and send the results back to the main node. The test will play with 2 main variables: a) the size of the matrix and b) the number of nodes to use to perform the calculations. This should give us the time for each calculation and the speedup.

If you haven't seen my cluster yet, here is an image:

Test Description

The test consist of the following: 
The application generates two square (NxN) matrices A and B of a variable size and defined via arguments. Matrix B is by default visible to each node so we save time sending the array to each node. Then Matrix A is generated in the master node and sliced into several chunks and sent to each individual node of the cluster. The slicing is calculated in the master node. Once each individual node of the cluster has finalised with its calculations, they send the results back to the master node to combine the results and present the resultant matrix.

The slicing mechanism works as follows:

For the example above, imagine that we have a square matrix of size 6x6. We have 4 nodes in our cluster but only three of them are available for calculations. Node 0 or master is just there to arrange initial calculations, send the values to each node and then gather the results from each individual node and display results.

The architecture is quite simple but very common in these scenarios. The beauty of it is that we can increase the number of nodes in the cluster without having to change a single line of code in the application.

As we have a 6x6 matrix, we need to split that by the number of nodes available in the system. Notice that the size of the matrix needs to be divisible by the number of nodes available in the cluster. In this case we have 6 rows and 3 nodes, so there will be 2 rows of data for each node.


You can find all the source code and results on my github project:

In there you will find the source code of, the shell scripts that I used to run the tests, the logs and excel files that I used to gather all the details from each node.

The first step is to calculate the matrix multiplication using just 1 node and then see what's the speedup by using additional nodes.

The sizes of the matrices for this test are defined below:

  • 12x12
  • 60x60
  • 144x144
  • 216x216

Each matrix will be run against 3 nodes and from 1 to 4 cpus on each node. Every cycle of the application runs 10 times and we use the average value for defining our results.

Here are the results for the calculations above against 1 node:

Time is in seconds and we can see that the bigger the matrix, the longer it takes to be multiplied. Remember that the complexity for a matrix multiplication is O(n3). We can easily how the graphic tends to draw a cubic function. Just increasing the size of the matrix by 50% we increased the calculation time by 300%.

Here you can see the calculation that the application performs:
Let's see what happens when we run the same matrices against our cluster:

Matrix multiplication against 3 nodes (1 CPU each):

As expected we've reduced one third the execution time for our calculations.

Let's see what happens when we introduce more CPUs:

Matrix multiplication against 3 nodes (2 CPU each):

Matrix multiplication against 3 nodes (3 CPU each):

Matrix multiplication against 3 nodes (4 CPU each):

Notice that the RPI3 has 4 CPUs and we can control the number of CPU used through the machinefile and MPI. All the cpus are defined as a node in my machinefile and I made sure that each CPU was working while monitoring them. Below is a graph showing all four cpus working on one of my nodes while running the experiment 216x216 on 12 CPUs:

Here you can see an example running 3 CPUs on each PI. Notice how the CPU's reach 100% on each PI.

Here a sample script to grab the cpu usage for linux:
If we group the graphs together we have:

We can see that the highest throughput is achieved by splitting the matrix using as many nodes as possible and return the results back. Notice that time is not linear in this case as we would suppose to go down to 4s of calculations for each node but we go down to 8s instead (calculation of 216x216 against 12 cpus). We need to consider also that there is an overhead when running MPI and this needs to be taken into consideration. In any case the throughput can be seen in the following figure representing the speedup:

Using 4 CPUs per node gives the highest throughput with a speedup of 6.34. Speedup is calculated with the division of the SeqTime/ParaTime. With this configuration we achieve an 85% of time reduction for our calculations, allowing us to perform large calculations under seconds.

There are loads of tests still to perform on the cluster and this is just a simple example as to how to code a simple example into parallel computing. Please see my project on github for more info and reference.


Saturday, 30 July 2016

Creating a Raspberry Pi 3 Cluster - "Supercomputer", for parallel computing.

In this quick article I will show you how to create your own Raspberry Pi cluster for parallel computing via MPI (Messaging Passing Interface) library. This is a nice summer project now that I'm free from my Master's duties until September and I have been wanting to build this for a while. Thanks to the low prices of the Raspberry Pi we are now able to build this without spending too much. See below for the list of items you will need and price for the whole kit with 4 Pi's. 

The main decision behind this architecture is to choose which operating system and programming language to use to implement parallel computing. Because of my experience with HPC (High Performance Computing) and SGE (Sun Grid Engine) the best way to achieve this is by using either OpenMPI or MPICH3. These two are free open distributions, portable and very popular. As per the programming language, we have several alternatives: we could use c++, c#, python, etc. I could use winIoT for Rpi or simply a Linux distro. I'm a geek so I went for the latter as I do like interacting with command line interfaces, there is some beauty there that I can't explain :). 
So my decision is to use a Linux distribution as OS. In this case I'm choosing Raspbian Jessie which comes with some goodies installed by default and it will allow me to install all the components I need for my little project.

The second decision to make is to choose the programming language. In this case I'm choosing Python as I'm very familiar with it, it has plenty of libraries available and a nice integration with MPI via mpi4py library.

The other factor to take into account here is that I have two different models of RPi and I need to make sure that whatever I install on those will work well for both instances. I won't be able to install WinIoT to my old Rpi model A.

Building the cluster of Rpi's

The material that you will need is listed below with links included:

4 x Rpi 3 model B = 4 x £30 = £120
4 x 16Gb microSD card (Kingston) = 4 x £4.84 = £19.36
4 x USB to Micro USB Cable 0.5m = 4 x £0.88 = £3.5
1 x 5 port desktop switch = 1 x £6.49 = £6.49
5 x Ethernet patch cable 0.3m = 5 x £2.90 = £14.5
1 x USB Hub = 1 x £2.53 = £2.53

Total =  £192.38 (without considering delivery)

*This is a common configuration but you can start with just 2 or 3 RPi's and keep adding hardware later on.

Once all the components are assembled using the stackable case you should have something like the image below:

Below the image of my cluster up and running (see configuration section for more):

Configuring your cluster of RPi's

The idea is to configure one of the RPi's and then just clone the SD card and plug it to the next Rpi. Here you'll find a summary description of the steps to do to get you up and running:

Installing the OS

  • Download Raspbian Jessie image. I had some trouble downloading the zip file so I used the torrent link instead. See the version used below (4.4)
Once the OS image is downloaded, burn it to the SD card using Win32DiskImager:

Plug the microSD card to the first Pi (my PiController in my case) and power it up. Plug the Ethernet cable and head back to your computer to access the Pi remotely.

Open a command prompt (I'm using Win10 as my main computer) and type "ping raspberrypi". By default the Rpi's are named raspberrypi so they are easy to spot in your network. Once you ping it, you will be able to see the ip address of the device. Save this IP address for later as we will use it in PuTTY.

Launch PuTTY and type the IP address of the RaspberryPi:

You should see something similar to the image below:

login as: pi and password: raspberry (each Rpi uses same login/password)

Type: sudo raspi-config to configure our device:
  1. Go to Expand File System
  2. Go to Advanced Options -> HostName -> set it to PiController
  3. Go to Advanced Options -> MemorySplit -> set it to 16.
  4. Go to Advanced Options -> SSH -> Enable.
  5. Finish and leave the configuration.

Now we can start installing MPICH3 and MPI4PY. Notice that these steps take a while (> 4h) so arrange some free time for this beforehand:

Installing MPICH3

Follow the steps below to install version 3.2 of MPICH:
Once you've got everything installed you should see something like the image below:

Installing MPI4PY

Follow the steps below to install version 2.0 of MPI4PY:
once installed you should see something like the image below:

Now we have finished configuring the first RPi. Believe it or not if you reach this step and everything is working you should be proud of it. Now we will have to clone this SD card and place them into the other RPi's.

Preparing the other RPi's

As mentioned in the step above, bring the SD card to your main computer and save the content of the SD card using Win32DiskImager. Now copy this new image to the other SD cards. You should have now 4 SD cards with the same image. As now we have 4 cloned SD cards, my advice is to plug every Rpi individually and change the host name of every new added Rpi into the network, e.g. pi01, pi02, pi03, etc.

Do the following for every new RPi added into the network:


scan the network for a newly added device to find its IP address using a network scanner. Once found, use PuTTY to access it and use the commands below to set it up:

Type: sudo raspi-config to configure our device:
  1. Go to Expand File System
  2. Go to Advanced Options -> HostName -> set it to pi01
  3. Go to Advanced Options -> MemorySplit -> set it to 16.
  4. Go to Advanced Options -> SSH -> Enable.
  5. Finish and leave the configuration.
  6. sudo reboot.
Do the same for pi02 and pi03. Note that you can name your RPis the way you want.

Once done you should be able to see them all 4 using PuTTY:

Once completed, each Rpi will have its own IP. We need now to store each IP address into a host file also known as machinefile. This file contains the hosts which to start the processes on.

Go to your first RPi and type:

nano machinefile

and add the following IP addresses: (Note that you will have to add your own):
This will be used by the MPICH3 to communicate and send/receive messages between various nodes.

Configuring SSH keys for each RPi

Now we need to be able to command each RPi without using users/passwords. To do this we will have to generate SSH keys for each RPi and then share each key to each device under authorised devices. This is the way MPI will be able to talk to each device without worrying about credentials. This process is a bit tedious but once completed you will be able to run MPI without problems.

Run the following commands from the first Pi (PiController):
When running the ssh-keygen just hit enter (if you don't want to add specific passphrase) and the RSA key will be generated for you automatically.

Now we have configured the link between PiController to every single device but we still need to configure the other way around. So you will have to run the following commands from every individual device:

open the authorized_keys files and you will see the additional keys there. Each authorized_keys file on each device should contain 3 keys (as stated in the architecture diagram above).

Now the system is ready for testing.

Note that if your IP address changes, the keys will not be valid and the steps will have to be repeated. 

Testing the cluster

At this point I will just include a small example for you to test that the cluster works as expected. Later on I will publish a more complex scenario with a refined configuration to maximise the power of the cluster.

If everything is configured correctly, the following command should work correctly:

mpiexec -f machinefile -n 4 hostname

You can see that each Device has replied back and every key is used without problems.

Now run the following command to test a helloworld example:

mpiexec -f machinefile -n 4 python /home/pi/mpi4py-2.0.0/demo/

You should see something like the image below:

Now our system is ready to take any parallel computing application we want to develop.

Watch this space for more!.

Next steps

I will be creating more complex scenarios and squeezing the architecture to test its limits. Soon more!. Give it a go and let me know if you face any problem during the set up.


Saturday, 25 June 2016

BaaS with Kinvey and Delphi 10.1 Berlin

In this article I will show you how to connect your desktop and mobile applications to a mobile backend as a service (mBaaS) with Delphi 10.1 Berlin. I normally use as a backend but as they announced that they will close their mBaaS service I will use Kinvey instead.
If you are interested in you can read my previous articles about creating your own self-hosted Parse server and deploying a Parse server to Heroku which achieves similar result to what I want to explain you today (basically having your data hosted anywhere in the cloud either using Kinvey as an mBaaS or Parse + Heroku as a PaaS).
 There are plenty of articles online about these topics and I just want to give you my input and how I dealt with some of the challenges I faced during development.

Join Kinvey and create your App environment
First step is to join Kinvey and create your application environment. Kinvey offers a Free plan for developers that it is ideal to test your applications. It includes the core mBaaS features and 1GB of data storage. Once you move your solution into production you can switch over one of the other plans available.
Once you are in the console management, press "New App" and enter the name of your backend:
Now you should see your enviromnent created:
Click on the Development label and the dashboard will be shown:
At this moment in time, you should know what data you are going to store. In this example I have created a simple collection that includes several fields for a sample application that I'm building and that the source code can be found here:
Here is my collection of values:
*Note that if you POST data to a non-existing collection, this will be created automatically.

Handling collections
Now it's time to start using the collection and querying/adding data via Http REST. For this task you will have to identify the required headers that are needed for your GET/POST requests. You can see one example here for Parse.

Kinvey follows a different approach than Parse in terms of security. Kinvey offers basic and session authentiation. Basic authentication uses the HTTP header "Authorization" with the components "Basic" and Base64Encode(AppId:MasterSecret). Session authentication sends a login request to collct an auth token from Kinvey backend and then this token is used in subsequent REST requests.

I will focus on Basic authentication as Session authentication is as simple but with more steps.

The first example is by using REST via IdHttp so you can see how GET/POST are handled manually and the second example is just by using the Kinvey provider component available with our Delphi 10.1 Berlin that will make our life easier.

Here it's my Win64 example that uses Kinvey BaaS:

Add action, adds data into the collection and reload just brings the data back from the cloud and displays it in a listview component.

Here is the code behind it:
Add item:
Load data:
As you can see, each method uses a POST/GET command to baas.kinvey url with some arguments, headers and options. I find this way really useful as you can clearly see what's going on and easily map what you could do via curl:

If you want to run this example successfully you'll need to include libeay32.dll and ssleay32.dll from OpenSSL.

You can test this app and its mobile companion app using the source code here.

Using Kinvey Provider component
Delphi 10.1 Berlin has a KinveyProvider component that we can use directly without having to worry about the request details. You'll see below that to do the same as the code sample above we will need just few lines of code:
The KinveyProvider needs the parameters:

  • AppKey
  • AppSecret
  • MasterSecret
Get those from the Kinvey dashboard and off you go.

Then connect the BackEndStorage component to the KinveyProvider component.

Now, here is the code to Add items to the collection and to load the collection via components:
Load Data:
Add Item:
As you can see now it's way simpler.

And here my android app up and running:
Now I have a Win64 app and an Android app that share the same back end using Kinvey BaaS.

And the data in Kinvey:

Although this article it's quite long I'm sure you will find it quite interesting if you still haven't played with these components and the cloud.

Note that this BackEnd service will cease to exist shortly (as the appkey is hardcoded in the android app and the source code is available).

Do not hesitate to contact me if you have any questions.

Monday, 6 June 2016

Invoke PowerShell remote command with parameters with spaces via TeamCity

One of the coolest things to do with TeamCity is to run some external applications via PowerShell. This will allow us to invoke remote commands without having to install an additional agent on the remote target and it will allow us to centralise those commands from our build agent environment.

The idea behind it is as follows:
I have a centralised Build Agent environment and I need to run a command line application en two additional machines. I don't want to install any TeamCity agent on those machines as these are just deployment machines and should be independent of the bulding process. I just need to remote deploy some binaries there via Powershell and then execute the application.
These two additional machines are in the same network and WinRM is configured in every instance with the correct permissions so TeamCity can run the commands without problems.
There are loads of guides over the internet regarding WinRM configuration. Here is the screenshot from my local configuration on my Win10 machine. I run the same on the TeamCity agent, a Win2012 machine:

I had to tweak first with the wifi connection as it was set to public by default and it needs to be private. Then just enable PSRemoting and add the trusteshosts as all (*). Then to test it out, run the Test-WSMan myRemoteMachine and you should see something similar to the image above.

In TeamCity, there are two additional steps, one to copy some binaries to a remote machine and the second one to run the binaries remotely. Both steps running powershell:

Here is the command for each step:

And the remote execution from Powershell: