Using StarCluster with PySPH¶
StarCluster is an open source cluster-computing toolkit for Amazon’s Elastic Compute Cloud (EC2). StarCluster has been designed to simplify the process of building, configuring, and managing clusters of virtual machines on Amazon’s EC2 cloud.
Using StarCluster along with PySPH’s MPI support, you can run PySPH code on multiple instances in parallel and complete simulations faster.
Configuring StarCluster¶
Creating Configuration File¶
After StarCluster has been installed, the next step is to create your StarCluster configuration file. Running any StarCluster command (for example, starcluster help) without an existing configuration will prompt you to generate one:
$ starcluster help
StarCluster - (http://star.mit.edu/cluster)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
cli.py:87 - ERROR - config file /home/user/.starcluster/config does not exist
Options:
--------
[1] Show the StarCluster config template
[2] Write config template to /home/user/.starcluster/config
[q] Quit
Please enter your selection:
Select the second option by typing 2 and pressing enter. This will write a template you can use to create a configuration file containing your AWS credentials, cluster settings, etc. The next step is to customize this file using your favorite text editor
$ emacs ~/.starcluster/config
Updating AWS Credentials¶
This file is commented with example “cluster templates”. A cluster template defines a set of configuration settings used to start a new cluster. The config template provides a smallcluster template that is ready to go out-of-the-box. However, first, you must fill in your AWS credentials and keypair info
[aws info]
aws_access_key_id = # your aws access key id here
aws_secret_access_key = # your secret aws access key here
aws_user_id = # your 12-digit aws user id here
To find your AWS User ID, see Finding your Account Canonical User ID.
You can get your root user credentials from the Security Credentials page in the AWS Management Console. However, root credentials grant full access to all resources on your account, so it is recommended that you create separate IAM (Identity and Access Management) user credentials for managing access to your EC2 resources. To create IAM user credentials, see Creating IAM Users (Console). For StarCluster, create an IAM user with the EC2 Full Access permission.
If you don’t already have a keypair, you can generate one using StarCluster by running:
$ starcluster createkey mykey -o ~/.ssh/mykey.rsa
This will create a keypair called mykey on Amazon EC2 and save the private key to ~/.ssh/mykey.rsa. Once you have a key, the next step is to fill in your keypair info in the StarCluster config file
[key mykey]
key_location = ~/.ssh/mykey.rsa
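If you prefer to script these edits rather than use an editor, the two sections above can also be generated with Python's stdlib configparser. This is a sketch with placeholder values; it writes to a demo path rather than the real ~/.starcluster/config so nothing is overwritten:

```python
import configparser

# Sketch: generate the [aws info] and [key mykey] sections with Python's
# stdlib configparser. All values below are placeholders; the real file
# lives at ~/.starcluster/config (a demo path is used here).
config = configparser.ConfigParser()
config["aws info"] = {
    "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
    "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
    "aws_user_id": "123456789012",
}
config["key mykey"] = {
    "key_location": "~/.ssh/mykey.rsa",
}

with open("starcluster_config_demo", "w") as f:
    config.write(f)
```

configparser preserves section names containing spaces, so the generated sections match the format StarCluster expects.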
Also, update the following information for the smallcluster configuration:
[cluster smallcluster]
..
KEYNAME = mykey
..
Now that the basic configuration for StarCluster is complete, you can launch instances directly using StarCluster. However, note that EC2 charges are not prorated: you will be billed for an entire hour even if you run an instance for only a few minutes. Before attempting to deploy an instance or cluster, you can set an inexpensive instance type and image in your cluster configuration:
[cluster smallcluster]
..
NODE_INSTANCE_TYPE=t2.micro
NODE_IMAGE_ID=ami-6b211202
..
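Putting these settings together, a minimal smallcluster section would look something like the following (only the keys discussed here are shown; the full template contains more):

```
[cluster smallcluster]
KEYNAME = mykey
NODE_INSTANCE_TYPE = t2.micro
NODE_IMAGE_ID = ami-6b211202
```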
Now you can launch an EC2 instance using:
$ starcluster start smallcluster
You can SSH into the master node by running:
$ starcluster sshmaster smallcluster
You can transfer files to and from the nodes using the put and get commands:
$ starcluster put smallcluster /path/to/local/file/or/dir /remote/path/
$ starcluster get smallcluster /path/to/remote/file/or/dir /local/path/
Finally, you can terminate the instance by running:
$ starcluster terminate smallcluster
Setting up PySPH for StarCluster¶
Most of the public AMIs currently distributed for StarCluster are outdated and have reached their end of life. To ensure a hassle-free experience while further extending the AMI and installing packages, you can use the 64-bit Ubuntu 16.04 AMI with AMI ID ami-01fdc27a, which has most StarCluster and PySPH dependencies installed.
Base AMI for PySPH [Optional]¶
The ami.sh file, found in the starcluster directory of the PySPH repository, automatically launches a vanilla 64-bit Ubuntu 16.04 instance, installs the necessary StarCluster and PySPH dependencies, and saves an AMI with this configuration to your AWS account
$ ./ami.sh
The AMI ID of the generated image is stored in AMI_ID. You can also see a list of the AMIs currently in your AWS account by running
$ starcluster listimages
Cluster configuration for PySPH¶
Modify your StarCluster configuration file with the following information. Launching a cluster with this configuration will start 2 t2.micro instances, install the latest version of PySPH on each, and keep track of the nodes in /home/pysph/PYSPH_HOSTS:
[cluster pysphcluster]
KEYNAME = mykey
CLUSTER_SIZE = 2 # Number of nodes in cluster
CLUSTER_USER = pysph
CLUSTER_SHELL = bash
NODE_IMAGE_ID = ami-01fdc27a # Or AMI ID for base AMI generated previously
NODE_INSTANCE_TYPE = t2.micro # EC2 Instance type
PLUGINS = pysph_install
[plugin pysph_install]
setup_class = sc_pysph.PySPHInstaller
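The setup_class above points at a StarCluster setup plugin. To illustrate what such a plugin looks like, here is a sketch of the general shape, assuming the standard StarCluster plugin interface; the actual sc_pysph.py shipped with PySPH differs. A real plugin subclasses starcluster.clustersetup.ClusterSetup; a stand-in base class is used here so the sketch runs without StarCluster installed:

```python
# Stand-in for starcluster.clustersetup.ClusterSetup, which a real
# plugin would subclass.
class ClusterSetup(object):
    pass

class PySPHInstaller(ClusterSetup):
    """Hypothetical installer: runs setup commands on every node."""
    def run(self, nodes, master, user, user_shell, volumes):
        # StarCluster calls run() once all nodes have booted; each node
        # object exposes an ssh client for executing remote commands.
        commands = ["pip install pysph"]  # illustrative only
        for node in nodes:
            for cmd in commands:
                node.ssh.execute(cmd)
```

StarCluster instantiates the class named by setup_class and invokes its run method with the cluster's node list, so any per-node setup logic belongs there.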
Also, copy sc_pysph.py from the starcluster directory to ~/.starcluster/plugins/.
Running PySPH scripts on a cluster¶
You can start the cluster configured above (tagged here as cluster) by running
$ starcluster start -c pysphcluster cluster
Assuming your PySPH script cube.py is in your local home directory, you can first transfer this file to the cluster:
$ starcluster put -u pysph cluster ~/cube.py /home/pysph/cube.py
Then run the PySPH code as:
$ starcluster sshmaster -u pysph cluster "mpirun -n 2 --hostfile ~/PYSPH_HOSTS python ~/cube.py"
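The ~/PYSPH_HOSTS file passed to mpirun's --hostfile option is a plain MPI hostfile listing one node per line. For this two-node cluster its contents would look something like the following (these hostnames are StarCluster's defaults and may differ on your cluster):

```
master
node001
```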
Finally, you can get the output generated by PySPH back by running:
$ starcluster get -u pysph cluster /home/pysph/cube_output .