Docker Containers

Run your Docker image

It’s simple: you can run any Docker image.

docker run centos:7
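
For example, to get an interactive shell inside the container instead, add the -i and -t flags:

docker run -it centos:7 /bin/bash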

Bind directories

Make paths /users/<user_dir_path> and /work/<work_dir_path> available inside the container.

docker run -v /users/<user_dir_path>:/users/<user_dir_path> -v /work/<work_dir_path>:/work/<work_dir_path> centos:7
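
To verify the bind mounts, list one of the mounted directories from inside the container; /users/jdoe here is just a placeholder for a real path:

docker run -v /users/jdoe:/users/jdoe centos:7 ls -la /users/jdoe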

Commit changes to a new Docker image

Find your running container ID

docker ps 
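
If only one container is running from that image, its ID can also be captured directly; --filter ancestor matches containers created from the given image:

CONTAINER_ID=$(docker ps -q --filter ancestor=centos:7)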

Then commit the changes to a new image, using the following options:

  • -m : commit message
  • -a : author name
  • CONTAINER_ID : the running container ID
  • REPOSITORY : usually your Docker Hub username
  • NEW_IMAGE_NAME : a name for the new image

docker commit -m "Added users, work directories" -a "Author Name" <CONTAINER_ID> <REPOSITORY>/<NEW_IMAGE_NAME>
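
You can check that the new image exists and start a container from it:

docker images <REPOSITORY>/<NEW_IMAGE_NAME>
docker run -it <REPOSITORY>/<NEW_IMAGE_NAME> /bin/bash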


MPI

Running MPI jobs in Docker containers requires some additional setup.

Prepare the MPI container

We will need an image that provides the MPI runtime and tools along with an OpenSSH server so that multiple containers can be linked together and used via mpirun.

For convenience, we use an image that includes all of this, from this repository: https://github.com/oweidner/docker.openmpi
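
The script below uses ocramz/docker-openmpi, a prebuilt image of this kind from Docker Hub. You can pull it in advance, or build your own image from the repository (this sketch assumes the Dockerfile sits at the repository root):

docker pull ocramz/docker-openmpi

# or build it yourself from the repository
git clone https://github.com/oweidner/docker.openmpi
cd docker.openmpi
docker build -t docker-openmpi .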

Prepare the Cluster

In order for the Docker containers to communicate over MPI, we first need to set up a virtual cluster of Docker containers (a swarm) that resides inside the SLURM allocation. The following batch script does this on two nodes:

#!/bin/bash -l
###############################
#SBATCH --job-name=swarm
#SBATCH --output=swarm.%j.out
#SBATCH --error=swarm.%j.err
#SBATCH --ntasks=2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:30:00
#SBATCH --account=testproj
#SBATCH --partition=el7fat
#SBATCH --exclusive
###############################

MANAGER=$(hostname)
MANAGER_IP=$(hostname -i)

SERVICE_NAME=mpi
SERVICE_NETWORK=mpi-network

ssh fat44 hostname
ssh fat43 hostname

#scontrol show hostnames $SLURM_NODELIST > workers #hostnames
WORKERS=$(echo $SLURM_NODELIST | cut -d "," -f 1 --complement)
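# NOTE: $SLURM_NODELIST may be in compressed form (e.g. fat[43-44]); in that
# case the cut above will not split it, and the commented scontrol line above
# should be used to expand it into plain hostnames first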
NWORKERS=$((SLURM_NNODES - 1))

echo $WORKERS
echo $NWORKERS

echo "Creating new SWARM cluster"
#docker swarm init --advertise-addr $MANAGER_IP --force-new-cluster
docker swarm init --advertise-addr $MANAGER_IP
echo "Node $MANAGER inits swarm and becomes a Manager"
sleep 10 #wait for cluster initialization

echo "Creating an overlay network"
docker network create --driver overlay --subnet 172.32.0.0/24 $SERVICE_NETWORK
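# NOTE: this subnet must match the --mca oob_tcp_if_include / btl_tcp_if_include
# values passed to mpirun further down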
sleep 5 #wait for network initialization

MANAGER_TOKEN=$(docker swarm join-token manager | grep token | awk '{print $5 }')
WORKER_TOKEN=$(docker swarm join-token worker | grep token | awk '{print $5 }')

#Join workers
ssh fat44 docker swarm join --token ${WORKER_TOKEN} ${MANAGER_IP}:2377
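# a generic version would loop over all worker hostnames instead of
# hardcoding them:
#for w in $(scontrol show hostnames $SLURM_NODELIST | tail -n +2); do
#    ssh $w docker swarm join --token ${WORKER_TOKEN} ${MANAGER_IP}:2377
#done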

sleep 15 #wait for all worker nodes to be active

#docker node demote fat44 #not needed here: fat44 joined with the worker token above

#Node info
docker node ls
docker network ls | grep ${SERVICE_NETWORK}

#Spawn containers on the cluster
docker service create --replicas 1 --name ${SERVICE_NAME} --network ${SERVICE_NETWORK} --publish 2222:22 -d ocramz/docker-openmpi
docker service scale ${SERVICE_NAME}=${SLURM_NTASKS}

# if the image is not yet available on the other nodes, this needs to wait
# longer until the download finishes, OR load the image manually:
#srun docker load -i docker-openmpi.tar
sleep 40

docker service ls
#SERVICE_ID=$(docker service ls | grep ${SERVICE_NAME} | awk '{print $1 }')
#docker service inspect -f "{{.Endpoint.VirtualIPs}}"  ${SERVICE_ID}

#RUN MPI on the virtual cluster
srun docker ps

echo "get the manager node container id"
MASTER_ID=$(ssh fat43 docker ps | grep $SERVICE_NAME | awk '{print $1}')
# the container's IP on the overlay network lives under .NetworkSettings.Networks;
# quote the whole remote command so the Go template is not word-split by ssh
MASTER_IP=$(ssh fat43 "docker inspect --format '{{ (index .NetworkSettings.Networks \"${SERVICE_NETWORK}\").IPAddress }}' $MASTER_ID")

SLAVE_ID=$(ssh fat44 docker ps | grep $SERVICE_NAME | awk '{print $1}')
SLAVE_IP=$(ssh fat44 "docker inspect --format '{{ (index .NetworkSettings.Networks \"${SERVICE_NETWORK}\").IPAddress }}' $SLAVE_ID")

echo "MASTER"
echo $MASTER_ID
echo $MASTER_IP

echo "SLAVE"
echo $SLAVE_ID
echo $SLAVE_IP

echo "container hostnames"
ssh fat43 docker exec $MASTER_ID hostname
ssh fat44 docker exec $SLAVE_ID hostname

ssh fat43 docker exec $MASTER_ID whoami
ssh fat43 docker exec $MASTER_ID mpirun -V
ssh fat43 docker exec --user mpirun $MASTER_ID mpirun -np 2 --host ${MASTER_IP},${SLAVE_IP} --mca oob_tcp_if_include 172.32.0.0/24 --mca btl_tcp_if_include 172.32.0.0/24 python /home/mpirun/mpi4py_benchmarks/all_tests.py


#Delete swarm cluster
docker service rm ${SERVICE_NAME}
docker network rm ${SERVICE_NETWORK}

#workers leave first, then the manager
ssh fat44 docker swarm leave --force
ssh fat43 docker swarm leave --force

Note

This example is only relevant for communication between nodes; on a single host it would make more sense to use a Docker bridge network rather than an overlay swarm network.
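
As a rough single-host sketch (assuming the same ocramz/docker-openmpi image, whose default command keeps the containers running; the names mpi-bridge, mpi1 and mpi2 are placeholders):

# create a local bridge network and attach two containers to it
docker network create --driver bridge mpi-bridge
docker run -d --name mpi1 --network mpi-bridge ocramz/docker-openmpi
docker run -d --name mpi2 --network mpi-bridge ocramz/docker-openmpi

# look up the container IPs on the bridge network
IP1=$(docker inspect --format '{{ (index .NetworkSettings.Networks "mpi-bridge").IPAddress }}' mpi1)
IP2=$(docker inspect --format '{{ (index .NetworkSettings.Networks "mpi-bridge").IPAddress }}' mpi2)

# run MPI across both containers, then clean up
docker exec --user mpirun mpi1 mpirun -np 2 --host ${IP1},${IP2} python /home/mpirun/mpi4py_benchmarks/all_tests.py
docker rm -f mpi1 mpi2
docker network rm mpi-bridge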