Docker Containers
Run your docker image
It's simple: you can run any Docker image, for example:
docker run centos:7
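To open an interactive shell inside the container instead, a common pattern is:
docker run -it centos:7 /bin/bash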
Bind directories
Make the paths /users/<user_dir_path> and /work/<work_dir_path> available inside the container:
docker run -v /users/<user_dir_path>:/users/<user_dir_path> -v /work/<work_dir_path>:/work/<work_dir_path> centos:7
Commit changes to the docker image
Find your running container ID:
docker ps
Then commit the changes to a new image:
- -m: commit message
- -a: author name
- CONTAINER_ID: the ID of the running container
- REPOSITORY: usually your Docker Hub username
- NEW_IMAGE_NAME: the name of the new image
docker commit -m "Added users, work directories" -a "Author Name" <CONTAINER_ID> <REPOSITORY>/NEW_IMAGE_NAME
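You can then check that the new image is available locally (a quick sanity check, not part of the original recipe):
docker images | grep NEW_IMAGE_NAME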
Volumes
Host directories are mounted into the container with the -v <host_path>:<container_path> flag:
docker run -v /users/<user_dir_path>:/users/<user_dir_path> -v /work/<work_dir_path>:/work/<work_dir_path> centos:7
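Docker can also manage named volumes itself; a minimal sketch, where the volume name mydata is just an example:
docker volume create mydata
docker run -v mydata:/data centos:7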
MPI
Running MPI jobs in Docker containers requires some additional setup.
Prepare the MPI container
We will need an image that provides the MPI runtime and tools, along with an OpenSSH server, so that multiple containers can be linked together and used via mpirun. For ease of use, we use an image that includes all of this, from this repository: https://github.com/oweidner/docker.openmpi
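The prebuilt image used later in this example is available on Docker Hub and can be pulled in advance, assuming the nodes can reach the registry:
docker pull ocramz/docker-openmpi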
Prepare the Cluster
In order for Docker containers to communicate over MPI, we first need to set up a virtual cluster of Docker containers (a swarm) that resides inside the SLURM job:
#!/bin/bash -l
###############################
#SBATCH --job-name=swarm
#SBATCH --output=swarm.%j.out
#SBATCH --error=swarm.%j.err
#SBATCH --ntasks=2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:30:00
#SBATCH --account=testproj
#SBATCH --partition=el7fat
#SBATCH --exclusive
###############################
MANAGER=$(hostname)
MANAGER_IP=$(hostname -i)
SERVICE_NAME=mpi
SERVICE_NETWORK=mpi-network
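# Check SSH connectivity to the compute nodes (fat43/fat44 are the node names of this example allocation)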
ssh fat44 hostname
ssh fat43 hostname
#scontrol show hostname $SLURM_NODELIST > workers #hostnames
WORKERS=$(echo $SLURM_NODELIST | cut -d "," -f 1 --complement)
#NWORKERS=$(( SLURM_NNODES - 1 ))
echo $WORKERS
echo $NWORKERS
echo "Creating new SWARM cluster"
#docker swarm init --advertise-addr $MANAGER_IP --force-new-cluster
docker swarm init --advertise-addr $MANAGER_IP
echo "Node $MANAGER inits swarm and becomes a Manager"
sleep 10 #wait for cluster initialization
echo "Creating an overlay network"
docker network create --driver overlay --subnet 172.32.0.0/24 $SERVICE_NETWORK
sleep 5 #wait for network initialization
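# Retrieve the tokens needed for other nodes to join the swarm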
MANAGER_TOKEN=$(docker swarm join-token manager | grep token | awk '{print $5 }')
WORKER_TOKEN=$(docker swarm join-token worker | grep token | awk '{print $5 }')
#Join workers
ssh fat44 docker swarm join --token ${WORKER_TOKEN} ${MANAGER_IP}:2377
sleep 15 #wait for all worker nodes to be active
docker node demote fat44
#Node info
docker node ls
docker network ls | grep ${SERVICE_NETWORK}
#Spawn containers on the cluster
docker service create --replicas 1 --name ${SERVICE_NAME} --network ${SERVICE_NETWORK} --publish 2222:22 -d ocramz/docker-openmpi
docker service scale ${SERVICE_NAME}=${SLURM_NTASKS}
# if the image is not available on the other nodes, this needs to wait longer until it is downloaded
# OR load the image manually
#srun docker load -i docker-openmpi.tar
sleep 40
docker service ls
#SERVICE_ID=$(docker service ls | grep ${SERVICE_NAME} | awk '{print $1 }')
#docker service inspect -f "{{.Endpoint.VirtualIPs}}" ${SERVICE_ID}
#RUN MPI on the virtual cluster
srun docker ps
echo "get the manager node container id"
MASTER_ID=$(ssh fat43 docker ps | grep $SERVICE_NAME | awk '{print $1}')
MASTER_IP="$(ssh fat43 docker inspect --format '{{ .NetworkSettings.IPAddress }}' "$MASTER_ID")"
ssh fat43 docker inspect --format '{{ .NetworkSettings.IPAddress }}' $MASTER_ID
SLAVE_ID=$(ssh fat44 docker ps | grep $SERVICE_NAME | awk '{print $1}')
SLAVE_IP="$(ssh fat44 docker inspect --format '{{ .NetworkSettings.IPAddress }}' "$SLAVE_ID")"
echo "MASTER"
echo $MASTER_ID
echo $MASTER_IP
echo "SLAVE"
echo $SLAVE_ID
echo $SLAVE_IP
echo "container hostnames"
ssh fat43 docker exec $MASTER_ID hostname
ssh fat44 docker exec $SLAVE_ID hostname
ssh fat43 docker exec $MASTER_ID whoami
ssh fat43 docker exec $MASTER_ID mpirun -V
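# Launch the MPI test from the manager container, listing both container IPs as hosts and keeping MPI TCP traffic on the overlay subnet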
ssh fat43 docker exec --user mpirun $MASTER_ID mpirun -np 2 --host ${MASTER_IP},${SLAVE_IP} --mca oob_tcp_if_include 172.32.0.0/24 --mca btl_tcp_if_include 172.32.0.0/24 python /home/mpirun/mpi4py_benchmarks/all_tests.py
#Delete swarm cluster
docker service rm mpi
docker network rm mpi-network
# Workers and manager leave the swarm
ssh fat44 docker swarm leave --force
ssh fat43 docker swarm leave --force
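Submit the script as a regular SLURM batch job; the file name swarm_mpi.sh below is just an example:
sbatch swarm_mpi.sh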
Note
This example is only relevant for communication between nodes; for a single host it would make more sense to use a Docker bridge network rather than an overlay swarm network.
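A minimal single-host sketch, assuming the same ocramz/docker-openmpi image and using example names for the network and containers:
docker network create --driver bridge mpi-bridge
docker run -d --name mpi-node1 --network mpi-bridge ocramz/docker-openmpi
docker run -d --name mpi-node2 --network mpi-bridge ocramz/docker-openmpi
Containers on the same user-defined bridge network can reach each other by name, so no swarm or overlay network is needed in this case.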