Resource Queues

To use the nodes efficiently and in a fair-share fashion, resources are divided into queues (partitions). Queues group nodes into logical sets, each with its own constraints, such as job size limits, job time limits, and the users permitted to use it.

To determine what partitions exist on the system, what nodes they include, and the general system state, use the sinfo command.

:$ sinfo -s

sinfo man page
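The tabular output of sinfo -s is easy to post-process with standard text tools. A minimal sketch, assuming an illustrative captured output (the A/I/O/T node-state counts below are invented for the example):

```shell
# Hypothetical captured output of `sinfo -s` (illustrative values)
cat > sinfo.out <<'EOF'
PARTITION AVAIL TIMELIMIT  NODES(A/I/O/T) NODELIST
compute*  up    2-00:00:00 0/426/0/426    node[001-426]
gpu       up    2-00:00:00 0/44/0/44      gpu[01-44]
EOF
# Extract the partition names, skipping the header row
awk 'NR>1 {print $1}' sinfo.out
```

The asterisk after compute marks the default partition, i.e. the one used when a job does not request a partition explicitly.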

ARIS queue (partition) overview:

Queue Table

PARTITION  DESCRIPTION                    AVAIL  TIMELIMIT   NODES  NODELIST
compute    Compute nodes w/o accelerator  up     2-00:00:00  426    node[001-426]
fat        Fat compute nodes              up     2-00:00:00  24     fat[01-24]
taskp      Serial jobs queue              up     2-00:00:00  20     fat[25-44]
gpu        GPU accelerated nodes          up     2-00:00:00  44     gpu[01-44]
phi        MIC accelerated nodes          up     2-00:00:00  18     phi[01-18]
  • compute queue: intended for parallel jobs on the THIN compute nodes.
  • fat queue: dedicated to parallel jobs on the FAT compute nodes.
  • taskp queue: intended for multiple serial jobs on the FAT compute nodes.
  • gpu queue: provides access to the GPU accelerated nodes.
  • phi queue: provides access to the PHI (MIC) accelerated nodes.
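A job is directed to one of these queues with the Slurm --partition option. A minimal sketch of a batch script targeting the gpu queue (my_gpu_app is a hypothetical executable, and the node and time values are placeholders):

```shell
# Write a minimal job script selecting the gpu partition
# (my_gpu_app is a hypothetical application binary)
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --time=01:00:00
srun ./my_gpu_app
EOF
# The script would then be submitted with: sbatch job.sh
grep -c '^#SBATCH' job.sh   # prints 3
```

Omitting --partition sends the job to the default partition (compute on this system).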

The scontrol command can be used to report more detailed information about partitions and their configuration:

:$ scontrol show partition

PartitionName=compute
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=node[001-426]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=8520 TotalNodes=426 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=57344

PartitionName=fat
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=fat[01-24]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=960 TotalNodes=24 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=507904

PartitionName=taskp
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=fat[25-44]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=1600 TotalNodes=20 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=507904

PartitionName=gpu
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=gpu[01-44]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=880 TotalNodes=44 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=57344

PartitionName=phi
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=phi[01-18]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=360 TotalNodes=18 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=57344
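The TotalCPUs and TotalNodes fields in the listing above imply the per-node CPU count in each partition. A quick check of those figures:

```shell
# Per-node CPU counts implied by TotalCPUs / TotalNodes
# for each partition in the scontrol output above
for spec in compute:8520:426 fat:960:24 taskp:1600:20 gpu:880:44 phi:360:18; do
  name=${spec%%:*}
  rest=${spec#*:}
  cpus=${rest%%:*}
  nodes=${rest#*:}
  echo "$name: $((cpus / nodes)) CPUs per node"
done
```

This gives 20 CPUs per node for compute, gpu, and phi, 40 for fat, and 80 for taskp.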

:$ scontrol show node node426

NodeName=node426 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.15 Features=(null)
   Gres=(null)
   NodeAddr=node426 NodeHostName=node426 Version=14.11
   OS=Linux RealMemory=57344 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2015-07-27T19:34:56 SlurmdStartTime=2015-07-27T22:15:06
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
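The CPUTot figure in the node record follows from the topology fields reported alongside it: Sockets × CoresPerSocket × ThreadsPerCore. A minimal check using the node426 values above:

```shell
# node426 topology fields from the scontrol output above
sockets=2
cores_per_socket=10
threads_per_core=1
# CPUTot = Sockets * CoresPerSocket * ThreadsPerCore
echo $((sockets * cores_per_socket * threads_per_core))   # prints 20, matching CPUTot=20
```

With ThreadsPerCore=1, hyper-threading is not exposed, so the 20 CPUs Slurm schedules on this node correspond to 20 physical cores.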