Resource Queues¶
In order to use all the nodes efficiently and in a fair-share fashion, resources are divided into queues (partitions). A queue groups nodes into a logical set with its own constraints, such as job size limits, job time limits, which users are permitted to use it, etc.
To determine which partitions exist on the system, which nodes they include, and the general system state, use the `sinfo` command:

```
sinfo -s
```
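In summary mode, `sinfo -s` prints one line per partition, with node counts in the form `NODES(A/I/O/T)` (allocated/idle/other/total). As a sketch of pulling the total out of such a line (the sample line and its counts below are illustrative, not real output from this system; on the cluster you would pipe `sinfo -s` instead of echoing a string):

```shell
# Split the A/I/O/T node-count field (4th column) of a sinfo -s summary line.
# The sample line is a placeholder, not captured output.
line="compute* up 2-00:00:00 10/416/0/426 node[001-426]"
echo "$line" | awk '{split($4, n, "/"); print "total nodes:", n[4]}'
```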
ARIS queue (partition) overview:

Queue Table¶

| PARTITION | DESCRIPTION | AVAIL | TIMELIMIT | NODES | NODELIST |
|---|---|---|---|---|---|
| compute | Compute nodes w/o accelerator | up | 2-00:00:00 | 426 | node[001-426] |
| fat | Fat compute nodes | up | 2-00:00:00 | 24 | fat[01-24] |
| taskp | Fat compute nodes (HyperThreading: ON) | up | 2-00:00:00 | 20 | fat[25-44] |
| gpu | GPU accelerated nodes | up | 2-00:00:00 | 44 | gpu[01-44] |
| phi | MIC accelerated nodes | up | 2-00:00:00 | 18 | phi[01-18] |
- compute queue: intended for running parallel jobs on the THIN compute nodes.
- fat queue: dedicated to running parallel jobs on the FAT compute nodes.
- taskp queue (TaskParallel queue): intended for running multiple serial jobs on the FAT compute nodes; 80 threads are available per node.
- gpu queue: provides access to the GPU accelerated nodes.
- phi queue: provides access to the PHI (MIC) accelerated nodes.
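A queue is selected at submission time with the `--partition` option. As an illustrative sketch (the job name, node count, and program are placeholders, not values from this page, and every request remains subject to the queue limits listed above), a minimal batch script might look like:

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # placeholder job name
#SBATCH --partition=compute     # one of: compute, fat, taskp, gpu, phi
#SBATCH --nodes=2               # example size; subject to the queue's limits
#SBATCH --ntasks-per-node=20    # thin compute nodes expose 20 cores each
#SBATCH --time=01:00:00         # must not exceed the 2-00:00:00 queue limit

srun ./my_parallel_program      # placeholder executable
```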
The `scontrol` command can be used to report more detailed information about partitions and their configuration:

```
$ scontrol show partition
PartitionName=compute
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=node[001-426]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=8520 TotalNodes=426 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=57344

PartitionName=fat
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=fat[01-24]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=960 TotalNodes=24 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=507904

PartitionName=taskp
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=fat[25-44]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=1600 TotalNodes=20 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=507904

PartitionName=gpu
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=gpu[01-44]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=880 TotalNodes=44 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=57344

PartitionName=phi
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=phi[01-18]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=360 TotalNodes=18 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=57344
```
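The memory limits in this output are reported in MB, so `MaxMemPerNode=57344` corresponds to 56 GiB per thin/gpu/phi node and `MaxMemPerNode=507904` to 496 GiB per fat node. As a quick sketch of the conversion (the sample line is copied from the output above; on the cluster you would pipe `scontrol show partition` instead of echoing a string):

```shell
# Extract MaxMemPerNode (in MB) from an scontrol output line and print it in GiB.
line="DefMemPerNode=UNLIMITED MaxMemPerNode=57344"
echo "$line" | awk -F'MaxMemPerNode=' '{printf "%d GiB\n", $2/1024}'
```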
Similarly, `scontrol show node` reports the detailed configuration and state of an individual node:

```
$ scontrol show node node426
NodeName=node426 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.15 Features=(null)
   Gres=(null)
   NodeAddr=node426 NodeHostName=node426 Version=14.11
   OS=Linux RealMemory=57344 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2015-07-27T19:34:56 SlurmdStartTime=2015-07-27T22:15:06
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
```
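The topology fields in this output determine the CPU count Slurm reports: `CPUTot` equals `Sockets` x `CoresPerSocket` x `ThreadsPerCore`. As a quick sanity check using the node426 values shown above:

```shell
# Recompute node426's CPU count from its topology fields
# (Sockets=2, CoresPerSocket=10, ThreadsPerCore=1, as reported above).
sockets=2; cores_per_socket=10; threads_per_core=1
cputot=$((sockets * cores_per_socket * threads_per_core))
echo "CPUTot=$cputot"   # matches CPUTot=20 reported by scontrol
```

On the HyperThreaded taskp nodes, `ThreadsPerCore` is 2 instead, which is how those nodes expose 80 threads each.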