ARIS
Wed Nov 6 14:01 EET 2024
Dear ARIS users,
We have powered off ARIS after a serious incident that took place on 29/10/2024 in the building infrastructure where the system is installed. Our initial assessment is that computing and storage have not been affected.
Currently, we are working to restore the safety of the area and replace the affected equipment (electrical, cooling and others).
This will take at least 8 weeks. After that, the system will go live again.
Whenever new information about the progress is available, it will be announced on the ARIS page: https://doc.aris.grnet.gr/
Furthermore, due to the incident, the results of the 17th production call, as well as of any applications for preparatory projects, will be announced after the system resumes operation.
The projects that expired during the downtime period will be extended accordingly.
Thank you for your understanding.
Wed Oct 30 EET 2024
Since 2024-10-29 17:31 the ARIS system has been shut down as a precaution due to maintenance issues with the building facilities. Maintenance staff are working to restore conditions, and ARIS will resume operation as soon as it is safe. We will provide updates.
Tue Oct 29 17:31 EET 2024
Cooling system problem. System powered off. Running jobs killed.
Mon Sep 02 18:54 EEST 2024
System Back in production
Mon Sep 02 17:03 EEST 2024
Cooling system problem. All compute nodes powered
off. Running jobs killed.
Mon Aug 13 13:49 EEST 2024
System in production.
Mon Aug 12 13:55 EEST 2024
Due to power grid instability, all compute nodes
powered off. Running jobs killed; they will start
again when the system is powered on.
Tue Jun 11 19:13 EEST 2024
System back to production
Thu Jun 06 13:28 EEST 2024
System will be unavailable in the period
2024-06-10 22:00 - 2024-06-11 20:00 for power
infrastructure management.
Jobs that are not starting with reason ReqNodeNotAvail
have requested a time limit that falls in the downtime period.
Mon Apr 30 23:34 EEST 2024
System in full production.
Mon Apr 08 16:30 EEST 2024
System will be unavailable in the period
2024-04-30 12:00 - 2024-05-01 01:00 for building
power management.
Fri Apr 05 15:45 EEST 2024
Brief network interruptions are possible on 09/04/2024 in the 11:00 - 13:00 window.
Fri Mar 15 18:29 EET 2024
Storage and login available.
Queues will be enabled gradually to ensure storage stability.
Fri Mar 15 06:23 EET 2024
Storage failure. System unavailable.
All running jobs killed.
Sun Jan 14 11:02 EET 2024
System back in full production.
Sun Jan 14 02:15 EET 2024
Power Outage, system went down. All running jobs killed.
Tue Sep 05 12:23:00 EEST 2023
System back in production
Fri Sep 01 10:23 EEST 2023
System will be unavailable in the period
2023-09-05 00:00 - 2023-09-05 17:00 for area power management.
ReqNodeNotAvail means that the walltime request of your job
extends into the upcoming scheduled cluster maintenance and
therefore the scheduler will not run your job before maintenance.
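
The rule described above can be illustrated with a minimal sketch. It is a simplified model and not the scheduler's actual logic: the function name overlaps_maintenance and the submission time are hypothetical, and only the maintenance window is taken from this announcement.

```python
from datetime import datetime, timedelta


def overlaps_maintenance(now, walltime, maint_start, maint_end):
    """Return True if a job started at `now` with the requested walltime
    could still be running inside the maintenance window."""
    expected_end = now + walltime
    return now < maint_end and expected_end > maint_start


# Maintenance window from the announcement above.
maint_start = datetime(2023, 9, 5, 0, 0)
maint_end = datetime(2023, 9, 5, 17, 0)

# Hypothetical submission time, chosen for illustration only.
now = datetime(2023, 9, 4, 12, 0)

# A 24-hour request would run into the window, so the job would be held.
print(overlaps_maintenance(now, timedelta(hours=24), maint_start, maint_end))  # True

# A 6-hour request ends before the window opens, so the job could start.
print(overlaps_maintenance(now, timedelta(hours=6), maint_start, maint_end))  # False
```

Under this model, requesting a shorter walltime that ends before the window opens is what allows a job to start before maintenance.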
Tue Aug 29 13:00 EEST 2023
System back in production
Tue Aug 29 2023
Compute nodes will be unavailable at 2023-08-31 02:00 - 16:00
Wed May 10 13:02:00 EEST 2023
System back in production
Thu Apr 27 2023
System will be UNAVAILABLE in the period
2023-05-09 23:00 - 2023-05-10 16:00 for area power maintenance
Fri Mar 24 22:00:00 EET 2023
System back in production
Fri Mar 17 2023
System will be unavailable in the period
2023-03-24 16:00 - 24:00 for building power maintenance
Fri Oct 06 2022
System will be fully unavailable in the period
2022-10-14 22:00 - 2022-10-15 22:00 for building power
management.
Tue Aug 09 02:21:00 EEST 2022
Area power outage. All jobs running in this period were killed and were restarted once power was restored, after 2022-08-09 04:25.
Sun Apr 10 2022
System will be fully unavailable in the period 2022-05-04 20:00 - 2022-05-05 10:00 for area power management
Fri Feb 04 13:25:00 EET 2022
Power maintenance canceled. System in production.
Thu Feb 03 21:50:00 EET 2022
Possible power maintenance at 2022-02-05 10:00-12:00.
Jobs with expected end time after 2022-02-05 09:00
will not start - Reason : ReqNodeNotAvail.
Login nodes and storage are not affected
Pending jobs will start after 2022-02-05 12:00
Tue Jan 25 13:30:00 EET 2022
Compute nodes back in production.
Tue Jan 25 04:20:00 EET 2022
Storage and login nodes available, compute nodes
powered off for investigation.
Tue Jan 25 02:40:00 EET 2022
Problems with power/cooling. System Powered off.
Wed Jul 22 2020
System will be unavailable in the period 2020-07-23 09:00 - 18:00 for
power maintenance
Wed May 13 2020
System will be unavailable in the period
2020-05-16 23:00 - 2020-05-17 23:00
for area power maintenance.
Fri May 08 2020
mmlsquota command is now available.
Wed May 06 13:00:00 EEST 2020
The network problems have been resolved. The system is
operating normally.
Jobs queued or running in the period
06.05.2020 05:00 - 06.05.2020 13:00 failed.
Please resubmit them.
Wed May 06 10:00:00 EEST 2020
There are problems with the InfiniBand network
switch. Until they are resolved, Slurm queues are in
drain status. GPFS filesystems may become unstable.
Fri Mar 20 2020
mmlsquota command will be unavailable until further notice.
As a workaround, you can see your quota usage in the file:
/users/quota/$USER
This file will be updated every 2 hours.
Thu Mar 12 2020
GRNET has put in place measures for all staff members in an effort to
slow the spread of COVID-19.
Starting on Thursday, 12 March, all HPC staff are working remotely.
For all technical requests please continue to contact us via email.
Contact details at https://hpc.grnet.gr/en/contact/
Thu Jan 2 10:54:24 EET 2020
System will be unavailable on 02-Jan-2020, 10:00 - 18:00
Mon May 20 19:01:49 EET 2019
Scheduled system outage (power outage in area) at 2019-05-26.
Jobs that are expected to finish after 2019-05-25 00:00:00 will start after power outage.
Wed Oct 24 13:21:56 EEST 2018
Connections to login.aris.grnet.gr will not be possible on 2018-10-25
between 16:00 and 17:00 because of network maintenance.
The rest of the system and the running jobs will not be affected.
2018-06-27
Scheduled system outage (power outage in area) at 2018-06-28
Jobs that are expected to finish after 2018-06-28 00:00:00 will start after power outage.
System will be unavailable at 2018-05-31 after 10:00 for urgent maintenance.
Thu May 24 17:28:10 EEST 2018
The compute nodes will be unavailable between 15.00 and 20.00 on Friday 25 May 2018 because of a power outage.
Wed Mar 21 17:55:36 EET 2018
All systems are operational
Wed Mar 21 14:11:16 EET 2018
No queues are available for job submission, due to storage problems.
Wed Feb 21 15:54:28 EET 2018
Unscheduled power outage in the building. All compute nodes are down.
Fri May 12 14:33:47 EEST 2017
ARIS will be down in the period 2017-05-12 15:00 - 18:00 due to unscheduled area power maintenance. Running jobs will be killed.
Tue May 9 17:04:55 EEST 2017
ARIS will be unavailable due to area power maintenance in the periods 2017-05-19 11:00 - 21:00 and 2017-05-26 22:00 - 2017-05-27 22:00.
Wed Dec 14 16:15:55 EET 2016
System will be unavailable in the period 2016-12-16 20:00 - 2016-12-17 20:00
Mon Jul 18 14:57:10 EEST 2016
SLURM is back in production.
Sun Jul 17 20:05:01 EEST 2016
SLURM is unavailable
Mon May 23 10:27:28 EEST 2016
System is reserved for Maintenance in period :
2016-05-26 00:00:00 - 2016-05-29 23:59:59
If a job’s estimated end time falls in the period, job execution
will start after maintenance period.
Fri Apr 15 14:00:58 EEST 2016
The system will be unavailable between 15.30 and 19.30 today
(Friday April 15 2016) because of a power outage.
Tue Mar 22 19:29:38 EET 2016
System Back in Production.
System will be unavailable at 2016-03-22 08:00 - 20:00
for System Maintenance and Upgrade.
Please schedule your jobs accordingly
Wed Mar 16 14:41:35 EET 2016
All the compute nodes have been powered on. The system is back in production
Wed Mar 16 12:21:55 EET 2016
Because of a problem with the /work (/dev/gpfs0) filesystem, all systems including the login nodes are currently down. We apologise for any inconvenience.
Tue Mar 15 19:25:14 EET 2016
All the compute nodes have been powered on. The system is back in production
Tue Mar 15 17:33:03 EET 2016
THERE IS A POWER OUTAGE. ALL COMPUTE NODES HAVE BEEN SHUT OFF.
Tue Nov 17 18:30:08 EET 2015
ARIS cooling system problem resolved, ARIS is back in production since 2015-11-17 18:30. Thank you for your understanding.
Mon Nov 16 13:08:08 EET 2015
There is an ambient temperature problem because of an A/C failure; all compute nodes have been shut down. We apologise for any inconvenience.