Announcements to CCS Users or User Groups

(The most recent announcements, newest first)

Sent Time: 2016-03-26 15:06

Subject: Pegasus maintenance complete – March 26, 2016

Dear Pegasus Users,

Pegasus maintenance for Saturday, March 26, 2016 has been completed.

If you have any questions, please contact us at hpc@ccs.miami.edu

CCS Advanced Computing
hpc@ccs.miami.edu

Sent Time: 2016-03-25 08:26

Subject: CCS Notification Reminder: Pegasus cluster maintenance tomorrow, Saturday, March 26, 2016

Dear Pegasus Users,

This is a reminder that the CCS Pegasus cluster will be down for filesystem maintenance tomorrow, Saturday, March 26, 2016, from 8am to 4pm ET. Users will be notified once the maintenance work is complete.

Please schedule your work accordingly. Jobs running at that time will be terminated.

If you have any questions, please contact us at hpc@ccs.miami.edu

CCS Advanced Computing
hpc@ccs.miami.edu

Sent Time: 2016-03-18 13:06

Subject: CCS Notification: Pegasus cluster maintenance on Saturday, March 26, 2016

Dear Pegasus Users,

Pegasus will be down for filesystem maintenance on Saturday, March 26, 2016, from 8am to 4pm ET. We will send an “all clear” message once the work is completed.

Please schedule your work accordingly. Jobs running at that time will be terminated.

If you have any questions, please contact us at hpc@ccs.miami.edu

CCS Advanced Computing
hpc@ccs.miami.edu


Sent Time: 2015-10-20 17:48

Subject: CCS Notification: Pegasus cluster maintenance downtime, October 24, 2015, 9am-5pm

Dear Pegasus Cluster users,

This is a reminder about the maintenance of the Pegasus cluster scheduled for October 24 (Saturday), 2015, between 9am and 5pm.
The Pegasus cluster will be unavailable during that time.
Please keep this in mind when planning your upcoming jobs!

Thank you,

Advanced Computing/HPC Team
hpc@ccs.miami.edu

Sent Time: 2015-07-29 12:28

Subject: CCS Notification: Pegasus scratch hardware maintenance

All Pegasus cluster users:

The /scratch and /projects/scratch storage systems are currently undergoing emergency hardware maintenance that may degrade performance. While we do not anticipate any loss of access, intermittent interruptions are possible.

Please remember that /scratch and /projects/scratch are not for long-term data storage and are only to be used during processing. After processing, you must remove all data from /scratch and /projects/scratch.

These filesystems are not backed up and are subject to purging as necessary. Never store critical data in scratch space.

Thank you for your patience,
CCS/HPC Administration
hpc@ccs.miami.edu

Sent Time: 2015-07-17 19:46

Subject: CCS Notification: Pegasus cluster system is open

Dear CCS Pegasus system users,

The Pegasus cluster has been released back into operation. Access to all storage locations, including the /projects/scratch space, is restored. Job queues are open.
Please note that some filesystem maintenance is still ongoing and may affect performance at this time.

Thank you for your patience,

CCS HPC Administration
hpc@ccs.miami.edu

Sent Time: 2015-07-14 16:19

Subject: CCS Notification – Visx system is open

Dear CCS Visx users,

Visx is back in operation: logins are open and access to /nethome and /projects is restored.
Scratch space (/projects/scratch) is not available at this time; it will be back on Friday, July 17, at 6pm, or sooner.

Thank you for your patience,

CCS HPC Administration
hpc@ccs.miami.edu

Sent Time: 2015-07-14 10:24

Subject: CCS Notification: Pegasus cluster login open

Dear CCS systems users,

Pegasus cluster login nodes are now open, and access to /nethome and /projects is restored.
Scratch space (/projects/scratch) is not available yet, and the job queues remain closed.
Normal Pegasus operations are scheduled to resume on Friday, July 17, at 6pm.

Thank you for your patience,

CCS HPC Administration
hpc@ccs.miami.edu


Sent Time: 2015-07-10 16:35

Subject: CCS Notification: HIHG systems and network maintenance (Monday, July 13, 2015, 6pm)

Dear CCS HIHG systems users,

On July 13 from 6pm to 10pm, the networking team will be performing maintenance on our core switch. All equipment on the CCS network will be intermittently inaccessible during this time. We do not anticipate any single outage lasting longer than 30 minutes.

CCS Samba storage (K:, P:, Q:, R:) will be affected.

Please reference the system list below for more details:

mendel1-4
inti (legacy data)
crick (dev server)
bacchus (Oracle Backups)
neo (bacchus storage)
allele (CCS_FC CCS_FS) (pclinic, tclinic, dclinic)
oracle server (plab, prost)
chimera (analysis apps)
trinity (dev and testing)
perseus (glassfish)
s1-4
limsdb
lims1
lims2

Thank you for your patience,

CCS HPC Administration
hpc@ccs.miami.edu


Sent Time: 2015-07-09 18:16

Subject: CCS Notification: Apollo system maintenance to begin on Friday, July 10, 6pm

Dear CCS Apollo users,

The Apollo system will be affected by the scheduled Pegasus cluster maintenance, on the following anticipated schedule:

* Friday, July 10 at 6pm: Logins to Pegasus and Apollo are disabled and current sessions terminated.
* Monday, July 13 at 8am: Apollo nodes will be up. No jobs can be submitted yet; the apollo queue remains closed.
* Tuesday, July 14 at 8am: Pegasus login nodes will be up, and the access to /nethome and /projects will be open. Scratch space will not be available at this time.
* Friday, July 17 at 6pm: Pegasus will resume normal operations. All Pegasus and apollo queues will be open.


Thank you for your patience,

CCS HPC Administration
hpc@ccs.miami.edu

Sent Time: 2015-07-09 17:54

Subject: CCS Notification: Visx cluster and system maintenance (Friday, July 10, 2015, at 6pm)

Dear CCS Visx system users,

The Visx cluster is scheduled for maintenance from Friday, July 10, at 6pm to Wednesday, July 15, at 8am. Every effort will be made to restore service sooner if possible.

Pegasus queues will remain closed until Pegasus cluster maintenance is completed – Friday, July 17 at 6 pm.

Thank you for your patience,

CCS HPC Administration
hpc@ccs.miami.edu

Sent Time: 2015-07-09 17:30

Subject: CCS Notification: Reminder – system maintenance starts Friday, July 10, 2015, at 6pm

Dear CCS systems users,

Pegasus systems maintenance will begin July 10, 2015 at 6pm (Friday) and will continue into the following week (July 13 – 17). The anticipated timeline is as follows:

* Friday, July 10 at 6pm: Login and compute nodes will be down. Current sessions and submitted jobs will be terminated at this time.
* Tuesday July 14 at 8am: Login nodes will be up. The job queues will remain closed. Access to /nethome and /projects will be restored. Scratch space will not be available at this time.
* Friday, July 17 at 6pm: Pegasus will resume normal operations.

Upgrade plan:

LSF cluster version upgrade
Network router firmware upgrades and maintenance
Storage firmware upgrades and hardware maintenance

Thank you for your patience,
CCS HPC Administration
hpc@ccs.miami.edu

Sent Time: 2015-07-02 13:21

Subject: CCS Notification: UPDATE – maintenance RESCHEDULED to begin on Friday, July 10, 2015, at 6pm

Dear CCS systems users,

CCS systems maintenance and upgrades downtime has been RESCHEDULED to begin on July 10, 2015 at 6pm (Friday).

Maintenance will continue into the following week (July 13 – 17), and systems will be released as early as possible. CCS users will be notified upon release.


Pegasus cluster systems, visx, and apollo will be unavailable during this time.

All logins will be disabled, current sessions and submitted jobs to Pegasus will be terminated.

Storage space will be inaccessible for Pegasus system users during the maintenance.

Logins to visx and apollo may become available on Wednesday, but job submissions to these queues will be delayed until Pegasus cluster maintenance is completed.

Tentative upgrade plan:

Pegasus operating system upgrade to CentOS 6.6

LSF cluster version upgrade

Network router firmware upgrades and maintenance

Storage firmware upgrades and hardware maintenance


Thank you for your patience,

CCS HPC Administration

hpc@ccs.miami.edu


Sent Time: 2015-06-25 16:47

Subject: CCS Reminder – Pegasus Cluster Scratch Space

Pegasus scratch space is for limited-duration, high-performance data storage for running jobs or workflows on the cluster only. Scratch is not designed to store any data other than that used for current processing.
Job results on scratch should be copied to your own space, and temporary data should be deleted as soon as possible.

Please remember, scratch filesystems are engineered for capacity and high performance. They are not protected from any kind of data loss.

Do not store important data on /scratch or /projects/scratch.

Scratch is space in which to run your jobs, not storage for data, applications, or other files in between runs or for other purposes.
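
As a quick self-check against the 21-day window mentioned above, files that have gone untouched for that long can be listed with find. A minimal sketch; the stale_files helper and the project path are illustrative, not an official tool, and the actual purge tool's criteria may differ:

```shell
# stale_files DIR DAYS: list regular files under DIR whose modification
# time is more than DAYS days ago -- roughly what a purge policy targets.
stale_files() {
    find "$1" -type f -mtime "+$2" -print
}

# On the cluster (placeholder path):
# stale_files /projects/scratch/PROJECT_ID 21
```

Anything this turns up is a candidate for copying out or deleting before the purge does it for you.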

Sent Time: 2015-06-22 15:10

Subject: CCS Notification: cluster systems maintenance: July 6-10, 2015

Dear CCS systems Users,

CCS systems maintenance and upgrades downtime is scheduled for the week of July 6, 2015 – July 10, 2015, and will affect the cluster systems as follows:

The Pegasus cluster will be unavailable from approximately 8am on Monday, July 6 to approximately 5pm on Friday, July 10.

All logins will be disabled, current sessions and submitted jobs to Pegasus will be terminated.
Storage space will be inaccessible for Pegasus system users during the maintenance.

The Visx and apollo systems will be unavailable from approximately 8am on Monday, July 6 to approximately 10pm on Tuesday, July 7. Logins to these systems will become available on Wednesday, but job submission to the apollo and visx queues will be delayed until the Pegasus cluster maintenance is completed.

Tentative upgrade plan:

Pegasus operating system upgrade to CentOS 6.6

LSF cluster version upgrade

Network router firmware upgrades and maintenance

Storage firmware upgrades and hardware maintenance

Thank you,

CCS HPC Administration

hpc@ccs.miami.edu

Sent Time: 2015-05-01 17:49

Subject: CCS Pegasus Cluster Project Management Update

Dear {USER},

Project-based resource management has been implemented today on the Pegasus cluster. Please review the following updates, which may affect your computing jobs on the Pegasus cluster:

* Project-based management requires every job be associated with a project.

* If you are already a member of a project, please start submitting jobs associated with your project (instructions in link below).

* If your account is bound to only one project, your job will be associated with your project automatically.

* If you are a member of multiple projects, please specify your project in your job script or on your command line. Otherwise your job will be submitted to the ‘default’ project. After May 10, 2015, these jobs will be rejected.

* If you are not a project member, your computing jobs will be placed in a ‘default’ project.

* Jobs in the ‘default’ project are subject to stricter resource limits. After May 10, 2015, all jobs in this project will share up to 1000 CPUs, which may cause longer wait times.

* Any faculty or PI member can request projects and resource allocations through the HPC Portal site.

* Any CCS user may join a project by making a request to the project leader through HPC portal site.

Instructions for Project Management: http://ccs.miami.edu/hpc/doc/pm
HPC Portal: http://portal.ccs.miami.edu/portal
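
For reference, the project association described above is made at submission time with LSF's -P option. A minimal job-script sketch; the project name myproject, job name, and program are placeholders, and the instructions link above is the authoritative reference:

```shell
#!/bin/bash
#BSUB -P myproject        # associate this job with a project (placeholder name)
#BSUB -J myjob            # job name
#BSUB -o myjob.%J.out     # output file; %J expands to the job ID

./my_program
```

The same association can be made directly on the command line, e.g. bsub -P myproject < myjob.job.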

If you have any questions, please contact us at hpc@ccs.miami.edu.

CCS/HPC Team​

Sent Time: 2015-05-01 17:27

Subject: Pegasus Cluster Project Resource Management Update

Dear {USER},

The new project scratch policy has been implemented today on the Pegasus cluster. Please review the following changes to your Pegasus account:

1. Once you are a project member, your data in personal scratch space /scratch/{USER_ID} on the Pegasus cluster becomes read-only.

2. You will not be able to write new data to personal scratch space. If you have running jobs using this scratch space, please stop them and update your script to use your project scratch space at /projects/scratch/PROJECT_ID (see Projects link below).

3. If you would like to keep using your personal scratch data, please migrate it to your project scratch space. All scratch data is subject to purging by the Pegasus system purging tool after 21 days.

For more Pegasus information:

Policies: http://ccs.miami.edu/hpc/policies
Documentation: http://ccs.miami.edu/hpc/doc
Projects: http://ccs.miami.edu/hpc/doc/pm

If you have any question or need assistance, please contact us at hpc@ccs.miami.edu.

Thanks.

CCS/HPC Team

Sent Time: 2015-04-13 09:29

Subject: CCS HPC Project Management Migration Update – Scratch Space

Dear {USER},

You are receiving this email notice because you are a member of one or more projects registered in the CCS account system.

Personal scratch space on the Pegasus cluster will be replaced by project scratch space for all project members on May 1st, 2015, as part of the project-based resource management migration. Please note the following updates and take appropriate action:

1. Each project scratch space has been created at /projects/scratch/PROJECT_ID.
2. Your personal scratch space /scratch/{USER_ID}, if it exists, will become read-only on May 1st, 2015 and will be purged after 21 days, on May 22nd, 2015.
3. Project scratch space is shared by all members of that project and subject to that project’s space quota.
4. Project members have full read/write access to that project’s scratch space.
5. Project scratch space is subject to the same 21-day purging policy.

Please follow the procedures below to make a smooth transition to your new project scratch space:

1. Log in to the HPC portal site and find your projects
a. Click ‘My Pegasus’ to view your group list
b. Find any groups that are marked as ‘Project’
2. Log in to the Pegasus cluster to make sure your project’s scratch space has been created and that you can read/write to your project space.
3. If you are a project leader, please check your project quota size on HPC portal under ‘My Pegasus’ -> ‘Projects’. If this is not correct, please click the ‘Update’ link next to quota size number to submit a correction request. As a project leader, you can also check and update your project members from the same page.
4. In your computing jobs, replace personal scratch space path with project scratch space path.
5. Start migrating your data from personal scratch to project scratch space, if you have personal scratch data at /scratch/{USER_ID}. If you have multiple projects, talk to your project leaders to find the appropriate project for your data.

The HPC portal site is hosted at http://portal.ccs.miami.edu. It is protected by the University firewall. You need to be on a secure campus network to access it.

If you have any questions or concerns, please let us know at hpc@ccs.miami.edu

Thank you for your cooperation.
CCS/HPC

Sent Time: 2015-04-08 13:08

Subject: CCS Notification to Visx users: maintenance on Sat., April 11, 9pm – midnight

Dear Visx Users,

Visx will be intermittently unavailable during filesystem maintenance on Saturday April 11, from 9pm to 12am.
If you have questions or concerns, please contact us at hpc@ccs.miami.edu.

CCS/HPC Team

Sent Time: 2015-04-01 14:25

Subject: CCS HPC Project Based Cluster Resource Management

CCS Users,

CCS HPC will soon implement project-based resource management on the Pegasus cluster. The new management system is scheduled for deployment Friday, May 1st, 2015. Updated project features:

* Project-based management requires every job be associated with a project.
* If you are already a member of a project, please start submitting jobs associated with your project (instructions in link below).
* If you are not a project member, your computing jobs will be placed in a default project with limited resources.
* Any faculty or PI member can create projects and request resource allocations through the HPC Portal site.
* Any CCS user may join a project by making a request to the project leader.

Instructions for project management can be found at http://ccs.miami.edu/hpc/?page_id=6548

If you have any questions, please contact us at hpc@ccs.miami.edu.

CCS/HPC Team

Sent Time: 2015-03-06 17:00

Subject: CCS Notification to INSARLAB group members: /scratch space changes

Hello,

To improve management of group project space, the insarlab group space location will change from /scratch/userid to /projects/scratch/insarlab.

What this means for you:

* Going forward, please cease writing to /scratch; use /projects/scratch/insarlab instead
* Existing data on /scratch may remain for 21 days (maximum allowed time for data on /scratch)
* After 21 days (March 27th), access to /scratch will be restricted to read-only for the insarlab group and remaining data will be purged according to the 21-day policy

Please have all data migrated to /projects/scratch/insarlab by March 31st.

If you have any questions or concerns, please let us know at hpc@ccs.miami.edu

Thank you for your cooperation.
CCS/HPC

=====================================

Sent Time: 2015-02-11 10:05

Subject: REMINDER: Research Network Unavailability on February 12, 7pm-10pm

Dear CCS Users,

This is a reminder about the scheduled Research Network outage on Thursday, February 12, 2015, from approximately 7pm to 10pm. The UM IT network team will be taking down the Research Network in order to install a new firewall. This firewall will allow unblocked 10Gb/sec external access to both Internet2 and the commercial Internet.

During this time, network access to CCS resources will be unavailable. This will include the Pegasus cluster, Visx system, and all storage servers. The outage is expected to last no more than 3 hours.

Jobs running on the Pegasus cluster will not be affected, but you may not be able to reach the cluster during this time.

Please make a note of this upcoming network outage and plan accordingly.

NETWORK OUTAGE Date: Thursday, February 12, 2015.
Time: 7pm – 10pm

Best regards,
CCS HPC Administration
hpc@ccs.miami.edu
=====================================

Sent Time: 2015-02-04 12:45

Subject: Research Network Unavailability: February 12, 7pm-10pm

Dear CCS Users,

The Research Network will be unavailable from approximately 7pm to 10pm on Thursday, February 12, 2015. The UM IT network team will be taking down the Research Network in order to install a new firewall. This firewall will allow unblocked 10Gb/sec external access to both Internet2 and the commercial Internet.

During this time, network access to CCS resources will be unavailable. This will include the Pegasus cluster, Visx system, and all storage servers. The outage is expected to last no more than 3 hours.

Jobs running on the Pegasus cluster will not be affected, but you may not be able to reach the cluster during this time.

Please make a note of this upcoming network outage and plan accordingly.

NETWORK OUTAGE Date: Thursday, February 12, 2015.
Time: 7pm – 10pm

Best regards,
CCS HPC Administration
hpc@ccs.miami.edu
=====================================

Sent Time: 2015-02-02 16:38

Subject: CCS Software Available: Debugging and Profiling Tools

Dear CCS Users,

Debugging and profiling of parallel applications on the Pegasus2 cluster are now available using the Allinea tools (http://www.allinea.com/products). The tools include:

* DDT – Distributed Debugging Tool (debug and visualize parallel programs through an intuitive GUI)
* MAP – profiler for programs that use the Message Passing Interface (MPI), with a GUI
* PerformanceReports – MPI profiling summary (command-line tool)

The tools DDT and MAP are available under the module allinea/4.2.1;
PerformanceReports is available upon loading the module allinea/4.2-PR.

The server license for MAP and DDT allows debugging and profiling of applications using up to 72 processors in total, with a maximum of 64 concurrent users; the PerformanceReports license covers 64 processors/64 users.
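
A typical session might look like the following sketch. The module names are as given above; the launch commands (ddt, map, perf-report) follow Allinea's general documentation and should be checked against the Pegasus guides linked below:

```shell
# Debug an MPI program interactively with DDT (GUI; requires X forwarding):
module load allinea/4.2.1
ddt mpirun -n 8 ./my_mpi_program

# Profile the same run with MAP:
map mpirun -n 8 ./my_mpi_program

# Generate a one-page profiling summary with PerformanceReports:
module load allinea/4.2-PR
perf-report mpirun -n 8 ./my_mpi_program
```

The exact launch syntax may differ between tool versions; the step-by-step guides below are authoritative.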

Quick guides with step-by-step setup and examples for Pegasus users, for each of the Allinea tools, are available from the CCS HPC page http://ccs.miami.edu/hpc/:
Documentation (top menu) –> Debugging and Profiling –> AllineaTools –>
–> DDT (http://ccs.miami.edu/hpc/?page_id=5481 )
–> MAP (http://ccs.miami.edu/hpc/?page_id=5545 )
–> PerformanceReports (http://ccs.miami.edu/hpc/?page_id=5549 )

Tutorials by Allinea Software for a general audience are also available on YouTube, e.g.:
· Allinea DDT Tutorial – Start an MPI program (http://youtu.be/SVjizuo3x6A )
· Allinea DDT Tutorial – Work with variables (http://youtu.be/iDM0WJgn1m0 )
· Allinea DDT : Visualize variables in a parallel environment (http://youtu.be/k41FJY1I6aI )
· Allinea DDT Tutorial – Enable memory debugging (http://youtu.be/g_ZTg-Ek6nI )
· Welcome to Allinea MAP (http://youtu.be/WjCLLwy1Brc )
· Demo: Allinea Performance Reports (http://youtu.be/OaiBPum4GQw )

Please feel free to start using these tools for your debugging and profiling needs, and let us know of any questions!
CCS HPC Administration,
hpc@ccs.miami.edu
===========================================

Sent Time: 2015-01-29 14:18

Subject: RESCHEDULED: Research Network Downtime

Dear CCS Users,

The maintenance of the Research Network that was scheduled to take place on January 29 (Thursday) at 7pm has been postponed. We will notify the users about the maintenance taking place at a later time.

We apologize for any inconvenience this may cause, and thank you for your understanding.
CCS HPC Administration
hpc@ccs.miami.edu
=====================================

Sent Time: 2015-01-27 16:23

Subject: CCS Notification: VISX system and storage issues resolved

Dear Visx Users,

Please note that the recent problems with the visx storage on GPFS filesystems have been resolved.
The following GPFS volumes are up and accessible:

/aclement
/bkirtman2
/villy3
/bkirtman3
/famelung
/ikamenkovich2
/bsoden2
/bkirtman4
/miskandarani
/tamay2

Thank you,
CCS HPC Administration.
hpc@ccs.miami.edu
========================

Sent Time: 2015-01-26 11:22

Subject: Research Network Unavailability: January 29, 7pm-10pm

Dear CCS Users,

Beginning at 7pm on January 29, 2015 (Thursday), the UM IT network team will be taking down the Research Network in order to install a new firewall. This firewall will allow unblocked 10Gb/sec external access to both Internet2 and the commercial Internet.
During this time, network access to CCS resources will be unavailable.
This will include the Pegasus cluster, Visx system, and all storage servers. The outage is expected to last no more than 3 hours.

Jobs running on the pegasus cluster will not be affected, but you may not be able to reach pegasus during this time.

Please make a note of this upcoming outage and plan accordingly.

OUTAGE: Date: Thursday January 29th
Time: 7pm – 10pm

Best regards,
CCS HPC Administration
hpc@ccs.miami.edu
=====================================

Sent Time: 2015-01-16 20:38

Subject: CCS Notification: VISX system and storage

Dear visx Users,

Continuing problems with the Visx system and filesystems indicate possible hardware issues. To protect the data storage volumes that could be affected, we are disconnecting/unmounting the volumes listed below until further notice. We are actively working on the problem but have not isolated the cause.

The following GPFS volumes will be unmounted:

/aclement
/bkirtman2
/villy3
/bkirtman3
/famelung
/ikamenkovich2
/bsoden2
/bkirtman4
/miskandarani
/tamay2

Please note that if access to data on these volumes is critical at the moment for any reason, a particular volume can be remounted; however, the preservation of that data could then be compromised.

We will notify users of any further changes or findings related to the system.
Thank you for your understanding and cooperation,
CCS HPC Administration.
hpc@ccs.miami.edu
========================

Sent Time: 2015-01-14 12:02

Subject: CCS notification to VISX system Users: storage issues

Dear visx system Users,

An issue with Visx storage has been detected, and we are currently investigating a likely failure in the Storage Area Network (SAN). User data is accessible now, but storage performance may not be optimal.

Thank you for your patience and for reporting errors as they emerge.
We will notify users when the investigation is finished and/or the problem is fixed.
Please notify us of any issues: hpc@miami.edu.
CCS HPC Administration
=================

Sent Time: 2015-01-12 10:48

Subject: [UPDATE:] CCS systems maintenance completed

Dear CCS Systems Users,

The maintenance and upgrades of the CCS systems and DDN storage are complete.

* Pegasus2 operating system updated to CentOS 6.5 (was: CentOS 6.2);
* DDN storage systems firmware updated, storage expanded:
- Update SS7000 Enclosure FW to 05.09.04;
- Update SFA to SFAOS 2.2.1.3 ;
- Update GRIDScaler to GS 2.2.3;
* GPFS filesystem updated to Version 3.5.0-21 ;
* Infiniband HCA updated to fw V2.31.2820;
* OFED updated to V2.2.1.0;
* Discontinued support of the openmpi/1.6.2 build for the new operating system;
* Discontinued support of Matlab/R2012a.

Please note that the HPC team has updated the operating system on Pegasus2. This update may cause some unexpected inconsistencies in programs compiled in your home directories. In case you run into any issues, please send an e-mail to hpc@ccs.miami.edu.

Thank you again for your patience during this significant upgrade,
CCS HPC Team.
hpc@ccs.miami.edu
=======================================

Sent Time: 2015-01-09 12:20

Subject: CCS system maintenance update

Dear CCS Systems Users,

The maintenance and upgrades of CCS systems is almost complete. The following services are now restored and functional:

* Windows shares are up
* Regulus service is up
* Gridnas service is up
* /nethome is available
* /scratch is available
* /projects are available

Pegasus2 system: data is available but queues are not open.

We will be sending additional information as we finalize our work.

We appreciate your patience during this significant upgrade period.
CCS HPC Team.
hpc@ccs.miami.edu
=====================================