Actively Running Jobs:

CCR now offers a detailed view of what is happening on the node(s) where your job is running.  You can view this information in OnDemand, but you do NOT need to submit your jobs from within OnDemand: jobs submitted from the command line with 'sbatch' also appear in your list of Active Jobs.


Click the arrow next to one of your current jobs, and at the bottom you will see two graphs for each node your job is running on: one shows CPU metrics and the other shows memory (RAM) usage.


Clicking the 'Detailed Metrics' link redirects you to the Grafana dashboard, which shows very detailed metrics for the node(s) your job is running on, including CPU, RAM, network, InfiniBand/Omni-Path, disk usage and I/O, and GPU information.


We provide how-to guides on job monitoring, as well as other options for monitoring running jobs, here.
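Slurm's own command-line tools offer another view of running jobs from the terminal. The sketch below is illustrative only: it assumes the standard Slurm clients 'squeue' and 'sstat' are on your PATH, and 'running_job_stats' is a hypothetical helper name, not a CCR-provided tool. It lists your queued and running jobs and, given a job ID, prints live CPU and memory statistics for each step of that job.

```shell
# Hypothetical helper (not a CCR tool): list your jobs and, if a job ID
# is given, show live resource usage for that job's steps.
running_job_stats() {
  local jobid="${1:-}"

  # Your jobs: ID, partition, name, state, time used, nodes (or pending reason)
  squeue -u "$USER" -o "%.10i %.9P %.20j %.8T %.10M %R"

  if [[ -n "$jobid" ]]; then
    # Live statistics for every step of the job, including the batch step
    sstat --allsteps -j "$jobid" --format=JobID,AveCPU,AveRSS,MaxRSS,MaxRSSNode
  fi
}
```

For example, 'running_job_stats 12345678' (a placeholder job ID).  Note that 'sstat' only reports on jobs that are still running; completed jobs are covered by Slurm accounting ('sacct').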


Completed Job Information:


Grafana charts

Grafana charts are also available for completed jobs, just like the Active Jobs charts in OnDemand.  However, the dashboard link needs the job's start and end times and its node list, which must be queried from Slurm.  To make this easier, we provide a script, run from the terminal, that creates the Grafana URL for your job:


[ccruser@vortex]$ /util/common/metrics/ccr-jobview-url [jobid] [cluster]

For example:

[ccruser@vortex]$ /util/common/metrics/ccr-jobview-url 10457965 ub-hpc

Paste this link in your browser:

https://dashboard.ccr.buffalo.edu/grafana/d/Vi3oi5gohz/hpc-job-metrics?orgId=1&theme=light&from=1669820527000&to=1669820600000&var-cluster=ub-hpc&var-host=cpn-k08-34-01&var-jobid=10457965
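To show what such a link encodes (start and end times in epoch milliseconds, the cluster, each node, and the job ID), here is a sketch of the same idea.  It is NOT the source of 'ccr-jobview-url', whose implementation isn't shown here.  It assumes bash, GNU 'date', the standard Slurm 'sacct' and 'scontrol' commands, and that the dashboard accepts one 'var-host' parameter per node (Grafana's usual multi-value syntax); the function names 'jobview_url' and 'jobview_url_from_slurm' are hypothetical.

```shell
# Sketch of the idea behind ccr-jobview-url; NOT its actual source.
# Assumes bash, GNU date, and Grafana's repeated var-host=... syntax.

# Build the dashboard URL from Slurm times and one or more host names
jobview_url() {
  local jobid="$1" cluster="$2" start="$3" end="$4"
  shift 4                              # remaining arguments are host names
  local from to url host
  from="$(date -d "$start" +%s)000"    # Grafana expects epoch milliseconds
  to="$(date -d "$end" +%s)000"
  url="https://dashboard.ccr.buffalo.edu/grafana/d/Vi3oi5gohz/hpc-job-metrics"
  url+="?orgId=1&theme=light&from=${from}&to=${to}&var-cluster=${cluster}"
  for host in "$@"; do
    url+="&var-host=${host}"
  done
  url+="&var-jobid=${jobid}"
  printf '%s\n' "$url"
}

# Look up start/end times and nodes in Slurm accounting, then build the URL
jobview_url_from_slurm() {
  local jobid="$1" cluster="$2" start end nodelist
  local -a hosts
  # -X: allocation line only; -n: no header; -P: '|'-separated fields
  IFS='|' read -r start end nodelist < <(
    sacct -j "$jobid" -M "$cluster" -X -n -P --format=Start,End,NodeList
  )
  # Expand a compressed list such as cpn-k08-34-[01-02] into host names
  mapfile -t hosts < <(scontrol show hostnames "$nodelist")
  jobview_url "$jobid" "$cluster" "$start" "$end" "${hosts[@]}"
}
```

Under these assumptions, 'jobview_url_from_slurm 10457965 ub-hpc' would print a link of the same form as the one above.  For a job that is still running, 'sacct' reports the end time as 'Unknown', so this sketch only makes sense for completed jobs.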


Slurm accounting

Slurm accounting information is also available and can be useful, depending on what you want to know about your jobs.  We provide some suggested commands here.
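As an illustration of the kind of query involved (not a copy of CCR's suggested commands), the sketch below wraps 'sacct' to summarize a finished job, one line per job step.  It assumes the standard Slurm 'sacct' client; 'job_summary' is a hypothetical name, and the field list is just one reasonable choice.

```shell
# Hypothetical helper (not a CCR-provided command): summarize a job from
# Slurm accounting. Usage: job_summary JOBID [CLUSTER]
job_summary() {
  local jobid="$1" cluster="${2:-}"
  local -a args=(-j "$jobid")
  # Only pass -M when a cluster is named; otherwise sacct uses the local cluster
  if [[ -n "$cluster" ]]; then
    args+=(-M "$cluster")
  fi
  sacct "${args[@]}" \
    --format=JobID,JobName%20,State,ExitCode,Elapsed,AllocCPUS,TotalCPU,ReqMem,MaxRSS,NodeList
}
```

For example, 'job_summary 10457965 ub-hpc'.  Comparing MaxRSS with ReqMem, and TotalCPU with Elapsed multiplied by AllocCPUS, is a quick way to see whether a job requested more memory or cores than it actually used.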


Metrics OnDemand

For metrics about job performance and node usage after your job completes, please use the UBMoD portal.  Job data is usually available 24 hours after a job completes.  More details about UBMoD