We've identified a bug in the new version of SLURM we upgraded to during the last downtime (August 1). It is causing some, but not all, jobs to wash out of the queue without ever starting. We've reported the problem to the vendor and are awaiting a patch. We plan to test the patch on our development cluster and then deploy it to the production clusters if we encounter no problems. We anticipate the patch will be released later this week, and we'll spend several days testing it before upgrading production.
We have no evidence that this is causing any problems with the jobs that do run. We are very sorry for the inconvenience!
Update 8/17: This problem has been resolved.