How to Check the Status of a Submitted Job

There are several Moab and Torque commands you can use to monitor the status of your submitted job, the job scheduler, and the job queue. These commands reside in "/usr/scheduler/moab/bin/" and "/usr/scheduler/torque/bin/". The most commonly used commands have manual pages (e.g., type "man checkjob" at the command line).

Our FAQ page provides links to documentation for these commands, along with solutions to some of the common problems users encounter when submitting jobs. There is also in-depth material in our training presentations, which we encourage you to print out and keep as a desktop reference.

In addition, you should read our scheduler policies to learn about the limits we apply to help ensure that cluster resources are used efficiently and distributed fairly among users.

  • To display the jobs currently in the queue (running as well as waiting to run), issue the Moab command:

    showq

    The output is frequently hundreds of lines long. Pipe it to less to display the information one screen at a time, or to grep to extract the lines that contain a particular string, e.g.,

    showq | less

    showq | grep username
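
    The grep pipeline above can be wrapped in a small shell function that counts a user's jobs. This is only a sketch: count_user_jobs is a hypothetical helper name, not an ACCRE or Moab command, and it assumes each job appears on one showq line that contains the user name.

```shell
# Count the lines of showq output that mention a given user name.
# count_user_jobs is a hypothetical helper, not an ACCRE or Moab command.
count_user_jobs() {
    user="$1"
    # -w matches the user name as a whole word; -c prints the match count.
    showq | grep -w -c -- "$user"
}
```

    For example, count_user_jobs "$USER" prints the number of matching lines for the account you are logged in as.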

  • A PBS (Torque) command that also displays the jobs currently in the queue:

    qstat

    The output from the qstat command provides the following information:

    • "Job ID" (job_identifier)
    • "Name" (submitted PBS script name)
    • "User" (submit user name)
    • "Time Use" (CPU time used)
    • "S" (job status: "R" means running, "Q" means queued)
    • "Queue" (queue name)

    To display the status of a specific job that you have submitted, include the job identifier number. For example:

    qstat 28671
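
    Since the job status is a single-letter column, the qstat output can be filtered by state with awk. The function below is a sketch under the assumption that the output has the six whitespace-separated columns listed above, with the status in the fifth; jobs_in_state is a hypothetical helper name.

```shell
# List the IDs of jobs in a given state ("R" running, "Q" queued).
# jobs_in_state is a hypothetical helper; it assumes the six-column
# qstat layout described above, with the job status in column 5.
jobs_in_state() {
    state="$1"
    # Header lines never have a bare "R" or "Q" in column 5, so they drop out.
    qstat | awk -v s="$state" '$5 == s { print $1 }'
}
```

    For example, jobs_in_state Q prints one job identifier per line for every queued job.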

  • To delete a job that you have submitted to the queue:

    qdel <jobID>

  • To query the cluster node status and list some basic information about each of the nodes:

    pbsnodes -a
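
    The full node listing is long, so a summary count is often more useful. The sketch below assumes that pbsnodes -a prints one "state = <value>" line per node; count_nodes_in_state is a hypothetical helper name.

```shell
# Count the nodes whose state line matches a given value exactly.
# count_nodes_in_state is a hypothetical helper; it assumes pbsnodes -a
# prints one line of the form "state = <value>" for each node.
count_nodes_in_state() {
    state="$1"
    pbsnodes -a | awk -v s="$state" '
        $1 == "state" && $2 == "=" && $3 == s { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}
```

    For example, count_nodes_in_state free prints how many nodes currently report the state "free".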

  • To get more information about your job status and estimated start/complete times:

    checkjob -v <jobID>

    will give you information on the availability of nodes which match your request.

    showstart <jobID>

    will give you an estimated start time for your job. The estimate can change for a variety of reasons, for example if more jobs with higher priority are submitted or if the node the job is assigned to goes down. If your job cannot run because you've requested resources that don't exist, this command will state that it cannot determine a start time.
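
    Because checkjob and showstart are usually consulted together, the two calls can be combined in one shell function. This is a minimal sketch; job_report is a hypothetical helper name and only runs the two commands exactly as shown above.

```shell
# Print the checkjob and showstart reports for one job, one after the other.
# job_report is a hypothetical helper, not an ACCRE or Moab command.
job_report() {
    job_id="$1"
    if [ -z "$job_id" ]; then
        echo "usage: job_report <jobID>" >&2
        return 1
    fi
    echo "== checkjob -v $job_id =="
    checkjob -v "$job_id"
    echo "== showstart $job_id =="
    showstart "$job_id"
}
```

    For example, job_report 28671 prints both reports, each under its own heading line.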

  • Our job scheduler policies describe why we must manually kill jobs that exceed their requested memory. The following are ways to determine how much memory you should request.

    • Checking the memory usage of a running job: First log onto the node your job is running on. You can use the Linux commands ps or top to find the Linux process ID <PID> of your job. Then use the Linux pmap command:

      pmap <PID>

      The last line of the output gives the total memory usage of the running process. For more information read the online manual pages:

      man ps
      man top
      man pmap

    • Checking the memory usage of a completed job: After logging onto the cluster, ssh to vmpsched, where the job logs are stored. Then use:

      tracejob <jobID>
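
    Both memory checks can be reduced to a single value with standard filters. The sketch below rests on two assumptions: that the last line of the pmap output ends with the total size (for example "total 10240K"), and that the tracejob output for a completed job contains a field of the form resources_used.mem=<value>. The names pmap_total and job_mem_used are hypothetical helpers.

```shell
# Print the total memory reported on the last line of pmap output.
# Assumes the last line ends with the total size, e.g. "total 10240K".
pmap_total() {
    pid="$1"
    pmap "$pid" | tail -n 1 | awk '{ print $NF }'
}

# Print the memory field recorded for a completed job.
# Assumes tracejob output contains "resources_used.mem=<value>".
job_mem_used() {
    job_id="$1"
    # grep -o prints only the matching field; tail keeps the latest record.
    tracejob "$job_id" | grep -o 'resources_used\.mem=[^ ]*' | tail -n 1
}
```

    Run pmap_total <PID> on the node where the job is running, and job_mem_used <jobID> on vmpsched after the job has completed.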

  • As discussed previously in Sample Job Run Examples, while a job is running, its standard output and standard error are written to a temporary file on the node the job is running on. They are copied over to the file you specified with:

    qsub -o filename

    only after the job completes (in order to lighten the load on the system network).

    If you wish to access the contents of this temporary file while your job is still running, ACCRE staff have written a script called qcat which will locate and read the temporary file and then display its current contents to your screen.

    qcat usage:

    Usage: qcat [-e [-o]] [-f] JOB_ID ...
    Display the standard output (default) or the standard error (-e option)
    or both (-e -o options) of a PBS running job.
    If -f is specified qcat output appended data as the file grows.
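
    For a long-running job it is often enough to see only the most recent output. The sketch below assumes qcat prints the file contents to its own standard output so that they can be piped to tail; job_tail is a hypothetical helper name.

```shell
# Show only the last few lines of a running job's standard output.
# job_tail is a hypothetical helper built on the qcat usage shown above.
job_tail() {
    job_id="$1"
    lines="${2:-20}"   # default to the last 20 lines
    qcat "$job_id" | tail -n "$lines"
}
```

    For example, job_tail 28671 5 prints the last five lines written so far by job 28671.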

  • If you're having a problem running jobs, please also refer to our FAQ and review our cluster policies. If you cannot find the solution to your problem, please submit an RT ticket and we can help diagnose it. Include in your request as much information as you can gather from the above commands. Whenever you can, include the gateway machine you were logged onto, your jobIDs, and the nodes the jobs were sent to. Without this information we may not be able to help solve your problem until you submit a new job. If the problem is related to bugs in the code you are running, we also may not be able to help solve your problem.

Please continue to how to check fairshare usage.