- 21 Jun, 2012 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
The underlying problem is in the sched plugin logic in SLURM v2.4
-
- 20 Jun, 2012 4 commits
-
-
Danny Auble authored
when a task count is given but not a node count, the node count is correctly figured out.
-
Morris Jette authored
Without this fix, gang scheduling mode could start without creating a list, resulting in an assert when jobs are submitted.
-
Morris Jette authored
This change permits a user to get a zero size allocation by specifying a task count of zero with no node count specification.
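A usage sketch of the new behavior (the command line is not quoted from the commit; the option spelling assumes the standard task-count flag):

    # request a zero-size allocation: task count of zero, no node count given
    salloc --ntasks=0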
-
Morris Jette authored
-
- 18 Jun, 2012 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
packing the step layout structure.
-
Danny Auble authored
we must use a small block instead of a shared midplane block.
-
- 15 Jun, 2012 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
-
- 13 Jun, 2012 4 commits
-
-
Danny Auble authored
there are still messages we find when we poll but haven't yet given back to the real-time server.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 12 Jun, 2012 3 commits
-
-
Danny Auble authored
-
Nathan Yee authored
-
Danny Auble authored
-
- 11 Jun, 2012 2 commits
-
-
Danny Auble authored
-
Martin Perry authored
-
- 07 Jun, 2012 1 commit
-
-
Danny Auble authored
-
- 05 Jun, 2012 4 commits
-
-
Phil Eckert authored
I was doing some checking to find out why the 2.4 branch and master branch of SchedMD were not allowing held jobs to be modified. When attempting to do so, scontrol would return:

    slurm_update error: Requested partition configuration not available now

I did some debugging and found that it was caused by code added to the tail end of job_limits_check() in job_mgr.c. It had this addition:

    } else if (job_ptr->priority == 0) {
        /* user or administrator hold */
        fail_reason = WAIT_HELD;
    }

It causes all modifications done by scontrol on held jobs to fail.
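A reproduction sketch of the reported behavior (the job id and the field being updated are placeholders):

    # hold a pending job, then try to modify it; before the fix the update
    # was rejected with the slurm_update error quoted above
    scontrol hold 1234
    scontrol update JobId=1234 TimeLimit=30:00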
-
Don Lipari authored
I'd like to propose quieting down the job_mgr a tad. This is a refinement to: https://github.com/SchedMD/slurm/commit/30a986f4c600291876f4ec3e3949934512f2cba5
-
Danny Auble authored
a job kill timeout aren't always reported to the system. This is now handled by the runjob_mux plugin.
-
Danny Auble authored
-
- 04 Jun, 2012 1 commit
-
-
Rod Schultz authored
I'd like to add the following disclaimer to the documentation of the --mem option to the salloc/sbatch/srun commands. There is currently similar wording in the slurm.conf file, but I've received a bug report in which the memory limits were exceeded (until the next accounting poll).

NOTE: Enforcement of memory limits currently requires enabling of accounting, which samples memory use on a periodic basis (data need not be stored, just collected). A task may exceed the memory limit until the next periodic accounting sample.

Rod Schultz, Bull
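A usage sketch of the documented behavior (the memory value and the accounting settings in the comments are illustrative, not taken from the commit):

    # request 2048 MB per node; with accounting enabled, memory use is only
    # sampled periodically, so a task may exceed the limit between samples
    sbatch --mem=2048 job.sh
    #
    # sampling is configured in slurm.conf, e.g. (values illustrative):
    #   JobAcctGatherType=jobacct_gather/linux
    #   JobAcctGatherFrequency=30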
-
- 01 Jun, 2012 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
sub-blocks.
-
Danny Auble authored
-
Danny Auble authored
to make a larger small block and are running with sub-blocks.
-
- 31 May, 2012 2 commits
-
-
Danny Auble authored
function didn't always work correctly.
-
Danny Auble authored
rerun autogen.sh
-
- 30 May, 2012 3 commits
-
-
Danny Auble authored
when the next step in the allocation only uses part of the allocation, it gets the correct cnodes.
-
Morris Jette authored
-
Andy Wettstein authored
In etc/init.d/slurm move check for scontrol after sourcing /etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
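A sketch of the resulting ordering in etc/init.d/slurm (the exact test for scontrol is paraphrased here, not copied from the patch):

    # source the distribution defaults first ...
    [ -f /etc/sysconfig/slurm ] && . /etc/sysconfig/slurm
    # ... then check that scontrol is available
    command -v scontrol >/dev/null 2>&1 || exit 0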
-
- 29 May, 2012 1 commit
-
-
Don Lipari authored
-
- 25 May, 2012 3 commits
-
-
Morris Jette authored
According to man slurm.conf, the default for NodeAddr is NodeName: "By default, the NodeAddr will be identical in value to NodeName." However, it seems the default is NodeHostname (when that differs from NodeName).

With the following in slurmnodes.conf:

    Nodename=c0-0 NodeHostname=compute-0-0 ...

I get:

    NodeName=c0-0 Arch=x86_64 CoresPerSocket=2 CPUAlloc=0 CPUErr=0 CPUTot=4
       Features=intel,rack0,hugemem Gres=(null)
       *** NodeAddr=compute-0-0 NodeHostName=compute-0-0 ***
       OS=Linux RealMemory=3949 Sockets=2 State=IDLE ThreadsPerCore=1
       TmpDisk=10000 Weight=1027
       BootTime=2012-05-08T15:07:08 SlurmdStartTime=2012-05-25T10:30:10

(This is with 2.4.0-0.pre4.)

(We are planning to use cx-y instead of compute-x-y (the rocks default) on our next cluster, to save some typing.)

-- Regards, Bjørn-Helge Mevik, dr. scient, Research Computing Services, University of Oslo
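A hedged workaround sketch until the default matches the documentation (node names taken from the report above):

    # set NodeAddr explicitly so it does not fall back to NodeHostname
    NodeName=c0-0 NodeAddr=c0-0 NodeHostname=compute-0-0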
-
Rod Schultz authored
This change makes the code consistent with the documentation. Note that "bf_res=" will continue to be recognized for now. Patch from Rod Schultz, Bull.
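As it would appear in slurm.conf, assuming the documented spelling is bf_resolution (the value is illustrative; per the commit, the old "bf_res=" form is still recognized for now):

    SchedulerParameters=bf_resolution=60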
-
Don Albert authored
I have implemented the changes as you suggested: using a "-dd" option to indicate that the display of the script is wanted, and setting both the "SHOW_DETAIL" and a new "SHOW_DETAIL2" flag. Since "scontrol" can be run interactively as well, I added a new "script" option to indicate that display of both the script and the details is wanted if the job is a batch job.

Here are the man page updates for "man scontrol".

For the "-d, --details" option:

    -d, --details
        Causes the show command to provide additional details where available.
        Repeating the option more than once (e.g., "-dd") will cause the show
        job command to also list the batch script, if the job was a batch job.

For the interactive "details" option:

    details
        Causes the show command to provide additional details where available.
        Job information will include CPUs and NUMA memory allocated on each
        node. Note that on computers with hyperthreading enabled and SLURM
        configured to allocate cores, each listed CPU represents one physical
        core. Each hyperthread on that core can be allocated a separate task,
        so a job's CPU count and task count may differ. See the --cpu_bind and
        --mem_bind option descriptions in srun man pages for more information.
        The details option is currently only supported for the show job
        command. To also list the batch script for batch jobs, in addition to
        the details, use the script option described below instead of this
        option.

And for the new interactive "script" option:

    script
        Causes the show job command to list the batch script for batch jobs in
        addition to the detail information described under the details option
        above.

Attached are the patch file for the changes and a text file with the results of the tests I did to check out the changes. The patches are against SLURM 2.4.0-rc1.

-Don Albert-
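A usage sketch of the command-line form described above (the job id is a placeholder):

    # repeat -d so that "show job" also lists the batch script (batch jobs only)
    scontrol -dd show job 1234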
-
- 24 May, 2012 1 commit
-
-
Danny Auble authored
so acct_policy_job_runnable will always return true.
-