- 27 Mar, 2014 1 commit
-
-
Franco Broi authored
Add support for job std_in, std_out and std_err fields in Perl API.
-
- 26 Mar, 2014 2 commits
-
-
Morris Jette authored
-
David Bigagli authored
processes.
-
- 25 Mar, 2014 2 commits
-
-
Morris Jette authored
Modify hostlist expressions to accept more than two numeric ranges (e.g. "row[1-3]rack[0-8]slot[0-63]")
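The expansion of such an expression is purely combinatorial. Below is a minimal sketch, not Slurm's actual hostlist code, of how an expression with several bracketed numeric ranges can be expanded recursively; all function and variable names are illustrative only.
```c
#include <stdio.h>
#include <string.h>

/* Expand the first "[lo-hi]" range found in expr, recursing on the rest. */
static void expand(const char *expr, const char *prefix)
{
    const char *lb = strchr(expr, '[');
    if (lb == NULL) {                     /* no more ranges: emit the name */
        printf("%s%s\n", prefix, expr);
        return;
    }
    const char *rb = strchr(lb, ']');
    long lo, hi;
    if (rb == NULL || sscanf(lb + 1, "%ld-%ld", &lo, &hi) != 2) {
        fprintf(stderr, "malformed expression: %s\n", expr);
        return;
    }
    for (long i = lo; i <= hi; i++) {
        char next_prefix[256];
        snprintf(next_prefix, sizeof(next_prefix), "%s%.*s%ld",
                 prefix, (int)(lb - expr), expr, i);
        expand(rb + 1, next_prefix);      /* handle any remaining ranges */
    }
}

int main(void)
{
    expand("row[1-2]rack[0-1]slot[0-3]", "");   /* 2 * 2 * 4 = 16 names */
    return 0;
}
```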
-
Danny Auble authored
-
- 24 Mar, 2014 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Previous logic would typically do a list search to find job array elements. This commit adds two hash tables for job arrays. The first is based upon the "base" job ID, which is common to all tasks. The second hash table is based upon the sum of the "base" job ID plus the task ID in the array. This will substantially improve performance for handling dependencies with job arrays.
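A rough sketch of the two-hash-table idea, using hypothetical structure and function names rather than slurmctld's actual code: one table is keyed on the base job ID shared by all tasks of an array, the other on the base ID plus the task ID, so a single array element can be found without scanning the job list.
```c
#include <stdio.h>
#include <stdlib.h>

#define HASH_BUCKETS 1021

struct job_record {
    unsigned int base_job_id;        /* "base" id shared by all array tasks */
    unsigned int task_id;            /* index of this element in the array  */
    struct job_record *next_base;    /* chain in the base-id table          */
    struct job_record *next_task;    /* chain in the base-plus-task table   */
};

static struct job_record *base_hash[HASH_BUCKETS]; /* keyed on base job id       */
static struct job_record *task_hash[HASH_BUCKETS]; /* keyed on base id + task id */

static void job_hash_add(struct job_record *job)
{
    unsigned int b = job->base_job_id % HASH_BUCKETS;
    unsigned int t = (job->base_job_id + job->task_id) % HASH_BUCKETS;
    job->next_base = base_hash[b];
    base_hash[b] = job;
    job->next_task = task_hash[t];
    task_hash[t] = job;
}

/* Look up one element of an array directly instead of scanning the job list. */
static struct job_record *find_array_task(unsigned int base, unsigned int task)
{
    unsigned int t = (base + task) % HASH_BUCKETS;
    for (struct job_record *j = task_hash[t]; j; j = j->next_task)
        if (j->base_job_id == base && j->task_id == task)
            return j;
    return NULL;
}

int main(void)
{
    for (unsigned int task = 0; task < 100; task++) {
        struct job_record *j = calloc(1, sizeof(*j));
        j->base_job_id = 51;
        j->task_id = task;
        job_hash_add(j);
    }
    struct job_record *hit = find_array_task(51, 42);
    if (hit)
        printf("found task %u of array job %u\n", hit->task_id, hit->base_job_id);
    return 0;
}
```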
-
Morris Jette authored
When slurmctld restarted, it would not recover dependencies on job array elements and would just discard the dependency. This corrects the parsing problem to recover the dependency. The old code would print a message like this and discard it: slurmctld: error: Invalid dependencies discarded for job 51: afterany:47_*
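For reference, the dependency string in that error has three parts. A small standalone sketch (illustrative only, not slurmctld's parser) of splitting "afterany:47_*" into the dependency type, the base job ID, and the all-tasks wildcard:
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *spec = "afterany:47_*";

    char type[32];
    const char *colon = strchr(spec, ':');
    if (colon == NULL || (size_t)(colon - spec) >= sizeof(type)) {
        fprintf(stderr, "malformed dependency: %s\n", spec);
        return 1;
    }
    snprintf(type, sizeof(type), "%.*s", (int)(colon - spec), spec);

    char *end = NULL;
    unsigned long job_id = strtoul(colon + 1, &end, 10);

    /* "_*" after the job id means the dependency covers all array tasks,
     * so it must not be discarded just because it is not a plain job id. */
    int all_tasks = (end != NULL && strcmp(end, "_*") == 0);

    printf("type=%s job_id=%lu all_array_tasks=%d\n", type, job_id, all_tasks);
    return 0;
}
```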
-
- 22 Mar, 2014 1 commit
-
-
Morris Jette authored
When adding or removing columns for most data types (jobs, partitions, nodes, etc.), an abort is generated on some system types. This appears to be because when the displayed columns change, the address of "model" changes on some systems, while on others it does not (like my laptops). This fix explicitly sets last_model to NULL when the columns are changed rather than relying upon the data structure's address to change.
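A generic sketch of that kind of fix, with made-up names rather than sview's real GTK code: clear the cached pointer explicitly when the column layout changes instead of relying on the rebuilt object getting a new address.
```c
#include <stdio.h>
#include <stddef.h>

struct model { int n_columns; };

static struct model *last_model = NULL;   /* cached pointer to the current model */

static void columns_changed(void)
{
    /* The rebuilt model may land at the very same address, so do not rely on
     * a pointer comparison alone: clear the cache explicitly. */
    last_model = NULL;
}

static void display(struct model *m)
{
    if (m != last_model) {
        printf("rebuilding view for %d columns\n", m->n_columns);
        last_model = m;
    }
}

int main(void)
{
    struct model m = { 5 };
    display(&m);              /* builds the view */
    columns_changed();
    m.n_columns = 6;
    display(&m);              /* rebuilds even though the address is unchanged */
    return 0;
}
```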
-
- 21 Mar, 2014 4 commits
-
-
Danny Auble authored
be set up for 1 node jobs. Here are some of the reasons from IBM:
1. PE expects it.
2. For failover: if there was some challenge or difficulty with the shared-memory method of data transfer, the protocol stack might want to go through the adapter instead.
3. For flexibility: the protocol stack might want to be able to transfer data using some variable combination of shared memory and adapter-based communication.
4. Possibly most important, for overall performance: bandwidth or efficiency (BW per CPU cycle) might be better using the adapter resources. An obvious case is large messages, where it might require far fewer CPU cycles to program the DMA engines on the adapter to move data between tasks rather than depend on the CPU to move the data with loads and stores or page re-mapping -- and a DMA engine might actually move the data more quickly if it is well integrated with the memory system, as it is in the P775 case.
-
Morris Jette authored
If srun is invoked with the --multi-prog option but no task count, then use the task count provided in the MPMD configuration file.
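A small sketch, not srun's code, of how a task count could be derived from an MPMD configuration: scan the task-rank field of each line and use the highest rank plus one. The parsing is simplified and the example lines are made up.
```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Example contents of a --multi-prog configuration file. */
    const char *lines[] = {
        "0-3   ./worker",
        "4,5   ./io_server",
        "6     ./monitor",
    };
    long max_rank = -1;

    for (size_t i = 0; i < sizeof(lines) / sizeof(lines[0]); i++) {
        char ranks[64];
        if (sscanf(lines[i], "%63s", ranks) != 1)
            continue;
        /* The rank field may be a comma-separated list of numbers and ranges. */
        for (char *tok = strtok(ranks, ","); tok; tok = strtok(NULL, ",")) {
            long lo, hi;
            if (sscanf(tok, "%ld-%ld", &lo, &hi) == 2) {
                if (hi > max_rank) max_rank = hi;
            } else if (sscanf(tok, "%ld", &lo) == 1) {
                if (lo > max_rank) max_rank = lo;
            }
        }
    }
    printf("implied task count: %ld\n", max_rank + 1);   /* prints 7 */
    return 0;
}
```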
-
Morris Jette authored
-
David Bigagli authored
-
- 20 Mar, 2014 3 commits
-
-
Hongjia Cao authored
performance.
-
Danny Auble authored
than you really have.
-
Danny Auble authored
doesn't get chopped off.
-
- 19 Mar, 2014 2 commits
-
-
David Bigagli authored
-
Gennaro Oliva authored
a minus sign for options was intended.
-
- 18 Mar, 2014 4 commits
-
-
David Bigagli authored
-
Danny Auble authored
-
Danny Auble authored
Some of these were resulting in the state of a job not being updated correctly for tools like sview.
-
Danny Auble authored
in waiting reason ReqNodeNotAvail.
-
- 17 Mar, 2014 4 commits
-
-
David Bigagli authored
-
David Bigagli authored
-
David Bigagli authored
protocol version to be SLURM_2_5_PROTOCOL_VERSION, which is the minimum supported version.
-
Danny Auble authored
-
- 16 Mar, 2014 3 commits
-
-
Morris Jette authored
Previously if the sbatch --export=NONE option was used then several Slurm environment variables were not propagated from the sbatch command (SLURM_SUBMIT_DIR, SLURM_SUBMIT_HOST, SLURM_JOB_NAME, etc.)
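An illustrative sketch of the behavior described, assuming a Linux/glibc environment (clearenv is a glibc extension): the user's environment is dropped, but the job-related variables named in the commit are still set for the batch script. Everything other than the variable names is made up.
```c
#define _DEFAULT_SOURCE           /* for clearenv() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <limits.h>

int main(void)
{
    int export_user_env = 0;      /* stands in for --export=NONE */
    char cwd[PATH_MAX];
    char host[256];

    if (!export_user_env)
        clearenv();               /* drop the user's environment */

    /* These job-related variables are still propagated to the script. */
    if (getcwd(cwd, sizeof(cwd)))
        setenv("SLURM_SUBMIT_DIR", cwd, 1);
    if (gethostname(host, sizeof(host)) == 0)
        setenv("SLURM_SUBMIT_HOST", host, 1);
    setenv("SLURM_JOB_NAME", "example_job", 1);   /* job name is made up */

    printf("SLURM_SUBMIT_DIR=%s\n", getenv("SLURM_SUBMIT_DIR"));
    return 0;
}
```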
-
Morris Jette authored
Scheduler enhancements for reservations: When a job needs to run in a reservation but cannot due to busy resources, do not block all jobs in that partition from being scheduled; only block the jobs in that reservation.
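A condensed sketch of that change in scheduling behavior, using made-up structures rather than the real slurmctld code: when a reservation-bound job cannot start, only later jobs in that same reservation are skipped, while other jobs in the partition are still considered.
```c
#include <stdio.h>
#include <string.h>

struct job {
    int job_id;
    const char *partition;
    const char *reservation;   /* NULL if the job does not use one */
};

static int resources_busy(const struct job *j)
{
    return j->reservation != NULL;   /* pretend the reserved nodes are busy */
}

int main(void)
{
    struct job queue[] = {
        { 101, "batch", "maint" },
        { 102, "batch", NULL    },   /* must still be considered */
        { 103, "batch", "maint" },   /* skipped: its reservation is blocked */
    };
    const char *blocked_resv = NULL;

    for (size_t i = 0; i < sizeof(queue) / sizeof(queue[0]); i++) {
        struct job *j = &queue[i];
        if (j->reservation && blocked_resv &&
            strcmp(j->reservation, blocked_resv) == 0) {
            printf("job %d: skipped, reservation %s is blocked\n",
                   j->job_id, j->reservation);
            continue;
        }
        if (resources_busy(j)) {
            /* Old behavior: stop scheduling the whole partition here.
             * New behavior: only block later jobs in the same reservation. */
            blocked_resv = j->reservation;
            printf("job %d: cannot start yet\n", j->job_id);
            continue;
        }
        printf("job %d: scheduled\n", j->job_id);
    }
    return 0;
}
```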
-
Morris Jette authored
Reset a node's CpuLoad value at least once every SlurmdTimeout seconds. Previously the value would not be reset unless communications with the slurmd did not happen for at least 1/3 of the SlurmdTimeout value. That meant nodes that were actively running and terminating jobs would not get the CpuLoad value reset in a timely fashion. Added a CpuLoad reset timer to prevent this.
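A toy sketch of the timer logic, with illustrative names and values: clear the cached load whenever it is older than the timeout, regardless of how recently the node last communicated.
```c
#include <stdio.h>
#include <time.h>

#define SLURMD_TIMEOUT 300        /* seconds, stands in for SlurmdTimeout */

struct node {
    unsigned int cpu_load;        /* last reported load */
    time_t cpu_load_time;         /* when cpu_load was last updated or reset */
};

static void check_cpu_load(struct node *n, time_t now)
{
    if (now - n->cpu_load_time >= SLURMD_TIMEOUT) {
        n->cpu_load = 0;          /* value is stale, clear it */
        n->cpu_load_time = now;
    }
}

int main(void)
{
    struct node n = { .cpu_load = 250, .cpu_load_time = 0 };
    check_cpu_load(&n, 301);      /* more than SLURMD_TIMEOUT later: reset */
    printf("cpu_load=%u\n", n.cpu_load);
    return 0;
}
```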
-
- 15 Mar, 2014 3 commits
-
-
Morris Jette authored
Add logic to sleep and retry if slurm.conf can't be read. Without this, the slurmd daemons may die and, when the SlurmdTimeout is reached, the nodes will be marked DOWN and their jobs will be killed. In the long term, it would be good to exit only if the read fails on program startup and have the daemons keep running with the old configuration on reconfiguration, but I don't have time to do that work now.
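A minimal sketch of the sleep-and-retry idea, assuming a typical configuration path and an arbitrary retry interval:
```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp;

    /* Keep trying rather than exiting and letting the node be marked DOWN. */
    while ((fp = fopen("/etc/slurm/slurm.conf", "r")) == NULL) {
        perror("slurm.conf unreadable, retrying in 10 seconds");
        sleep(10);
    }
    /* parse the configuration here ... */
    fclose(fp);
    return 0;
}
```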
-
Morris Jette authored
Fix invalid memory reference if a script returns an error message for the user. Previous code failed to set a static variable to NULL, resulting in an xfree of memory previously freed elsewhere.
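A generic sketch of the underlying pattern, with made-up names: a static pointer is freed but left dangling, so a later free would hit memory already released elsewhere. Freeing through a macro that also clears the pointer (similar in spirit to Slurm's xfree) avoids the double free.
```c
#include <stdlib.h>
#include <string.h>

/* Free *p and set it to NULL so a second call is harmless. */
#define xfree(p) do { free(p); (p) = NULL; } while (0)

static char *err_msg = NULL;      /* static buffer reused across calls */

static void set_error(const char *text)
{
    xfree(err_msg);               /* safe even if err_msg was already freed */
    err_msg = strdup(text);
}

int main(void)
{
    set_error("first failure");
    set_error("second failure");
    xfree(err_msg);
    xfree(err_msg);               /* no-op instead of a double free */
    return 0;
}
```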
-
Morris Jette authored
Add support for job array options in the qsub command, in #PBS options for sbatch scripts and set the appropriate environment variables in the spank_pbs plugin (PBS_ARRAY_ID and PBS_ARRAY_INDEX). Note that Torque uses the "-t" option and PBS Pro uses the "-J" option.
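For illustration, the mapping might look roughly like the sketch below. The option names (-t, -J), --array, PBS_ARRAY_ID, and PBS_ARRAY_INDEX come from the commit message; deriving the PBS names from SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID is an assumption here, and the code is a stand-in rather than the real qsub wrapper or spank_pbs plugin.
```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Translate the Torque "-t" / PBS Pro "-J" request into the sbatch option. */
    const char *pbs_array_spec = "0-9";               /* e.g. from "-t 0-9" */
    char sbatch_opt[64];
    snprintf(sbatch_opt, sizeof(sbatch_opt), "--array=%s", pbs_array_spec);
    printf("sbatch %s job.sh\n", sbatch_opt);

    /* Inside the job, expose PBS-style names based on the Slurm ones
     * (assumed mapping, for illustration only). */
    const char *task = getenv("SLURM_ARRAY_TASK_ID");
    const char *job  = getenv("SLURM_ARRAY_JOB_ID");
    if (task)
        setenv("PBS_ARRAY_INDEX", task, 1);
    if (job)
        setenv("PBS_ARRAY_ID", job, 1);
    return 0;
}
```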
-
- 14 Mar, 2014 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
slurm.conf. Rebooting daemons after adding nodes to the slurm.conf is highly recommended.
-
- 13 Mar, 2014 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Add a job flag to indicate when the EpilogSlurmctld is running and don't purge the job record until it completes. This lets the EpilogSlurmctld requeue the job and otherwise manage it. Bugs 635 and 636.
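A condensed sketch of that mechanism with an invented flag name and helper: the job record carries a flag while EpilogSlurmctld runs, and the purge check refuses to remove the record until the flag clears.
```c
#include <stdio.h>
#include <stdbool.h>

#define JOB_EPILOG_RUNNING 0x0001     /* hypothetical flag bit */

struct job_record {
    int job_id;
    unsigned int flags;
};

static bool purge_job_ok(const struct job_record *job)
{
    /* Keep the record while the slurmctld epilog may still requeue it. */
    return (job->flags & JOB_EPILOG_RUNNING) == 0;
}

int main(void)
{
    struct job_record job = { 636, 0 };

    job.flags |= JOB_EPILOG_RUNNING;      /* EpilogSlurmctld launched */
    printf("purge allowed: %d\n", purge_job_ok(&job));   /* 0 */

    job.flags &= ~JOB_EPILOG_RUNNING;     /* epilog completed */
    printf("purge allowed: %d\n", purge_job_ok(&job));   /* 1 */
    return 0;
}
```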
-