- 07 May, 2015 15 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
=Veronique Legrand authored
-
Nicolas Joly authored
-
Morris Jette authored
The previous logic could deadlock with the log functions' pthread_atfork() handlers, depending upon the call order.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Apply the limit to SPANK plugins. Avoid invalid memory reference if slurmdbd calls slurm_get_prolog_timeout(). Updates to commit 659ae598.
-
Danny Auble authored
-
- 06 May, 2015 10 commits
-
-
Morris Jette authored
There were many calls to slurm_list_iterator_create() without the matching slurm_list_iterator_destroy(), which would result in memory leaks.
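The create/destroy pairing described in this commit message can be illustrated with a minimal sketch. All `demo_*` names below are hypothetical stand-ins invented for illustration; they are not Slurm's list API, and the live-iterator counter exists only to make a leak observable.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not Slurm's list types. */
typedef struct {
	const int *items;
	size_t count;
} demo_list_t;

typedef struct {
	const demo_list_t *list;
	size_t pos;
} demo_iter_t;

/* Number of iterators created but not yet destroyed. */
static int demo_live_iterators = 0;

static demo_iter_t *demo_iterator_create(const demo_list_t *list)
{
	demo_iter_t *it = malloc(sizeof(*it));
	if (!it)
		return NULL;
	it->list = list;
	it->pos = 0;
	demo_live_iterators++;
	return it;
}

/* Return a pointer to the next item, or NULL at the end of the list. */
static const int *demo_next(demo_iter_t *it)
{
	if (it->pos >= it->list->count)
		return NULL;
	return &it->list->items[it->pos++];
}

static void demo_iterator_destroy(demo_iter_t *it)
{
	if (!it)
		return;
	free(it);
	demo_live_iterators--;
}

/* Sum the list; the iterator is released before returning. */
static long demo_sum(const demo_list_t *list)
{
	long total = 0;
	const int *p;
	demo_iter_t *it = demo_iterator_create(list);

	if (!it)
		return -1;
	while ((p = demo_next(it)))
		total += *p;
	demo_iterator_destroy(it);	/* omitting this call leaks the iterator */
	return total;
}
```

Calling `demo_sum()` and then checking that `demo_live_iterators` is back to zero is one simple way to confirm that every create has a matching destroy.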
-
Morris Jette authored
-
Morris Jette authored
cancelling job ID that could not be found was treated like an error when it should just return success.
-
Morris Jette authored
Add re-entrant versions of glibc time functions (e.g. localtime) to Slurm in order to eliminate rare deadlock of slurmstepd fork and exec calls. bug 1638
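As a general illustration of the re-entrant interface style this commit message refers to, here is a minimal sketch using POSIX `localtime_r()`, which writes into caller-owned storage instead of a shared static buffer. The commit does not name the functions it adds, so `demo_format_time()` is a hypothetical helper and not Slurm's code. The sketch shows only the calling convention; it does not by itself address locks the C library may hold across fork().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: format a timestamp as YYYY-MM-DDTHH:MM:SS.
 * The broken-down time lives in a local struct tm, so concurrent
 * callers do not share a result buffer.
 * Returns 0 on success, -1 on failure.
 */
static int demo_format_time(time_t when, char *buf, size_t len)
{
	struct tm tm_buf;

	if (!localtime_r(&when, &tm_buf))
		return -1;
	if (strftime(buf, len, "%Y-%m-%dT%H:%M:%S", &tm_buf) == 0)
		return -1;
	return 0;
}
```

A caller would pass its own buffer, for example `char buf[32]; demo_format_time(time(NULL), buf, sizeof(buf));`.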
-
Morris Jette authored
-
Danny Auble authored
14.11 worked fine, but there were changes in 15.08 that weren't addressed in the original patch.
-
Morris Jette authored
-
Danny Auble authored
utilization.
-
Danny Auble authored
random crashing in db2 when the slurmctld is exiting. Signed-off-by: Danny Auble <da@schedmd.com>
-
David Bigagli authored
-
- 05 May, 2015 12 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Dead assignments, possible zero divide, and wrong data types
-
David Bigagli authored
Minor typo fixes
-
Christopher Bottoms authored
-
Morris Jette authored
Modify all tests to use cancel_job() rather than "exec scancel ..." so that we can do better error handling all in one place. There were several places in the globals module that printed "FAILURE" and set a local "exit_code" variable. I added "global exit_code" to those functions in hopes of those failures being reflected in the test's exit code.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Conflicts: src/plugins/select/cons_res/job_test.c
-
Morris Jette authored
-
Morris Jette authored
Mostly replacing white-space with tabs, also some minor movement of logic
-
Morris Jette authored
Also includes some cosmetic changes. Initialize variables to avoid invalid memory free.
-
- 04 May, 2015 3 commits
-
-
Ryan Cox authored
(Here is information about this patch and the reasons for it: http://tech.ryancox.net/2015/04/caller-id-handling-ssh-launched-processes-in-slurm.html)

As discussed previously, here is a patch against master (branched this morning at 709f6504). It works, though I'm sure it has some rough edges that you'll find. I had to export a few symbols that weren't there from stepd_api.[ch]. A lot of the code I had to modify is new territory for me, so it's likely I made many mistakes.

There are a few minor things I might end up wanting to change (I'm not exactly in love with some of the variable or function names I chose, though I can live with them). I might make a few minor tweaks to the pam module as well, but it won't affect the RPC code.

Currently the README is written like a manpage. I might turn it into a man page and say "read the manpage" in the README. Here is an excerpt from the README that states how decisions are made:

1) Check the local stepds for a count of jobs owned by the non-root user
   a) If none, deny (option action_no_jobs)
   b) If only one, adopt the process into that job
   c) If multiple, continue
2) Determine src/dst IP/port of socket
3) Issue callerid RPC to slurmd at IP address of source
   a) If the remote slurmd can identify the source job, adopt into that job
   b) If not, continue
4) Pick a random local job from the user to adopt into (option action_unknown)

I tried to thoroughly document the code, so hopefully it makes sense.

Also, I noticed that one of the stepd functions returns a uid_t which is set to -1 on error. The problem with that is that Linux's uid_t is uint32_t.

One area of concern in the code is the stepd calls in the pam_slurm_adopt.c code. I hope I'm doing enough error handling there, but maybe not. What happens if a step is completing or if the step data is still around even though it's actually dead?

The code to actually adopt processes is currently a no-op. That will depend on having the allocation step code added. I haven't checked yet to see if all the relevant plugins (proctrack, jobacct_gather, etc.) have hooks to add a new process to the plugin. If not, they will have to be added as well.

Lastly, I exceed 80 characters on lines with user-visible strings since Slurm follows the Linux kernel coding style. Chapter 2 of https://www.kernel.org/doc/Documentation/CodingStyle says "never break user-visible strings... because that breaks the ability to grep for them" (which I have wished Slurm followed, by the way, since I have hit that issue). I know in the past that you wanted even those lines to be wrapped, but I figured I would ask if anything has changed :)
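The README decision steps quoted in this commit message can be sketched as a pure function. This is only an illustration under stated assumptions, not the code from pam_slurm_adopt.c: every `demo_*` and `DEMO_*` name is hypothetical, the stepd query and the callerid RPC are replaced by plain arguments, job ID 0 is assumed to mean "remote slurmd could not identify the source job", and the random pick is supplied by the caller as an index so the result is reproducible. The `action_no_jobs` and `action_unknown` options are modeled only with the behavior the excerpt states (deny, and pick a local job).

```c
#include <stddef.h>

/* Hypothetical types for illustration; not taken from the patch. */
typedef enum {
	DEMO_DENY,
	DEMO_ADOPT
} demo_action_t;

typedef struct {
	demo_action_t action;
	unsigned int job_id;	/* meaningful only when action == DEMO_ADOPT */
} demo_decision_t;

/* Assumed sentinel: the remote slurmd could not identify a source job. */
#define DEMO_NO_JOB 0u

/*
 * local_jobs/n_local: the user's jobs found by the local stepds (step 1).
 * remote_job: result of the callerid lookup (steps 2 and 3), or DEMO_NO_JOB.
 * pick: caller-supplied index standing in for the random choice (step 4).
 */
static demo_decision_t demo_decide(const unsigned int *local_jobs,
				   size_t n_local,
				   unsigned int remote_job,
				   size_t pick)
{
	demo_decision_t d = { DEMO_DENY, DEMO_NO_JOB };

	if (n_local == 0)		/* 1a: no jobs on this node, deny */
		return d;

	d.action = DEMO_ADOPT;
	if (n_local == 1)		/* 1b: exactly one job, adopt into it */
		d.job_id = local_jobs[0];
	else if (remote_job != DEMO_NO_JOB)	/* 3a: source job identified */
		d.job_id = remote_job;
	else				/* 4: fall back to one of the local jobs */
		d.job_id = local_jobs[pick % n_local];
	return d;
}
```

Keeping the decision separate from the stepd and RPC calls makes each branch easy to exercise on its own, which may help with the error-handling concerns raised in the message.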
-
Morris Jette authored
-
Morris Jette authored
-