- 02 Feb, 2011 12 commits
-
-
-
Moe Jette authored
job is in a pending state, then send the request directly to the slurmctld daemon and do not attempt to send the request to slurmd daemons, which are not running the job anyway.
-
Moe Jette authored
real problems with some GTK themes and is really no longer necessary
-
Moe Jette authored
This implements Moe's suggestion for NodeAddr/NodeHostName semantics:
 NodeName "nid#####" (this is what SLURM will refer to the node as)
 NodeHostName "c0-0c0s0n1" (Cray's component ID, visible only with scontrol and sview's node display)
 NodeAddr "###" (hexadecimal X, Y and Z coordinates, visible only with scontrol and sview's node display)
For example,
 palu> scontrol show node nid00189
 NodeName=nid00189 Arch=XE CoresPerSocket=6 CPUAlloc=0 CPUErr=0 CPUTot=24 Features=(null) Gres=(null) NodeAddr=01E NodeHostName=c1-0c0s1n1 RealMemory=32000 Sockets=4 ...
Please note:
~~~~~~~~~~~~
On XE systems, each pair of nodes (0/1 and 2/3) on a blade shares the same network interface and hence is located at identical Y coordinates in the torus. To make tools such as smap work with these coordinates, we use "virtual" Y coordinates, computed as
 y_coord = 4 * cage + cpu;
This scheme mirrors the one currently used to derive node coordinates on a SeaStar/XT system.
09_Cray-hostlist.diff
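The virtual Y formula and the three-digit hexadecimal NodeAddr can be illustrated with a minimal C sketch. The function names are hypothetical, not SLURM's code, and the sketch assumes each coordinate fits in one hex digit, as in the "01E" example; it does not attempt to derive X or Z from the Cray component ID.

```c
#include <assert.h>
#include <stdio.h>

/* Virtual Y coordinate as described above: y_coord = 4 * cage + cpu.
 * Valid ranges for cage and cpu are not stated, so none are enforced. */
static int virtual_y_coord(int cage, int cpu)
{
	return 4 * cage + cpu;
}

/* Hypothetical helper: write X, Y, Z as a three-digit hexadecimal
 * NodeAddr string. Assumes each coordinate is in 0..15 (one hex digit).
 * Returns the number of characters written, as snprintf() does. */
static int format_node_addr(char *buf, size_t len, int x, int y, int z)
{
	assert(x >= 0 && x < 16 && y >= 0 && y < 16 && z >= 0 && z < 16);
	return snprintf(buf, len, "%X%X%X", x, y, z);
}
```

For instance, format_node_addr(buf, sizeof(buf), 0, 1, 14) writes "01E", the NodeAddr shown for nid00189.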
-
Moe Jette authored
This is a global compatibility test to ensure that any (remote) host trying to talk to a cluster using select/cray meets the minimum requirement of supporting the required Cray data structures and hooks. As per previous patches, it may be possible to factor this out, but at this stage it is working code.
06_read_config--test-for-select-cray.diff
-
Moe Jette authored
If this test is performed on a non-Cray system which tries to talk to a remote Cray system, it fails, which it should not:
 ela1:1 ~>echo $SLURM_CLUSTERS
 palu
 ela1:0 ~>squeue
 squeue: fatal: Requested SelectType=select/cray in slurm.conf, but not running on a cray system. If looking to emulate a Cray system use --enable-cray-emulation.
02_node_select_test.diff
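The two commits above amount to one rule: a host using select/cray must be built with the Cray data structures (HAVE_CRAY), but it need not be a native Cray host when the request goes to a remote cluster. The following is a hypothetical C sketch of that rule, not SLURM's actual test; the boolean parameters stand in for compile-time and run-time state, and treating emulation like a native host is an assumption taken from the error message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check, run when slurm.conf names a SelectType.
 *   built_with_cray    - client has the Cray data structures (HAVE_CRAY)
 *   native_or_emulated - native Cray host, or built with
 *                        --enable-cray-emulation (assumption)
 *   remote_cluster     - request targets another cluster, e.g. via
 *                        $SLURM_CLUSTERS
 */
static bool select_cray_client_ok(const char *select_type,
				  bool built_with_cray,
				  bool native_or_emulated,
				  bool remote_cluster)
{
	if (strcmp(select_type, "select/cray") != 0)
		return true;	/* the test only applies to select/cray */
	if (!built_with_cray)
		return false;	/* lacks the Cray data structures and hooks */
	/* A non-Cray host may still talk to a remote Cray cluster. */
	return native_or_emulated || remote_cluster;
}
```

With this rule, the squeue session above (a Cray-enabled client on ela1 querying the remote cluster palu) would pass instead of failing.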
-
Moe Jette authored
-
Moe Jette authored
This fixes a copy-and-paste bug where the wrong memory area was dereferenced, found through these error messages in the logs:
 [2011-02-01T15:39:08] error: cray/get_select_jobinfo: jobinfo magic bad
 [2011-02-01T15:39:08] error: cray/get_select_jobinfo: jobinfo magic bad
 [2011-02-01T15:39:08] error: orphaned ALPS reservation 1022, trying to remove
While at it, added tests for the return values of these functions (resv_id may be undefined if the return value is not SLURM_SUCCESS).
01_Bug-Fix_pointer-dereference.diff
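The return-value check described above follows a common C pattern: an output parameter is only meaningful when the call reports success. A minimal sketch, assuming a simplified jobinfo record; the struct, the magic value and both functions are hypothetical, and SLURM_SUCCESS/SLURM_ERROR are defined locally so the block stands alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)
#define JOBINFO_MAGIC 0x1234	/* hypothetical magic value */

/* Hypothetical, simplified stand-in for the select/cray jobinfo. */
struct select_jobinfo {
	uint16_t magic;
	uint32_t reservation_id;
};

/* Copy out the ALPS reservation ID; reject a record with a bad magic. */
static int get_resv_id(const struct select_jobinfo *jobinfo,
		       uint32_t *resv_id)
{
	if (jobinfo == NULL || jobinfo->magic != JOBINFO_MAGIC) {
		fprintf(stderr, "error: get_select_jobinfo: jobinfo magic bad\n");
		return SLURM_ERROR;
	}
	*resv_id = jobinfo->reservation_id;
	return SLURM_SUCCESS;
}

/* Caller: only use resv_id when get_resv_id() succeeded. */
static int release_job_resv(const struct select_jobinfo *jobinfo)
{
	uint32_t resv_id;

	if (get_resv_id(jobinfo, &resv_id) != SLURM_SUCCESS)
		return SLURM_ERROR;	/* resv_id is undefined here */
	printf("releasing ALPS reservation %u\n", (unsigned)resv_id);
	return SLURM_SUCCESS;
}
```

Passing the wrong structure (the original copy-and-paste bug) now surfaces as an error return rather than a garbage reservation ID being acted upon.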
-
Moe Jette authored
-
Moe Jette authored
Fix gtk_table_set_row_spacing() so that it is not called for 4-D, in addition to not being called for 3-D
-
Moe Jette authored
-
Moe Jette authored
-
- 01 Feb, 2011 5 commits
- 31 Jan, 2011 13 commits
-
-
Danny Auble authored
-
Don Lipari authored
-
Moe Jette authored
-
Moe Jette authored
-
-
Moe Jette authored
consider the job's time limit when attempting to backfill schedule. The job will just be preempted as needed at any time.
-
Moe Jette authored
Priority.
-
Moe Jette authored
-
Moe Jette authored
It needs to allocate CPUs, Cores or Sockets too
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
for improved performance
-
Danny Auble authored
Updated the configure option "--enable-cray-emulation" (still under development) to emulate a Cray XT/XE system, and auto-detect real Cray XT/XE systems (removed the no-longer-needed --enable-cray configure option). Building on native Cray systems requires the cray-MySQL-devel-enterprise rpm and the expat XML parser library/headers.
Files changed: added src/plugins/select/cray/basil_alps.h, basil_interface.h, basil_interface.c and the new libalps/ directory (nodespec.c, do_release.c, basil_request.c, popen2.c, parser_common.c, basil_mysql_routines.c, memory_handling.c, do_confirm.c, atoul.c, parser_basil_1.0.c, parser_basil_1.1.c, parser_basil_3.1.c, do_query.c, parser_internal.h, do_reserve.c, Makefile.am, Makefile.in); deleted src/common/basil_resv_conf.c, src/common/basil_resv_conf.h, src/slurmctld/basil_interface.c and src/slurmctld/basil_interface.h; modified configure, configure.ac, config.h.in, auxdir/x_ac_cray.m4, NEWS, src/plugins/select/cray/{Makefile.am, select_cray.c}, src/plugins/sched/wiki/get_nodes.c, src/plugins/sched/wiki2/get_nodes.c, src/common/{node_conf.h, read_config.c, node_select.c, Makefile.am}, src/sview/job_info.c, src/slurmd/slurmstepd/mgr.c, src/slurmctld/{proc_req.c, controller.c, read_config.c, node_scheduler.c, Makefile.am}, src/salloc/{opt.c, salloc.c}, and the generated Makefile.in files throughout the tree.
-
- 30 Jan, 2011 1 commit
-
-
Moe Jette authored
16k to 24k
-
- 29 Jan, 2011 9 commits
-
-
Moe Jette authored
-
Moe Jette authored
This disables srun support:
 * on native Cray systems (having 'apbasil' available) it is currently not possible, since it would require a slurmd on each compute node, a role that, at least for the moment, is still filled by the ALPS daemons;
 * if running srun from a remote host and trying to launch a job on a remote Cray host, the same consideration applies;
 * trying to use Cray-enabled srun (HAVE_CRAY) to launch a job on a non-Cray system is allowed to proceed.
srun: disable srun on local/remote Cray systems
14_srun.diff
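The three srun cases listed above reduce to a small decision table. A hypothetical C sketch of it (the function and parameter names are illustrative, not SLURM's; the booleans stand in for HAVE_NATIVE_CRAY and for whether the target cluster uses select/cray):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy check for srun.
 *   local_native_cray - this host has 'apbasil' (native Cray system)
 *   target_is_cray    - the cluster being launched on uses select/cray
 */
static bool srun_allowed(bool local_native_cray, bool target_is_cray)
{
	if (local_native_cray)
		return false;	/* ALPS, not slurmd, runs on compute nodes */
	if (target_is_cray)
		return false;	/* same limitation for a remote Cray host */
	return true;		/* Cray-enabled srun on a non-Cray system */
}
```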
-
Moe Jette authored
On Cray, wait_job means confirming the already existing ALPS reservation. This is already handled:
 * for salloc, by select_g_job_ready(), hence there is no need to call it again;
 * for batch jobs, in the stepd manager.
Hence just print a warning to the user.
13_scontrol-no-wait_job.diff
-
Moe Jette authored
This adds support for executing salloc on a local Cray system, with node sharing disabled (still not supported on XT/XE).
It further disables running salloc within salloc, as this leads to errors: since Cray uses process group / PAGG IDs to track its reservations, running salloc from within salloc invariably leads to an ALPS resource allocation error.
Thirdly, it disables Cray node allocation on non-Cray systems, since this requires that the host on which salloc spawns the shell process be capable of Cray task launch. If it is not, the remote slurmctld will reserve the requested nodes, but the local host running salloc will neither be able to confirm the ALPS reservation (due to the absence of a local apbasil command) nor be able to run jobs on the compute nodes.
To distinguish this case from general task launch (we use a frontend host where salloc could end up running jobs on different clusters, depending on the value exported via $SLURM_CONF), the following conditions are tested:
 * Cray build support has been enabled (HAVE_CRAY);
 * the loaded slurm.conf uses select/cray (required on Cray hosts);
 * the local host does not have support for apbasil (HAVE_NATIVE_CRAY undefined).
Since the 'apbasil' command is only available on native Cray systems, this combination of conditions seems sufficient to prevent accidentally using salloc on a host which does not support it. (For sbatch the case is different, since the job script runs on the remote host.)
11_salloc.diff, applied with a minor change for Cray emulation
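The three tested conditions combine with a logical AND. A minimal C sketch, assuming the HAVE_CRAY and HAVE_NATIVE_CRAY macros are passed in as run-time booleans so all cases can be exercised in one build; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: refuse Cray node allocation from salloc when
 * Cray support is built in (HAVE_CRAY), slurm.conf uses select/cray,
 * and this host lacks apbasil (HAVE_NATIVE_CRAY undefined).
 */
static bool salloc_cray_refused(bool have_cray, const char *select_type,
				bool have_native_cray)
{
	return have_cray &&
	       strcmp(select_type, "select/cray") == 0 &&
	       !have_native_cray;
}
```

All three conditions must hold for salloc to be refused; dropping any one of them (a non-Cray build, a different SelectType, or a native Cray host) lets salloc proceed.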
-
Moe Jette authored
This puts the Basil inventory immediately before each (backfill) schedule. Having considered multiple alternatives, this is the most robust and least wasteful solution. The reason is that ALPS keeps internal node state, which can be changed
 * by the administrator (xtprocadmin),
 * by the node health checker programs (setting some nodes into 'suspect'),
 * by ALPS itself.
Tracking this periodically, e.g. every HealthCheckInterval, may miss some state changes. The result would not be a crash, but a subsequently failed ALPS reservation, which would require undoing some of the SLURM state.
Also added an inventory to plugins/sched/wiki and wiki2 at get_nodes time.
09_Cray-INVENTORY-directly-before-schedule.diff
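The ordering argued for above (inventory first, then schedule) can be sketched in C. Everything here is hypothetical: the node_state struct, the inventory callback standing in for the BASIL inventory query, and the choice to skip a pass when the inventory fails are illustrative assumptions, not SLURM's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node state as refreshed from an ALPS inventory. */
struct node_state {
	bool usable;	/* false if down, 'suspect', admin-disabled, ... */
};

/* Stand-in for a BASIL inventory query; returns 0 on success. */
typedef int (*inventory_fn)(struct node_state *nodes, size_t n);

/* Refresh node state immediately before a scheduling pass. Returns the
 * number of usable nodes, or -1 if the inventory failed (this sketch
 * skips the pass rather than schedule on stale state). */
static int schedule_pass(inventory_fn inventory,
			 struct node_state *nodes, size_t n)
{
	if (inventory(nodes, n) != 0)
		return -1;
	int usable = 0;
	for (size_t i = 0; i < n; i++)
		if (nodes[i].usable)
			usable++;
	/* ... a real scheduler would now place jobs on usable nodes ... */
	return usable;
}

/* Example stand-in: ALPS reports node 1 as 'suspect', others usable. */
static int demo_inventory(struct node_state *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		nodes[i].usable = (i != 1);
	return 0;
}
```

Because the inventory runs inside every pass, a node that xtprocadmin or the health checker changed a moment earlier is already reflected when jobs are placed.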
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
select/cray: update compile-time and runtime support for Cray build
These changes update build support for Cray XT/XE:
 1. renamed '--cray-xt' to '--cray', since XE systems are also supported;
 2. autoconf rules to cover the various possible build cases:
    a) --enable-cray=off: HAVE_CRAY/HAVE_NATIVE_CRAY undefined;
    b) --enable-cray=on: HAVE_CRAY defined;
       b1) local host is a native Cray system: HAVE_NATIVE_CRAY defined (requires installation of the mysql-devel and libexpat-devel packages);
       b2) local host is not a native Cray system: the conditionally built parts (basil_interface.c, libalps.la) are not built;
 3. updated configure logic:
    - since Cray support depends on MySQL, reordered tests in configure.ac,
    - reordered logic with regard to the changes in (2),
    - an AM_CONDITIONAL to build the native-Cray parts conditionally,
    - updated configure messages (XT/XE);
 4. a run-time read_conf test to ensure that use of select/cray is properly supported;
 5. an update of the NEWS file due to the change in (1); this may conflict with a locally updated copy.
I have compile-tested the three possible scenarios in (2).
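The three build cases in item 2 map onto the preprocessor macros that configure writes to config.h. A minimal C sketch, assuming HAVE_NATIVE_CRAY is only ever defined together with HAVE_CRAY as described; the guard and the function name are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Native Cray support implies Cray support (case b1 is a subset of b). */
#if defined(HAVE_NATIVE_CRAY) && !defined(HAVE_CRAY)
#error "HAVE_NATIVE_CRAY requires HAVE_CRAY"
#endif

/* Hypothetical helper reporting which build case was compiled. */
static const char *cray_build_case(void)
{
#if defined(HAVE_NATIVE_CRAY)
	return "b1: native Cray, basil_interface.c and libalps.la built";
#elif defined(HAVE_CRAY)
	return "b2: Cray support on a non-native host, native parts not built";
#else
	return "a: no Cray support";
#endif
}
```

Compiling with no flags, with -DHAVE_CRAY, and with -DHAVE_CRAY -DHAVE_NATIVE_CRAY exercises the three scenarios that were compile-tested.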
-
Moe Jette authored
03_Bug-fix_slurmctld-swap-both-NodeAddr-and-NodeHostname-when-reordering.diff
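The fix named above keeps NodeAddr and NodeHostName paired with their node when the node table is reordered. One way to make that structurally impossible to get wrong is to move whole records rather than individual fields. A hypothetical C sketch (the simplified struct and sort key are assumptions; SLURM's node record has many more fields):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node record. */
struct node_rec {
	char name[16];		/* NodeName, e.g. "nid00189" */
	char addr[16];		/* NodeAddr */
	char hostname[16];	/* NodeHostName */
};

static int cmp_node_name(const void *a, const void *b)
{
	const struct node_rec *x = a, *y = b;
	return strcmp(x->name, y->name);
}

/* qsort() moves whole elements, so name, NodeAddr and NodeHostName
 * always stay together; no field can be left behind by a partial swap. */
static void reorder_nodes(struct node_rec *nodes, size_t n)
{
	qsort(nodes, n, sizeof(*nodes), cmp_node_name);
}
```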
-