Skipping of arbitrary jobs if certain jobs are completed/failed
Hello @dbeltran and @bdepaula,
Summary of the feature
Is there a way to skip arbitrary jobs when certain dependency conditions are met, e.g. when a previous job has failed? If I understand the "skippable" keyword correctly, it only works for members/chunks.
Describe your use case
I would like to check whether some required input files are already present. If they are, start the simulation; if not, create them from scratch.
My intuitive approach uses three jobs:
- A: check if input is there (lightweight, can run e.g. on the login node of the cluster)
- B: create the input data (some computational power required, runs in the batch system)
- C: simulation
The straightforward dependency chain is A -> B -> C. However, if A succeeds, B can be skipped because it is no longer needed.
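Job A can be as simple as a script that checks for the files and reports the result through its exit code. The sketch below assumes that the workflow manager marks a job as FAILED when its script exits with a non-zero code; the file names in `REQUIRED_INPUTS` and the helper `missing_inputs` are hypothetical placeholders, not part of any existing configuration.

```python
import sys
from pathlib import Path

# Hypothetical list of required input files; replace with the real ones.
REQUIRED_INPUTS = ["input/grid.nc", "input/initial_conditions.nc"]


def missing_inputs(base_dir, required=REQUIRED_INPUTS):
    """Return the required input files that do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name in required if not (base / name).is_file()]


def main(base_dir="."):
    missing = missing_inputs(base_dir)
    if missing:
        print("Missing input files: " + ", ".join(missing), file=sys.stderr)
        return 1  # non-zero exit: assumed to mark job A as FAILED
    print("All input files present")
    return 0  # zero exit: job A COMPLETED, so B would not be needed


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Because the check only touches the file system, it fits the "lightweight, can run on the login node" role of job A.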
Describe the problem
I see four possible solutions:
- Add a configurable switch that lets the user choose between using existing files and creating new ones. Quite elegant for the resulting workflow, but slightly more complicated for the user.
- Combine jobs A and B into one job that does the check. However, this adds the overhead of submitting a job to the batch system that does almost nothing if the files already exist.
- Make job A fail if the files do not exist, and set the dependency A -> B to the status FAILED. With a weak dependency of C on both A and B, C starts as soon as A succeeds or, if A fails, after B succeeds. However, B will always be started, only to quit without doing anything. This is more complicated than the first option, with little runtime benefit.
- New feature: make the dependency A -> B apply "if and only if A has FAILED", and combine it with the weak dependencies of the previous option.
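The intended behaviour of the proposed feature can be modelled in a few lines. This is a hypothetical sketch of the desired rules, not Autosubmit's implementation: `resolve` and the `SKIPPED` value are illustrative names, and B and C are assumed to succeed whenever they run.

```python
from enum import Enum


class Status(Enum):
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"  # illustrative: the job never needed to run


def resolve(status_a):
    """Apply the proposed rules to the A -> B -> C workflow.

    - B depends on A "if and only if A has FAILED": it runs only when A
      failed and is skipped otherwise.
    - C has weak dependencies on A and B: it may start as soon as either
      parent has COMPLETED.
    """
    result = {"A": status_a}
    result["B"] = Status.COMPLETED if status_a is Status.FAILED else Status.SKIPPED
    parent_done = any(result[p] is Status.COMPLETED for p in ("A", "B"))
    result["C"] = Status.COMPLETED if parent_done else Status.SKIPPED
    return result


for a in (Status.COMPLETED, Status.FAILED):
    outcome = resolve(a)
    print(", ".join(f"{job}={s.value}" for job, s in outcome.items()))
# A=COMPLETED, B=SKIPPED, C=COMPLETED
# A=FAILED, B=COMPLETED, C=COMPLETED
```

In both cases C runs exactly once, and B is only submitted to the batch system when the input files are actually missing.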
Is there an alternative way to do this, or is it already implemented and I have not found out how?