Monitoring
Hi,
I would like to start a discussion here on what we would like to have, or could have, to monitor our experiments. On my side, I would find it very useful to have a command that takes two arguments: a period of time in hours (72 hours is useful after a weekend, 24 hours for daily monitoring, 2 hours for example after a meeting or lunch break, 9 hours for checking the simulations at the end of the day) and the type of jobs you want to monitor (SIM, INI, POST, TRANSFERT...). For SIM jobs over the last 24 hours, the result would then look like this:
Number of SIM Jobs submitted in the last 24h: 9
Number of SIM jobs running at the moment: 2
Number of SIM jobs failed in the last 24h: 4
Number of SIM jobs completed in the last 24h: 3
Total queuing time in the last 24h: 2
Total queuing time associated with failed jobs in the last 24h: 1.5
Real and (only for multi-proc jobs) CPU time consumed in the last 24h: 1h - 100CPU
Real and (only for multi-proc jobs) CPU time consumed by failed jobs in the last 24h: 0.1 - 40CPU
(only for multi-proc jobs) Percentage of the whole available CPU time used by this experiment in the last 24h: 1%
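To make the idea a bit more concrete, here is a rough Python sketch of how such a summary could be computed. Everything in it is an assumption made for illustration: the `JobRecord` fields, the status strings, the `summarize` function and the `available_processors` parameter are hypothetical names, not an existing API. It also assumes that a job belongs to the period if it was submitted within it, that "running" means running now regardless of submission time, and that CPU time is simply wall-clock time multiplied by the number of processors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical job record; field names are illustrative only."""
    job_type: str                       # e.g. "SIM", "INI", "POST"
    status: str                         # assumed values: "QUEUING", "RUNNING", "FAILED", "COMPLETED"
    submitted: datetime
    started: Optional[datetime] = None  # None while the job is still queuing
    finished: Optional[datetime] = None # None while the job is still running
    processors: int = 1


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600.0


def summarize(jobs, job_type, hours, now, available_processors=None):
    """Summarize jobs of one type submitted within the last `hours` hours."""
    window_start = now - timedelta(hours=hours)
    of_type = [j for j in jobs if j.job_type == job_type]
    recent = [j for j in of_type if j.submitted >= window_start]
    failed = [j for j in recent if j.status == "FAILED"]

    def queue_hours(group):
        # A job that has not started yet is still queuing until `now`.
        return sum(_hours(j.submitted, j.started or now) for j in group)

    def wall_hours(group):
        # A job that has not finished yet is counted up to `now`.
        return sum(_hours(j.started, j.finished or now)
                   for j in group if j.started is not None)

    def cpu_hours(group):
        # Assumption: CPU time = wall-clock time x number of processors.
        return sum(_hours(j.started, j.finished or now) * j.processors
                   for j in group if j.started is not None)

    summary = {
        "submitted": len(recent),
        "running": sum(1 for j in of_type if j.status == "RUNNING"),
        "failed": len(failed),
        "completed": sum(1 for j in recent if j.status == "COMPLETED"),
        "queue_hours": queue_hours(recent),
        "failed_queue_hours": queue_hours(failed),
        "wall_hours": wall_hours(recent),
        "failed_wall_hours": wall_hours(failed),
        "cpu_hours": cpu_hours(recent),
        "failed_cpu_hours": cpu_hours(failed),
    }
    if available_processors:
        # Share of the CPU time available on the machine during the period.
        capacity = available_processors * hours
        summary["cpu_percent"] = 100.0 * summary["cpu_hours"] / capacity
    return summary
```

A call such as `summarize(jobs, "SIM", 24, datetime.now(), available_processors=100)` would then return the values behind each line of the report above. Where the job records come from (log files, a database, the scheduler) is left open here.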
Of course, this is just a suggestion, so feel free to comment on it. (@eexarchou @obellprat @nevensf @fmassonnet @masif @dvolpi @dmanubens @jvegas @vguemas)
Regards.