Post by Willem Jan Withagen Post by Conrad Meyer
One (ugly) trick is to use multiple filesystem links to the script
interpreter, where the link names distinguish the scripts. E.g.,
$ ln /bin/sh /libexec/my_script_one_sh
$ ln /bin/sh /libexec/my_script_two_sh
$ cat myscript1.sh
Cores will be dumped with %N of "my_script_one_sh."
Neat trick... got to try and remember this.
But it is not the shell scripts that are crashing...
When running the Ceph tests during a Jenkins build, some
programs/executables crash intentionally, leaving cores.
Others (scripts) use some of these programs with correct input and
should NOT crash. They check during startup and termination that there
are no cores left.
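A check like that could look roughly like the sketch below. This is not
the actual Ceph test code: the helper names list_cores and
assert_no_cores are made up for illustration, and it assumes cores are
written with a name ending in .core (as in program_x.core) somewhere
under the directory being checked.

```shell
# Hypothetical helper: list core files below a directory.
# Assumes core names end in ".core", e.g. program_x.core.
list_cores() {
    find "$1" -type f -name '*.core' | sort
}

# Hypothetical check: succeed only if no core files are found.
# Meant to be called at test startup and again at termination.
assert_no_cores() {
    cores=$(list_cores "$1")
    if [ -n "$cores" ]; then
        echo "unexpected core files:" >&2
        echo "$cores" >&2
        return 1
    fi
    return 0
}
```

A check based only on the file name cannot tell an intentional core from
a failure core, which is exactly the collision described here.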
One Jenkins test run takes about 4 hours when not executed in parallel.
I'm testing 4 versions multiple times a day so that I don't end up with
a huge list of PRs to go through when testing fails.
But the intentional cores and the failure cores here collide.
And when I have a core program_x.core I can't tell whether it is from a
failure or from an intentional crash.
Now if I could tell per program how to name its core, that would allow
me to fix the problem without overturning the complete Ceph testing
infrastructure, and still keep parallel tests.
It would also help in that "regular" cores just keep going the way they
are. So other applications still have the same behaviour, and are still
picked up by periodic processing.
Turns out that Linux has prctl(PR_SET_DUMPABLE), and that it is actually
used in some of the Ceph applications...
Being able to set this to 0 or 1 would perhaps be a nice start as well.
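As a stopgap from the test side, the intentionally crashing programs
could be started with the core size limit set to 0. This is a different
mechanism from PR_SET_DUMPABLE: a resource limit set by the parent, not
a flag the process sets on itself. It only helps where the test scripts
can wrap the call. A minimal sketch, where run_without_core is a
hypothetical name and the shell is assumed to support "ulimit -c"
(FreeBSD sh, bash and dash do):

```shell
# Hypothetical wrapper: run a command with core dumps disabled.
# The limit is set in a subshell, so the calling shell and every
# other test keep their normal core dump behaviour.
run_without_core() {
    (
        ulimit -c 0    # core file size limit 0: no core is written
        exec "$@"
    )
}

# Usage (program name and argument are placeholders):
#   run_without_core ./program_x --input-that-crashes
```

With the intentional crashes wrapped like this, any core that does show
up after a run would have to come from a real failure.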