Debugging signal 11
Johannes Totz
2021-04-16 23:51:01 UTC
Hi there,

My init(8) is crashing with a signal 11.

I've added a breakpoint() in kern_sig.c:


static int
issignal(struct thread *td)
{
	// [snip]

		case (intptr_t)SIG_DFL:
			/*
			 * Don't take default actions on system processes.
			 */
			if (p->p_pid <= 1) {
#ifdef DIAGNOSTIC
				/*
				 * Are you sure you want to ignore SIGSEGV
				 * in init? XXX
				 */
				printf("Process (pid %lu) got signal %d\n",
				    (u_long)p->p_pid, sig);

				breakpoint(); // added by me
#endif
				break;	/* == ignore */
			}

	// [snip]
}


That breaks to DDB where I can call dump. So far so good.
But how do I get back to the stack(trace) and instruction that caused
the segv? Either in DDB or KGDB?


Thanks,

Johannes
Lucas Nali de Magalhães
2021-04-18 18:53:37 UTC
Hi.

There are a few problems with your email. I've commented on them inline.
Post by Johannes Totz
My init(8) is crashing with a signal 11.
Jumping ahead to the end: init is the mother of all processes. The man page
you cited has a longer explanation. An explanation of signal 11 can be found
by googling it, for example at
https://www.cyberciti.biz/tips/segmentation-fault-on-linux-unix.html, which
I found that way. Short story: init is well tested, so it must be a hardware fault.
Post by Johannes Totz
static int
issignal(struct thread *td)
{
// [snip]
/*
* Don't take default actions on system processes.
*/
if (p->p_pid <= 1) {
#ifdef DIAGNOSTIC
/*
* Are you sure you want to ignore SIGSEGV
* in init? XXX
*/
printf("Process (pid %lu) got signal %d\n",
(u_long)p->p_pid, sig);
breakpoint(); // added by me
#endif
break; /* == ignore */
}
// [snip]
}
That breaks to DDB where I can call dump. So far so good.
This also isn't the usual. Debugging a running process is possible but
the process you used is the wrong one. Debugging init, OTOH, is a
completely different story: init is the first process and is the most
important process of any unix. The actual command varies from
debugger to debugger but in gdb, "attach pid" may do the trick for
you. You will need to be extra cautious because you are targeting init.
Ideally, init is the process supposed to catch the signals and keep the
system running. So a break into it may cause your system to crash.
Post by Johannes Totz
But how do I get back to the stack(trace) and instruction that caused the segv? Either in DDB or KGDB?
"bt" is a shortcut for "backtrace" and is the command to get a stack trace
in gdb. BTW, "attach" and "bt" are two of the most basic debugger commands.
--
rollingbits — 📧 ***@icloud.com 📧 ***@gmail.com 📧 ***@yahoo.com 📧 ***@terra.com.br 📧 ***@globo.com
Ian Lepore
2021-04-18 19:18:00 UTC
Post by Lucas Nali de Magalhães
This also isn't the usual. Debugging a running process is possible but
the process you used is the wrong one. Debugging init, OTOH, is a
completely different story: init is the first process and is the most
important process of any unix. The actual command varies from
debugger to debugger but in gdb, "attach pid" may do the trick for
you. You will need to be extra cautious because of you are aiming init.
Ideally, init is the process supposed to catch the signals and keep
the system running. So a break into it may cause your system to crash.
Given that it is init that is segfaulting, how do you propose that the
OP launch gdb in order to do an attach to init? In other words: there
is a reason the OP is trying to use the kernel debugger to examine
what's going on here.

-- Ian
Warner Losh
2021-04-18 19:37:50 UTC
Post by Ian Lepore
Post by Lucas Nali de Magalhães
This also isn't the usual. Debugging a running process is possible but
the process you used is the wrong one. Debugging init, OTOH, is a
completely different story: init is the first process and is the most
important process of any unix. The actual command varies from
debugger to debugger but in gdb, "attach pid" may do the trick for
you. You will need to be extra cautious because of you are aiming init.
Ideally, init is the process supposed to catch the signals and keep
the system running. So a break into it may cause your system to crash.
Given that it is init that is segfaulting, how do you propose that the
OP launch gdb in order to do an attach to init? In other words: there
is a reason the OP is trying to use the kernel debugger to examine
what's going on here.
Yeah. I'm seeing this too on a new lab machine. Not sure what is going on.
Since it's a new machine, I'm working through other, higher priority tasks
first. For me, it's only when I try a reboot -C that I see it... normal
reboot doesn't trigger it. And I don't see it on other models...

Warner

-- Ian
Post by Ian Lepore
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
Lucas Nali de Magalhães
2021-04-18 20:06:54 UTC
Post by Ian Lepore
Post by Lucas Nali de Magalhães
This also isn't the usual. Debugging a running process is possible but
the process you used is the wrong one. Debugging init, OTOH, is a
completely different story: init is the first process and is the most
important process of any unix. The actual command varies from
debugger to debugger but in gdb, "attach pid" may do the trick for
you. You will need to be extra cautious because of you are aiming init.
Ideally, init is the process supposed to catch the signals and keep
the system running. So a break into it may cause your system to crash.
Given that it is init that is segfaulting, how do you propose that the
OP launch gdb in order to do an attach to init? In other words: there
is a reason the OP is trying to use the kernel debugger to examine
what's going on here.
First, the OP was able to modify init. Then they asked for the command
to get a stack trace. So the OP clearly doesn't have full knowledge of
the procedure and the risks. Besides, kgdb is based on gdb. I
thought they would know.
--
rollingbits — 📧 ***@icloud.com 📧 ***@gmail.com 📧 ***@yahoo.com 📧 ***@terra.com.br 📧 ***@globo.com
Johannes Totz
2021-04-20 00:53:44 UTC
Post by Lucas Nali de Magalhães
Post by Ian Lepore
Post by Lucas Nali de Magalhães
This also isn't the usual. Debugging a running process is possible but
the process you used is the wrong one. Debugging init, OTOH, is a
completely different story: init is the first process and is the most
important process of any unix. The actual command varies from
debugger to debugger but in gdb, "attach pid" may do the trick for
you. You will need to be extra cautious because of you are aiming init.
Ideally, init is the process supposed to catch the signals and keep
the system running. So a break into it may cause your system to crash.
Given that it is init that is segfaulting, how do you propose that the
OP launch gdb in order to do an attach to init? In other words: there
is a reason the OP is trying to use the kernel debugger to examine
what's going on here.
First the OP was able to modify init. Then it was asked the command
to do a stack trace. Thus the OP clearly hasn't the full knowledge of
the procedure and the risks. Besides this, kgdb is based on gdb. I
thought they should know.
Hi Lucas and others,
thanks for responding.

I didn't modify init; I've been messing around in the kernel. That
messing around makes init crash, so it's totally my own fault. But I would
like it to work, hence I'm trying to debug why the crash happens.

bt in kgdb gives me (beware of line break):

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(offsetof(struct pcpu,
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:371
#2 0xffffffff804d700a in db_dump (dummy=<optimized out>,
dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>)
at /usr/src/sys/ddb/db_command.c:574
#3 0xffffffff804d6dcf in db_command (last_cmdp=<optimized out>,
cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:481
#4 0xffffffff804d6b3d in db_command_loop () at
/usr/src/sys/ddb/db_command.c:534
#5 0xffffffff804d9fc8 in db_trap (type=<optimized out>, code=<optimized
out>) at /usr/src/sys/ddb/db_main.c:252
#6 0xffffffff80744106 in kdb_trap (type=3, code=0,
tf=0xfffffe0021c75a20) at /usr/src/sys/kern/subr_kdb.c:693
#7 0xffffffff809d64a1 in trap (frame=0xfffffe0021c75a20) at
/usr/src/sys/amd64/amd64/trap.c:583
#8 <signal handler called>
#9 0xffffffff806feb45 in issignal (td=0xfffff80002213740) at
/usr/src/sys/amd64/include/cpufunc.h:65
#10 cursig (td=0xfffff80002213740) at /usr/src/sys/kern/kern_sig.c:599
#11 0xffffffff8075a428 in ast (framep=0xfffffe0021c75c00) at
/usr/src/sys/kern/subr_trap.c:329
#12 0xffffffff809b2979 in doreti_ast () at
/usr/src/sys/amd64/amd64/exception.S:1150
#13 0x0000000000000000 in ?? ()

...which is the stacktrace of the fault handler, not the
instruction/function that caused the fault.

Select frame #11 and:
p *framep

and:
disassemble tf_rip

But that would have been too easy...
Konstantin Belousov
2021-04-20 07:40:33 UTC
Post by Johannes Totz
Post by Lucas Nali de Magalhães
Post by Ian Lepore
Post by Lucas Nali de Magalhães
This also isn't the usual. Debugging a running process is possible but
the process you used is the wrong one. Debugging init, OTOH, is a
completely different story: init is the first process and is the most
important process of any unix. The actual command varies from
debugger to debugger but in gdb, "attach pid" may do the trick for
you. You will need to be extra cautious because of you are aiming init.
Ideally, init is the process supposed to catch the signals and keep
the system running. So a break into it may cause your system to crash.
Given that it is init that is segfaulting, how do you propose that the
OP launch gdb in order to do an attach to init? In other words: there
is a reason the OP is trying to use the kernel debugger to examine
what's going on here.
First the OP was able to modify init. Then it was asked the command
to do a stack trace. Thus the OP clearly hasn't the full knowledge of
the procedure and the risks. Besides this, kgdb is based on gdb. I
thought they should know.
Hi Lucas and others,
thanks for responding.
I didn't modify init, I've been messing around in the kernel. And that
messing around makes init crash, so totally my own fault. But I would have
liked it to work instead, thusly trying to debug why the crash happens.
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
pcpu,
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:371
#2 0xffffffff804d700a in db_dump (dummy=<optimized out>,
dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>)
at /usr/src/sys/ddb/db_command.c:574
#3 0xffffffff804d6dcf in db_command (last_cmdp=<optimized out>,
cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:481
#4 0xffffffff804d6b3d in db_command_loop () at
/usr/src/sys/ddb/db_command.c:534
#5 0xffffffff804d9fc8 in db_trap (type=<optimized out>, code=<optimized
out>) at /usr/src/sys/ddb/db_main.c:252
#6 0xffffffff80744106 in kdb_trap (type=3, code=0, tf=0xfffffe0021c75a20)
at /usr/src/sys/kern/subr_kdb.c:693
#7 0xffffffff809d64a1 in trap (frame=0xfffffe0021c75a20) at
/usr/src/sys/amd64/amd64/trap.c:583
#8 <signal handler called>
#9 0xffffffff806feb45 in issignal (td=0xfffff80002213740) at
/usr/src/sys/amd64/include/cpufunc.h:65
#10 cursig (td=0xfffff80002213740) at /usr/src/sys/kern/kern_sig.c:599
#11 0xffffffff8075a428 in ast (framep=0xfffffe0021c75c00) at
/usr/src/sys/kern/subr_trap.c:329
#12 0xffffffff809b2979 in doreti_ast () at
/usr/src/sys/amd64/amd64/exception.S:1150
#13 0x0000000000000000 in ?? ()
...which is the stacktrace of the fault handler, not the
instruction/function that caused the fault.
p *framep
disassemble tf_rip
But that would have been too easy...
There is a tunable/sysctl, machdep.uprintf_signal, which makes the kernel
print some information on a trap, either on the console or on the ctty. At
the very least, it will give you the $pc at the time of the crash.

The intent of this functionality is exactly to get some data from a trap
when a debugger cannot be attached.
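
A minimal sketch of how that knob might be enabled, assuming the kernel in question exposes machdep.uprintf_signal as a boot-time tunable (the "tunable/sysctl" wording above suggests it does). Because the crashing process here is init itself, there is no chance to run sysctl(8) before the fault, so setting it from the loader is the likely route. The file path and value below are the usual loader.conf(5) conventions, not something confirmed in this thread:

```
# /boot/loader.conf
# Assumption: ask the kernel to print trap details for signals raised by
# faults, so the faulting $pc shows up on the console or ctty.
machdep.uprintf_signal="1"
```

For a one-off boot, typing `set machdep.uprintf_signal=1` at the loader prompt before `boot` should have the same effect. On a system that gets far enough to run commands, `sysctl machdep.uprintf_signal=1` toggles it at runtime.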
