Meowthink
2018-08-28 15:47:20 UTC
Hi Peeter,
The revision
https://svnweb.freebsd.org/base/head/sys/x86/x86/cpu_machdep.c?r1=336763&r2=336762&pathrev=336763
works around the mwait issue, i.e. it sets
sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt
I think that should not apply to the 2400G, which is model 11h, not 1h.
machdep.idle: acpi
machdep.idle_available: spin, mwait, hlt, acpi
machdep.idle_apl31: 0
machdep.idle_mwait: 1
Your problem is much easier to prove, since it's reproducible. Mine
is too random to catch...
Anyway, it seems like the IRET issue [1] is still not fixed? I highly
doubt that my issue is related to it, because my system has become
significantly more stable since I stopped that IRQ storm from the
bluetooth module, though it still panics occasionally.
So could anybody tell me what the difference is between the FreeBSD
workaround [2] and the DragonFly BSD one?
# cpucontrol -m 0x8b /dev/cpuctl0
MSR 0x8b: 0x00000000 0x0810100b
So should I use mwait?
I still don't know what I should set. Any ideas?
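To relate the MSR readout above to a patch level quoted elsewhere, a small sketch like the following may help. It assumes the "MSR 0x8b: <high> <low>" layout shown above, with the patch level in the low 32 bits; the function names msr_low and same_hex are made up for this illustration.

```shell
#!/bin/sh
# Print the last field (the low 32 bits) of a "cpucontrol -m" output
# line read on stdin. Assumes the "MSR 0x8b: <high> <low>" layout.
msr_low() {
    awk '/^MSR/ { print $NF }'
}

# Compare two hex values numerically, so leading zeros do not matter.
same_hex() {
    [ $(( $1 )) -eq $(( $2 )) ]
}
```

For example, `cpucontrol -m 0x8b /dev/cpuctl0 | msr_low` should print the low word, and `same_hex 0x0810100b 0x810100b` succeeds because both spell the same number.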
If I were you, I'd play around with the sysctls mentioned above and see
if it helps. Start with disabling both mwait and hlt, perhaps
machdep.idle=spin
machdep.idle_mwait=0
(assuming that 'spin' means hlt will not be used) and then, if that
does not lead to a panic, try enabling mwait. I can't test the 2400G
since I don't have it any more. I booted FreeBSD a couple of times but
did not run it over long periods of time.
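The stepwise test suggested above could be scripted roughly like this. It is only a sketch: set_idle and the APPLY variable are made up for this illustration, and by default the helper just prints the sysctl commands instead of running them.

```shell
#!/bin/sh
# Dry-run helper: prints the sysctl commands for one idle configuration.
# Set APPLY=1 (a variable invented for this sketch) to run them as root.
set_idle() {
    # $1 = value for machdep.idle, $2 = value for machdep.idle_mwait
    for setting in "machdep.idle=$1" "machdep.idle_mwait=$2"; do
        if [ "${APPLY:-0}" = "1" ]; then
            sysctl "$setting"
        else
            echo "sysctl $setting"
        fi
    done
}

# Step 1: no mwait and no hlt (assuming 'spin' avoids hlt):
#   set_idle spin 0
# Step 2: only if step 1 survives a stress run, re-enable mwait:
#   set_idle spin 1
```

Running each step separately and stressing the machine in between keeps it clear which setting a panic belongs to.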
It works!
After hours and hours of different stress tests, I got 8 copies of gcc
built without any problem.
But it costs a lot of power and the fan becomes very annoying, so I
don't think I'll test long-term stability in this state.
machdep.idle: acpi -> spin
- adds ~5W, maybe because some deeper C states are disabled?
machdep.idle_mwait: 1 -> 0
- adds another ~50W; the CPUs never sleep.
I tried setting machdep.idle_mwait to 1, or machdep.idle to mwait. Both
failed with panics when I started building gcc pass by pass.
I'm pretty sure mwait causes the problem, as I once experienced a
panic immediately after I issued the sysctl command (see the 2nd dump
below).
So my next step will be hlt. I still need some time, though.
meowthink
------------------------------------------------------------------------
machdep.idle=mwait
panic: ffs_syncvnode: syncing truncated data.
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff80b414b7 at kdb_backtrace+0x67
#1 0xffffffff80afa9e7 at vpanic+0x177
#2 0xffffffff80afa863 at panic+0x43
#3 0xffffffff80dcddc4 at ffs_syncvnode+0x5a4
#4 0xffffffff80dcc915 at ffs_fsync+0x25
#5 0xffffffff810ffcb2 at VOP_FSYNC_APV+0x82
#6 0xffffffff80bc3a62 at sched_sync+0x412
#7 0xffffffff80abd813 at fork_exit+0x83
#8 0xffffffff80f5cc7e at fork_trampoline+0xe
------------------------------------------------------------------------
machdep.idle_mwait=1
Fatal trap 9: general protection fault while in kernel mode
cpuid = 7; apic id = 07
instruction pointer = 0x20:0xffffffff80e094fe
stack pointer = 0x0:0xfffffe081e5df9e0
frame pointer = 0x0:0xfffffe081e5dfa50
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 17 (dom0)
trap number = 9
panic: general protection fault
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff80b414b7 at kdb_backtrace+0x67
#1 0xffffffff80afa9e7 at vpanic+0x177
#2 0xffffffff80afa863 at panic+0x43
#3 0xffffffff80f7c14f at trap_fatal+0x35f
#4 0xffffffff80f7b70e at trap+0x5e
#5 0xffffffff80f5bccc at calltrap+0x8
#6 0xffffffff80e07a17 at vm_pageout+0x87
#7 0xffffffff80abd813 at fork_exit+0x83
#8 0xffffffff80f5cc7e at fork_trampoline+0xe
Unfortunately, that's for Ryzen family 17h models 00h-0fh, whereas my
Ryzen 5 2400G is model 11h.
As for the microcode, it should be updated through UEFI/BIOS updates. I
think mine is now PinnaclePI-AM4_1.0.0.4 with microcode patch level
0x810100b.
Seems like ... the only thing I can do is sit down and wait?
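For anyone wanting to check which side of the model 00h-0fh boundary a CPU falls on, here is a rough sketch that decodes family and model from the CPUID signature (function 1, EAX). The helper names are made up, and the two signature values in the usage note are assumed examples, not values read from the machines in this thread.

```shell
#!/bin/sh
# Display family: base family (bits 11:8), plus the extended family
# (bits 27:20) when the base family is 0xF. Prints a decimal number.
family_num() {
    fam=$(( ($1 >> 8) & 0xf ))
    if [ "$fam" -eq 15 ]; then
        fam=$(( fam + (($1 >> 20) & 0xff) ))
    fi
    echo "$fam"
}

# Display model: base model (bits 7:4); when the base family is 0xF the
# extended model (bits 19:16) supplies the upper nibble. Prints decimal.
model_num() {
    mod=$(( ($1 >> 4) & 0xf ))
    if [ $(( ($1 >> 8) & 0xf )) -eq 15 ]; then
        mod=$(( ((($1 >> 16) & 0xf) << 4) | mod ))
    fi
    echo "$mod"
}

# Succeeds only for family 17h (23) with model in the range 00h-0fh.
in_workaround_range() {
    [ "$(family_num "$1")" -eq 23 ] && [ "$(model_num "$1")" -le 15 ]
}
```

With an assumed signature of 0x00810f10 this gives family 17h, model 11h, so in_workaround_range fails; with an assumed 0x00800f11 it gives model 01h and succeeds.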
Now it may or may not relate to your problem, but it appears that
Ryzen 2400G also has another issue with HLT, see the DragonFly bug
report
https://bugs.dragonflybsd.org/issues/3131
Thanks a lot for that info.
which AMD is aware of and is possibly working on, but it may not have
appeared in the errata yet. The bug report says that until this is
fixed, the workaround is to also disable HLT in cpu_idle. I am not
sure what is the correct value for the sysctl on FreeBSD, perhaps
sysctl machdep.idle=0
or some other value?
In the meantime, I have this microcode
# cpucontrol -m 0x8b /dev/cpuctl0
MSR 0x8b: 0x00000000 0x0810100b
Cheers
Peeter
Cheers,
meowthink