Mark Delany
2021-04-18 10:47:00 UTC
Hi all.
I rarely if ever post here so if there's a better place, LMK.
I've been running 12.2 on vultr.com instances for a long time without any issues. However
I recently attempted an upgrade to 13.0 and the system now exhibits a number of issues.
The most critical issue is that the system randomly wedges after running for a while
(anywhere from 10 minutes to a couple of hours), requiring a reboot to recover. There is no
console response or console output, and only limited network response (see below). Nothing
is logged anywhere as best I can tell.
The second issue is more annoying than critical: the system doesn't reboot with the
reboot/shutdown commands. The shutdown sequence seems to complete but the reboot never
occurs. I compiled and ran a small program calling "reboot(RB_AUTOBOOT | RB_VERBOSE)" but
nothing interesting showed up.
I have no idea whether the two issues are related, except that neither occurs with 12.2.
Some details:
- I first upgraded with freebsd-update and then tried with a fresh ISO image and
completely overwrote the original file system.
- I've tried both UFS and ZFS root file systems.
- I tried with a fresh VM instance in case there was some sort of per-instance glitch.
- The system is 99% idle with no memory pressure. It normally runs nsd, openntpd and a few
other processes installed via pkg, but nothing weird as best I can tell.
- It has no kernel modules manually loaded.
- It's configured with ipv4 and ipv6 and when it gets wedged I get a ping response from
the ipv6 address, but not from ipv4. Furthermore, if I try a tcp connection to ipv6 I
get a connection setup, but no data.
- The VM is configured as a single-CPU system.
- I haven't raised the issue with vultr yet. Thought I'd see what the hive-mind thinks
first.
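The IPv4/IPv6 asymmetry in the list above (connection setup succeeds but no data arrives)
can be checked from a remote host with a small probe. This is only a minimal sketch,
assuming Python 3 on the remote machine. The function name probe_tcp, the timeout, the
placeholder host name and the choice of port 22 (assuming sshd is running and sends its
banner first) are all illustrative and not part of the original report.

```python
import socket


def probe_tcp(host, port, family, timeout=5.0, payload=b""):
    """Classify a TCP endpoint as 'no-connect', 'connected-no-data' or 'data'.

    family is socket.AF_INET or socket.AF_INET6, so each address family
    can be probed separately.
    """
    try:
        sock = socket.socket(family, socket.SOCK_STREAM)
    except OSError:
        # The local host does not support this address family.
        return "no-connect"
    with sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except OSError:
            # Refused, unreachable, unresolvable, or timed out during the handshake.
            return "no-connect"
        try:
            if payload:
                sock.sendall(payload)
            data = sock.recv(1)
        except OSError:
            # Handshake completed, but nothing arrived before the timeout.
            return "connected-no-data"
        # An empty read means the peer closed without sending anything.
        return "data" if data else "connected-no-data"


if __name__ == "__main__":
    HOST = "vm.example.net"  # hypothetical placeholder for the affected VM
    PORT = 22                # assumption: sshd is listening and sends a banner
    for name, family in (("ipv4", socket.AF_INET), ("ipv6", socket.AF_INET6)):
        print(name, probe_tcp(HOST, PORT, family))
```

A wedge like the one described would be expected to show up as "no-connect" for ipv4 and
"connected-no-data" for ipv6. Note that a service which waits for the client to speak first
(a DNS server on TCP port 53, for example) reports "connected-no-data" even when healthy
unless a suitable payload is passed.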
Not that it will surprise anyone, but I recently spun up 13.0 in Virtualbox on a lab
machine as well as on a different VM provider without any problems, so it's probably
something relatively unique to vultr.
That this is a virtually idle system on a single CPU with no oddball or unusual kernel
modules or network configs makes the situation surprising to me. There is no pattern that
I'm yet able to discern. The main thing I have left to try is to boot the system without
any networking activated, but apart from that I'm out of ideas in terms of identifying the
root cause.
So my questions are:
1. Anyone else having the same issue? Or not having the same issue?
2. Clues on how to diagnose? This is a non-critical system so I can try anything that
anyone suggests but I'm not particularly familiar with kernel-level debugging so a bit
of hand-holding might be needed if you have suggestions.
For those unfamiliar with vultr's VMs, here's the first part of dmesg:
FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f: Fri Apr 9 04:24:09 UTC 2021
***@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 11.0.1 (***@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
VT(vga): text 80x25
CPU: Intel Xeon Processor (Cascadelake) (2993.02-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x50656 Family=0x6 Model=0x55 Stepping=6
Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
Features2=0xfffa3203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
AMD Features2=0x21<LAHF,ABM>
Structured Extended Features=0xd18307a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,AVX512F,AVX512DQ,CLFLUSHOPT,CLWB,AVX512CD,AVX512BW,AVX512VL>
Structured Extended Features2=0x808<PKU,AVX512VNNI>
Structured Extended Features3=0xa4000000<IBPB,ARCH_CAP,SSBD>
XSAVE Features=0x1<XSAVEOPT>
IA32_ARCH_CAPS=0x2b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO>
Hypervisor: Origin = "KVMKVMKVM"
real memory = 1073741824 (1024 MB)
avail memory = 997744640 (951 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS BXPCAPIC>
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23
Timecounter "TSC-low" frequency 1496510010 Hz quality 800
in case it shows up anything odd to those who can decode this sort of stuff.
Mark.