HEADSUP: Something has gone south with -current
Steve Kargl
2018-12-07 23:06:22 UTC
Dell 7510 laptop was happily running FreeBSD12-alpha9
from Oct. 10th. I decided to update to top-of-tree
today, which would be FreeBSD13 at r341703.

% cd /usr/obj
% rm -rf usr
% cd ../src
% svn update
% make -j6 buildworld (OK)
% make -j6 buildkernel (OK)
% make installkernel (OK)
% mergemaster -p
% <reboot into single user mode>
% mount -a
% cd /usr/src
% make installworld

Dies with a segfault in make(1) halfway through the update.
/sbin has been updated.

Rebooted with new kernel. Laptop locks up.
Rebooted with kernel.old/kernel (known good kernel). Laptop locks up.
Rebooted with verbose info. Lockup occurs right after

Starting /sbin/init

is printed to console.

Reboot to Dell laptop BIOS and run system diagnostics.

Reboot with old FreeBSD installation cdrom. Mounted the
laptop's root filesystem on /mnt.

% chflags noschg /mnt/sbin/init
% cp /mnt/sbin/init.bak /mnt/sbin/init

Rebooted the laptop and finally got back to multi-user mode. Post-trauma
analysis:

make core dumps.
devd core dumps.
init core dumps.
cc core dumps.
c++ core dumps.

Something seems to be broken.
--
Steve
Shawn Webb
2018-12-07 23:23:57 UTC
There have been (and still are) issues with the introduction of ifunc
in libc (r339898). The symptoms you're describing sound a lot like the
symptoms I experienced early on.

Do you have any non-standard settings in make.conf/src.conf?

Thanks,
--
Shawn Webb
Cofounder and Security Engineer
HardenedBSD

Tor-ified Signal: +1 443-546-8752
Tor+XMPP+OTR: ***@is.a.hacker.sx
GPG Key ID: 0x6A84658F52456EEE
GPG Key Fingerprint: 2ABA B6BD EF6A F486 BE89 3D9E 6A84 658F 5245 6EEE
Steve Kargl
2018-12-07 23:36:32 UTC
Post by Shawn Webb
Post by Steve Kargl
Dell 7510 laptop was happily running FreeBSD12-alpha9
from Oct. 10th. I decided to update to top-of-tree
today, which would be FreeBSD13 at r341703.
analysis
make core dumps.
devd core dumps.
init core dumps.
cc core dumps.
c++ core dumps.
Something seems to be broken.
There have been (and still are) issues with the introduction of ifunc
in libc (r339898). The symptoms you're describing sound a lot like the
symptoms I experienced early on.
Do you have any non-standard settings in make.conf/src.conf?
Both are fairly benign. make.conf contains MALLOC_PRODUCTION="YES"
and src.conf contains a few WITHOUT_* options (eg, CTM, PPP, NDIS).

It seems to be associated with stripping static binaries. See
my follow-up post.
--
Steve
Steve Kargl
2018-12-07 23:30:19 UTC
Post by Steve Kargl
make core dumps.
devd core dumps.
init core dumps.
cc core dumps.
c++ core dumps.
Something seems to be broken.
Further investigation,
as core dumps.
cpp core dumps.
/rescue/vi core dumps.

All of these programs are statically linked. Note, ar and ranlib
have static linkage, and appear to still work but these were not
replaced by the failing 'make installworld'.
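
One way to see which installed programs are statically linked, and so might be exposed to this failure, is to filter file(1) output for the phrase "statically linked". The following is a minimal sketch, not part of the original report; the function name list_static is made up for illustration.

```shell
# list_static: read file(1) output ("path: description") on stdin and
# print the path of every entry described as statically linked.
# The function name is hypothetical, not something from the base system.
# Paths that themselves contain a colon are not handled.
list_static() {
    while IFS= read -r line; do
        case $line in
            *"statically linked"*) printf '%s\n' "${line%%:*}" ;;
        esac
    done
}
```

It could be used as `file /sbin/* /rescue/* | list_static`; the directories are only examples.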

Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
is static and not stripped and works! But, if I do

cp ar ar.new
strip ar
./ar

This ar core dumps. So, stripping static binaries seems to
break the binary.
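
The manual test above can be wrapped in a small helper so it is easy to repeat against other binaries or another strip. This is a sketch under assumptions: the name strip_survives and its optional second argument are invented here, and it should only be pointed at programs that are harmless to run with no arguments (ar is, init is not).

```shell
# strip_survives BINARY [STRIP_CMD]
# Copy BINARY to a scratch directory, strip the copy, run the copy with
# no arguments, and succeed only if it did not die from a signal.
# The function name and the STRIP_CMD parameter are illustrative.
strip_survives() {
    bin=$1
    strip_cmd=${2:-strip}
    tmp=$(mktemp -d) || return 2
    if ! cp "$bin" "$tmp/copy" || ! "$strip_cmd" "$tmp/copy"; then
        rm -rf "$tmp"
        return 2
    fi
    status=0
    # The shell reports a process killed by a signal with an exit status
    # above 128 (a SIGSEGV normally shows up as 139).
    "$tmp/copy" </dev/null >/dev/null 2>&1 || status=$?
    rm -rf "$tmp"
    [ "$status" -le 128 ]
}
```

For example, `strip_survives /usr/obj/usr/src/amd64.amd64/ar /usr/bin/strip || echo "stripped copy crashes"` would test the installed strip against the freshly built ar, using the path as given in this post.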
--
Steve
Steve Kargl
2018-12-07 23:52:33 UTC
Post by Steve Kargl
Post by Steve Kargl
make core dumps.
devd core dumps.
init core dumps.
cc core dumps.
c++ core dumps.
Something seems to be broken.
Further investigation,
as core dumps.
cpp core dumps.
/rescue/vi core dumps.
All of these programs are statically linked. Note, ar and ranlib
have static linkage, and appear to still work but these were not
replaced by the failing 'make installworld'.
Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
is static and not stripped and works! But, if I do
cp ar ar.new
strip ar
./ar
This ar core dumps. So, stripping static binaries seems to
break the binary.
Yep, definitely, a problem with stripping static binaries.

I copied both init and devd from /usr/obj to /sbin without
stripping the binaries. System rebooted as expected.
--
Steve
Steve Kargl
2018-12-08 00:03:19 UTC
Post by Steve Kargl
Post by Steve Kargl
Post by Steve Kargl
make core dumps.
devd core dumps.
init core dumps.
cc core dumps.
c++ core dumps.
Something seems to be broken.
Further investigation,
as core dumps.
cpp core dumps.
/rescue/vi core dumps.
All of these programs are statically linked. Note, ar and ranlib
have static linkage, and appear to still work but these were not
replaced by the failing 'make installworld'.
Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
is static and not stripped and works! But, if I do
cp ar ar.new
strip ar
./ar
This ar core dumps. So, stripping static binaries seems to
break the binary.
Yep, definitely, a problem with stripping static binaries.
I copied both init and devd from /usr/obj to /sbin without
stripping the binaries. System rebooted as expected.
Don't know if it's valid, but

% ./ar
% gdb82 ar.new ar.core
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000029386c in __je_malloc_tsd_boot0 ()
(gdb) bt
#0 0x000000000029386c in __je_malloc_tsd_boot0 ()
#1 0x00000000002b6d08 in calloc ()
#2 0x000000000028275b in _thr_alloc ()
#3 0x000000000027ec98 in _libpthread_init ()
#4 0x000000000024d239 in handle_static_init ()
#5 0x000000000024d10e in _start ()
--
Steve
Konstantin Belousov
2018-12-08 00:08:20 UTC
Post by Steve Kargl
Post by Steve Kargl
Post by Steve Kargl
make core dumps.
devd core dumps.
init core dumps.
cc core dumps.
c++ core dumps.
Something seems to be broken.
Further investigation,
as core dumps.
cpp core dumps.
/rescue/vi core dumps.
All of these programs are statically linked. Note, ar and ranlib
have static linkage, and appear to still work but these were not
replaced by the failing 'make installworld'.
Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
is static and not stripped and works! But, if I do
cp ar ar.new
strip ar
./ar
This ar core dumps. So, stripping static binaries seems to
break the binary.
Yep, definitely, a problem with stripping static binaries.
I copied both init and devd from /usr/obj to /sbin without
stripping the binaries. System rebooted as expected.
Most likely this is an issue fixed by r339350.
Steve Kargl
2018-12-08 00:25:39 UTC
Post by Konstantin Belousov
Post by Steve Kargl
Post by Steve Kargl
Post by Steve Kargl
make core dumps.
devd core dumps.
init core dumps.
cc core dumps.
c++ core dumps.
Something seems to be broken.
Further investigation,
as core dumps.
cpp core dumps.
/rescue/vi core dumps.
All of these programs are statically linked. Note, ar and ranlib
have static linkage, and appear to still work but these were not
replaced by the failing 'make installworld'.
Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
is static and not stripped and works! But, if I do
cp ar ar.new
strip ar
./ar
This ar core dumps. So, stripping static binaries seems to
break the binary.
Yep, definitely, a problem with stripping static binaries.
I copied both init and devd from /usr/obj to /sbin without
stripping the binaries. System rebooted as expected.
Most likely this is an issue fixed by r339350.
My tree is at r341703. The last paragraph of the commit
message for r339350 is

Just remove filter_reloc. This fixes certain cases including statically
linked binaries containing ifuncs. Stripping binaries with relocations
referencing removed symbols was already broken, and after this change
may still be broken in a different way.

So, I guess I'm hitting the "broken in a different way".

The gdb82 backtrace ends up in jemalloc. I do build world with
MALLOC_PRODUCTION="YES". Perhaps ifuncs+jemalloc aren't at
production level. I have a few more broken static binaries that
I need to replace before I can rebuild without MALLOC_PRODUCTION.
--
Steve
Konstantin Belousov
2018-12-08 00:43:17 UTC
Post by Steve Kargl
Post by Konstantin Belousov
Post by Steve Kargl
Post by Steve Kargl
Post by Steve Kargl
make core dumps.
devd core dumps.
init core dumps.
cc core dumps.
c++ core dumps.
Something seems to be broken.
Further investigation,
as core dumps.
cpp core dumps.
/rescue/vi core dumps.
All of these programs are statically linked. Note, ar and ranlib
have static linkage, and appear to still work but these were not
replaced by the failing 'make installworld'.
Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
is static and not stripped and works! But, if I do
cp ar ar.new
strip ar
./ar
This ar core dumps. So, stripping static binaries seems to
break the binary.
Yep, definitely, a problem with stripping static binaries.
I copied both init and devd from /usr/obj to /sbin without
stripping the binaries. System rebooted as expected.
Most likely this is an issue fixed by r339350.
My tree is at r341703. The last paragraph of the commit
message for r339350 is
Which tree ? The strip that is used by install should be past this
revision.
Post by Steve Kargl
Just remove filter_reloc. This fixes certain cases including statically
linked binaries containing ifuncs. Stripping binaries with relocations
referencing removed symbols was already broken, and after this change
may still be broken in a different way.
So, I guess I'm hitting the "broken in a different way".
The gdb82 backtrace ends up in jemalloc. I do build world with
MALLOC_PRODUCTION="YES". Perhaps, ifuncs+jemalloc aren't at
production level. I have few more broken static binaries that
I need to replace before I can rebuild without MALLOC_PRODUCTION.
--
Steve
Steve Kargl
2018-12-08 01:02:03 UTC
Post by Konstantin Belousov
Post by Steve Kargl
Post by Konstantin Belousov
Most likely this is an issue fixed by r339350.
My tree is at r341703. The last paragraph of the commit
message for r339350 is
Which tree ? The strip that is used by install should be past this
revision.
% cd /usr/src
% svn info
Path: .
Working Copy Root Path: /usr/src
URL: svn://svn.freebsd.org/base/head
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 341703
Node Kind: directory
Schedule: normal
Last Changed Author: emaste
Last Changed Rev: 341703
Last Changed Date: 2018-12-07 08:52:52 -0800 (Fri, 07 Dec 2018)

This is the /usr/src that has led to the broken static binaries.

Looking at timestamps, I have

% ls -l /usr/bin/strip
-r-xr-xr-x 2 root wheel - 131144 Oct 10 17:10 /usr/bin/strip*

which is the strip from my Oct 10 build. This strip did not get
updated because 'make installworld' died. Does install during
an installworld use the old strip instead of freshly built strip?
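
The timestamp comparison done by eye above can also be scripted. The sketch below is an illustration only: the helper name predates is invented, and a file's modification time shows when it was installed, not which revision it was built from, so the result is a hint rather than proof.

```shell
# predates FILE STAMP
# Succeed if FILE's modification time is not newer than STAMP, where
# STAMP uses the touch -t format [[CC]YY]MMDDhhmm (local time zone).
# Hypothetical helper, not an existing tool.
predates() {
    ref=$(mktemp) || return 2
    touch -t "$2" "$ref"
    # find prints FILE only if it is not newer than the reference file.
    found=$(find "$1" -prune ! -newer "$ref" -print)
    rm -f "$ref"
    [ -n "$found" ]
}
```

For instance, `predates /usr/bin/strip 201810130000 && echo "strip looks older than the fix"`, where the cutoff date is an assumption about when the fix was committed.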
--
Steve
Konstantin Belousov
2018-12-08 01:32:46 UTC
Post by Steve Kargl
Post by Konstantin Belousov
Post by Steve Kargl
Post by Konstantin Belousov
Most likely this is an issue fixed by r339350.
My tree is at r341703. The last paragraph of the commit
message for r339350 is
Which tree ? The strip that is used by install should be past this
revision.
% cd /usr/src
% svn info
Path: .
Working Copy Root Path: /usr/src
URL: svn://svn.freebsd.org/base/head
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 341703
Node Kind: directory
Schedule: normal
Last Changed Author: emaste
Last Changed Rev: 341703
Last Changed Date: 2018-12-07 08:52:52 -0800 (Fri, 07 Dec 2018)
This is the /usr/src that has led to the broken static binaries.
Looking at timestamps, I have
% ls -l /usr/bin/strip
-r-xr-xr-x 2 root wheel - 131144 Oct 10 17:10 /usr/bin/strip*
which is the strip from my Oct 10 build. This strip did not get
updated because 'make installworld' died. Does install during
an installworld use the old strip instead of freshly built strip?
It is the installed (host) strip that is used, AFAIK. You can build a
static lib/libelftc and usr.bin/strip from the newer sources and install
them to get past the issue.
Steve Kargl
2018-12-08 01:26:58 UTC
Post by Steve Kargl
Post by Konstantin Belousov
Post by Steve Kargl
Post by Konstantin Belousov
Most likely this is an issue fixed by r339350.
My tree is at r341703. The last paragraph of the commit
message for r339350 is
Which tree ? The strip that is used by install should be past this
revision.
This is the /usr/src that has led to the broken static binaries.
Looking at timestamps, I have
% ls -l /usr/bin/strip
-r-xr-xr-x 2 root wheel - 131144 Oct 10 17:10 /usr/bin/strip*
which is the strip from my Oct 10 build. This strip did not get
updated because 'make installworld' died. Does install during
an installworld use the old strip instead of freshly built strip?
Looks like /usr/src/UPDATING could use an entry about r339350.

I was updating an r339290 world to r341703. This jumps across
r339350. /usr/bin/strip from r339290 apparently is used during
installworld, which renders a system rather broken.
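
The condition described here, an update that steps over the revision carrying the fix, comes down to a range check. A minimal sketch, with a function name chosen only for illustration:

```shell
# crosses_rev OLD NEW FIX
# Succeed if an update from revision OLD to revision NEW steps over FIX,
# i.e. the installed tools predate FIX while the new sources include it.
# The function name is hypothetical.
crosses_rev() {
    [ "$1" -lt "$3" ] && [ "$3" -le "$2" ]
}
```

With the numbers from this thread, `crosses_rev 339290 341703 339350` succeeds, which is the case where the old /usr/bin/strip is still in place during installworld.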

20181013:
At r339350, /usr/bin/strip was updated to deal with the introduction
of ifuncs into FreeBSD. In particular, a /usr/bin/strip from an earlier
revision can lead to a broken system. To avoid mayhem, it is suggested
that one does

cd /usr/src/usr.bin/objcopy
make install

prior to 'make installworld'
--
Steve