Discussion:
Painfully slow compilation (read: "make buildworld buildkernel") on not-so-weak system
Lev Serebryakov
2018-12-08 12:13:18 UTC
Hello Lev,
I've checked all the "standard" places: the CPU is not throttling, the SSD
looks perfectly OK according to SMART, there are no complaints from the AHCI
driver about timeouts and such, and the system doesn't start using swap.
The ZFS ARC was checked too. Here are the statistics from top while a
single-job kernel build is running. There is a lot of free memory and the ARC
is small; too much CPU is consumed by interrupts, yet there are free CPU cycles:

last pid: 19488; load averages: 7.03, 5.35, 5.10 up 0+14:43:04 15:09:55
417 threads: 7 running, 395 sleeping, 15 waiting
CPU 0: 50.0% user, 0.0% nice, 0.0% system, 16.4% interrupt, 33.6% idle
CPU 1: 16.4% user, 0.0% nice, 16.8% system, 0.0% interrupt, 66.8% idle
CPU 2: 0.0% user, 0.0% nice, 33.2% system, 0.0% interrupt, 66.8% idle
CPU 3: 33.2% user, 0.0% nice, 33.2% system, 0.0% interrupt, 33.6% idle
Mem: 28M Active, 315M Inact, 2076K Laundry, 2541M Wired, 1129K Buf, 5031M Free
ARC: 1025M Total, 197M MFU, 415M MRU, 514K Anon, 20M Header, 392M Other
189M Compressed, 563M Uncompressed, 2.98:1 Ratio
Swap: 16G Total, 16G Free




--
Best regards,
Lev mailto:***@FreeBSD.org
Lev Serebryakov
2018-12-08 12:18:59 UTC
Hello Lev,
Another strange thing I noticed: when the system is in this state, "top -SH"
shows that very low-profile processes, like the clock software interrupt (!),
sometimes consume a large amount of CPU for short periods of time. When the
system is idle, you never see "intr{swi4: clock (0)}" consuming 55% CPU for
one "frame", nor sshd or screen itself doing so.
Like this. This system doesn't have any significant network traffic right
now, only one ssh connection, which is used as a console. And yet 62.23% goes
to the network card. WTF?!

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
20128 root 101 0 104M 74M CPU1 1 0:31 100.00% cc
0 root -76 - 0 4608K - 2 53:25 62.23% kernel{if_config_tqg_0}
11 root -60 - 0 240K WAIT 0 25:45 24.89% intr{swi4: clock (0)}
9 root -8 - 0 160K tx->tx 0 7:38 24.88% zfskern{txg_thread_enter}
995 root 24 0 17M 7676K select 1 2:20 12.44% sendmail
13791 root 24 0 24M 15M select 0 0:04 12.44% make




--
Best regards,
Lev mailto:***@FreeBSD.org
Waitman Gobble via freebsd-hackers
2018-12-08 12:48:38 UTC
Post by Lev Serebryakov
Hello Lev,
Another strange thing I noticed: when the system is in this state, "top -SH"
shows that very low-profile processes, like the clock software interrupt (!),
sometimes consume a large amount of CPU for short periods of time. When the
system is idle, you never see "intr{swi4: clock (0)}" consuming 55% CPU for
one "frame", nor sshd or screen itself doing so.
Like this. This system doesn't have any significant network traffic right
now, only one ssh connection, which is used as a console. And yet 62.23% goes
to the network card. WTF?!
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
20128 root 101 0 104M 74M CPU1 1 0:31 100.00% cc
0 root -76 - 0 4608K - 2 53:25 62.23% kernel{if_config_tqg_0}
11 root -60 - 0 240K WAIT 0 25:45 24.89% intr{swi4: clock (0)}
9 root -8 - 0 160K tx->tx 0 7:38 24.88% zfskern{txg_thread_enter}
995 root 24 0 17M 7676K select 1 2:20 12.44% sendmail
13791 root 24 0 24M 15M select 0 0:04 12.44% make
I had a super slow build for r341270, but I thought it was because I had accidentally left the WITNESS option set.

I killed it after about 10 hours, booted into single-user mode, and rebuilt the kernel there.

Waitman
Eugene Grosbein
2018-12-08 13:27:13 UTC
I'm completely lost. Is it a software problem? Hardware? If it is a
hardware problem, what should I blame?
Try different kern.timecounter.hardware and/or kern.eventtimer.timer settings,
but first try kern.eventtimer.periodic=1 instead of the default 0.

If any of this helps, try going back to the defaults and then disabling any power-saving settings.
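As a minimal sketch of Eugene's suggestion, the timer knobs can be set in /etc/sysctl.conf (or one at a time at runtime with sysctl(8)). The HPET value is an assumed example: it only applies if the machine exposes an HPET, and "sysctl kern.timecounter.choice" and "sysctl kern.eventtimer.choice" list the names actually available.

```
# /etc/sysctl.conf fragment (sketch; change one thing at a time)

# Step 1: periodic event timer instead of the default one-shot mode (0)
kern.eventtimer.periodic=1

# Step 2, only if needed: a different timecounter and/or event timer.
# HPET is an assumed example; use a name listed by kern.timecounter.choice
# and kern.eventtimer.choice on this machine.
#kern.timecounter.hardware=HPET
#kern.eventtimer.timer=HPET
```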
Lev Serebryakov
2018-12-08 13:44:55 UTC
Hello Eugene,
Post by Eugene Grosbein
I'm completely lost. Is it a software problem? Hardware? If it is a
hardware problem, what should I blame?
Try different kern.timecounter.hardware and/or kern.eventtimer.timer settings,
but first try kern.eventtimer.periodic=1 instead of the default 0.
Nothing helps. I've tried periodic=1 and switching both the timecounter
hardware and the event timer to HPET (from TSC-low and LAPIC, respectively),
but the system is still "sticky" with a single-job build and unresponsive
with a multiple-job build, and there are still strange bursts of CPU
consumption from threads and processes that should be low-profile.
Post by Eugene Grosbein
If any of this helps, try going back to the defaults and then disabling any power-saving settings.
I'll try disabling C2/C3 and turning off Turbo as the next step...
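One way to try the C-state part on FreeBSD, assuming CPU idle states are handled by the acpi_cpu driver, is to cap the deepest allowed idle state at C1. This is a sketch, not what Lev actually ran; "sysctl dev.cpu.0.cx_supported" shows which states the CPU offers. Turbo is left out because the thread does not say how it was turned off.

```
# /etc/sysctl.conf fragment (sketch): keep all CPUs out of C2/C3
# by limiting the deepest allowed idle state to C1.
hw.acpi.cpu.cx_lowest=C1
```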
--
Best regards,
Lev mailto:***@FreeBSD.org
Lev Serebryakov
2018-12-08 14:20:42 UTC
Hello Lev,
Even when the build is single-job, the system becomes unresponsive. With a
4-job build running, it can take up to a minute to switch screen's windows!
And even with a 1-job kernel build, upsmon's connection to the remote upsd
flickers! Unbelievable.

It looks like each successive compiler invocation is slower and more
stressful than the previous one.
--
Best regards,
Lev mailto:***@FreeBSD.org
Mateusz Guzik
2018-12-08 14:27:42 UTC
Post by Lev Serebryakov
Hello Lev,
Even when the build is single-job, the system becomes unresponsive. With a
4-job build running, it can take up to a minute to switch screen's windows!
And even with a 1-job kernel build, upsmon's connection to the remote upsd
flickers! Unbelievable.
It looks like each successive compiler invocation is slower and more
stressful than the previous one.
Is this a fresh install?

Can you please narrow the problem down to a specific kernel revision?
Most importantly, does this show up with a 12.0 kernel?

I'm running one AMD box and a number of Intel boxes with various CPUs;
no issues.
--
Mateusz Guzik <mjguzik gmail.com>
Lev Serebryakov
2018-12-08 16:58:37 UTC
Hello Mateusz,
Post by Mateusz Guzik
Post by Lev Serebryakov
It looks like each successive compiler invocation is slower and more
stressful than the previous one.
Is this a fresh install?
Almost fresh. It was installed from a fairly recent 13 snapshot and then
upgraded to r341157 with a custom kernel via a source update. Now I'm trying
to update it a second time, without luck.

First upgrade was not so painful, as far as I can remember :-)
Post by Mateusz Guzik
Can you please narrow the problem down to a specific kernel revision?
I'm still not sure whether it is a software or a hardware problem.
Post by Mateusz Guzik
Most importantly, does this show up with a 12.0 kernel?
I haven't tried a 12 kernel on this hardware.
Post by Mateusz Guzik
I'm running one AMD box and a number of Intel boxes with various CPUs;
no issues.
Me too, but this is the only box that runs 13 and compiles anything; all
the other boxes are either 11/12 or small NanoBSD installations without a
toolchain...
--
Best regards,
Lev mailto:***@FreeBSD.org
Lev Serebryakov
2018-12-08 19:09:43 UTC
Hello Lev,
Post by Lev Serebryakov
Post by Mateusz Guzik
Can you please narrow the problem down to a specific kernel revision?
I'm still not sure whether it is a software or a hardware problem.
It looks like the Samsung 850 EVO doesn't like the TRIMs sent by ZFS (and I
thought it was a good SSD: consumer-grade, but a really good one!).

I've tuned TRIMs down with:

vfs.zfs.per_txg_dirty_frees_percent=10
vfs.zfs.free_max_blocks=1000
vfs.zfs.vdev.trim_max_active=4

And it MOSTLY solved the problem: with these settings there are still some
freezes from time to time (and strange CPU consumption by low-profile
threads).

When I disabled TRIM completely, all the freezes went away, and low-profile
threads now consume tenths of a percent of CPU, as intended.
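The thread does not say how TRIM was disabled. A minimal sketch, assuming the legacy (pre-OpenZFS) FreeBSD ZFS of that time, where TRIM was controlled by the boot-time tunable vfs.zfs.trim.enabled: set it in /boot/loader.conf and reboot, since it cannot be changed at runtime. On later OpenZFS-based releases, TRIM is instead governed by the pool's autotrim property.

```
# /boot/loader.conf fragment (sketch, legacy FreeBSD ZFS):
# turn off ZFS TRIM entirely; takes effect after a reboot.
vfs.zfs.trim.enabled=0
```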
--
Best regards,
Lev mailto:***@FreeBSD.org