[bug] fsck refuses to repair damaged UFS using backup superblock

Discussion:

s***@cydem.org

2018-11-20 13:30:00 UTC

Howdy!

Since send-pr(1) is now gone, I guess the next option is to send a
message directly to the developers...

Yesterday, I ran into a bug in fsck_ffs that gave me a little scare.

Short story: on -CURRENT, fsck refuses to check a FS with a corrupted
superblock, even when an alternate (backup) SB location is given.

Long story. I've been testing a newly-built system based on an X399
platform with a 2950X CPU and an Optane 905P 480GB U.2 drive. The
system ran a ~2-day old -CURRENT; when compiling newest world and
kernel, I found the machine in a locked-up state. After a hard reset,
boot failed because the root FS became corrupted & was not available:
kernel: Superblock check-hash failed: recorded check-hash XXX != computed check-hash YYY

I have not yet figured out why the corruption happened... bad hardware?
bug in the NVMe driver?

"OK", I thought, "No worries. We'll just boot using another disk, fsck
the corrupted FS with a backup superblock, and be up in a moment".
The machine was doing nothing but compiling, so no valuable data loss.

So I did `dumpfs -m /dev/ada0p3` on the spare disk (which was the
source for the new disk image, thus had almost identical partitions
and filesystems) to get the FS details, then did `newfs -N [...]
/dev/ada0p3` to find locations of superblock backups, then finally
ran `fsck_ffs -b 192 /dev/nvd0p3` -- only to get the same "check-hash
failed" message, plus another strange message: "Can't open
/dev/nvd0p3: [...]". Then fsck quits.
Note that `fsck_ffs -b ...` on a FS with good superblock works OK.

After fiddling with a debugger for a bit, I commented out the line
"return (0);" in /usr/src/sbin/fsck_ffs/setup.c:136, recompiled fsck,
and the FS was recovered successfully.

What was actually happening: fsck's setup.c calls ufs_disk_fillout()
from libufs' type.c, which in turn calls sbread() from the same
library, which then calls sbget(disk->d_fd, &fs, -1) [[where '-1'
is hard-coded to indicate the primary superblock]], which in turn simply
invokes ffs_sbget() from the ffs kernel driver. That returns ENOENT,
which eventually causes fsck to give up before even looking at the
specified backup superblock.
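
To make the control flow described above easier to follow, here is a toy
model in C. It is only a sketch under stated assumptions, not the real
fsck_ffs or libufs code: every toy_* name, the struct, and the SB_PRIMARY
macro are hypothetical stand-ins, and the "workaround" variant merely
models the effect of not returning early when the primary probe fails.

```c
/* Toy model of the reported control flow; NOT the real fsck_ffs/libufs code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SB_PRIMARY (-1L)	/* stand-in for the hard-coded "-1" location */

/* Hypothetical in-memory "disk": which superblock copies are usable. */
struct toy_disk {
	int  primary_ok;	/* 0 = primary superblock fails its check-hash */
	long good_backup;	/* location of one intact backup copy */
};

/* Stand-in for sbget(): 0 on success, ENOENT if that copy is unusable. */
int
toy_sbget(const struct toy_disk *d, long loc)
{
	assert(d != NULL);
	if (loc == SB_PRIMARY)
		return (d->primary_ok ? 0 : ENOENT);
	return (loc == d->good_backup ? 0 : ENOENT);
}

/*
 * Reported behaviour: the primary superblock is probed first, and a
 * failure there aborts setup before the -b location is consulted.
 * Returns 1 if the check can proceed, 0 if setup gives up.
 */
int
toy_setup_reported(const struct toy_disk *d, long bflag)
{
	if (toy_sbget(d, SB_PRIMARY) != 0)
		return (0);	/* gives up; bflag is never looked at */
	if (bflag != 0)
		return (toy_sbget(d, bflag) == 0);
	return (1);
}

/*
 * Modelled workaround: a failed primary probe is no longer fatal, so a
 * backup location given with -b still gets its chance.
 */
int
toy_setup_workaround(const struct toy_disk *d, long bflag)
{
	int primary_failed = (toy_sbget(d, SB_PRIMARY) != 0);

	if (bflag != 0)
		return (toy_sbget(d, bflag) == 0);
	return (!primary_failed);
}
```

In this model, a disk with a bad primary and an intact backup at 192 makes
toy_setup_reported(&disk, 192) return 0, while toy_setup_workaround(&disk, 192)
returns 1, which mirrors the behaviour before and after the change to setup.c.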

I don't know exactly what ufs_disk_fillout() does, but fortunately
for me, fsck worked even though the "sbread(disk)" part of that function
cannot succeed on a disk with a corrupted superblock. Also, I have a
feeling that calling a kernel ffs driver function when using fsck
to fix a broken filesystem is not the best thing to do...

Please CC, as I am not subscribed.

--
[SorAlx] ridin' VN2000 Classic LT

Julian H. Stacey

2018-11-23 01:17:20 UTC

Cheers,
Julian

--
Julian Stacey, Computer Consultant, Systems Engineer, BSD Linux Unix, Munich.
Brexit referendum stole 3,700,000 votes from Brits abroad, inc. 700,000 in EU
UK PM lied it's democratic in Article 50 http://exitbrexit.uk/brexit/#lie
Campaign lies, criminal funded; Markets, jobs & pound down; New Referendum!

Julian H. Stacey

2018-11-23 19:07:17 UTC

Post by s***@cydem.org
Please CC, as I am not subscribed.

An overactive spam filter bounced my CC to ***@cydem.org, so
unless he/she thinks to subscribe, there is a chance that any other CC will not be seen either.

Cheers,
Julian


Benjamin Kaduk

2018-11-24 01:19:49 UTC

Post by s***@cydem.org
Howdy!
Since send-pr(1) is now gone, I guess the next option is to send a
message directly to the developers...

I'm not sure where one would get that impression, given that
https://www.freebsd.org/support/bugreports.html links to
https://bugs.freebsd.org/bugzilla/enter_bug.cgi .

-Ben

s***@cydem.org

2018-11-24 07:30:00 UTC

Ben,

Post by Benjamin Kaduk

Post by s***@cydem.org
Since send-pr(1) is now gone, I guess the next option is to send a
message directly to the developers...

I'm not sure where one would get that impression, given that
https://www.freebsd.org/support/bugreports.html links to
https://bugs.freebsd.org/bugzilla/enter_bug.cgi .

I have tried the web-based bugzilla, but was greeted with
a log-in page when I went to the bug report address & no
form to report bugs.

Post by Benjamin Kaduk
-Ben

--
[SorAlx] ridin' VN2000 Classic LT

Benjamin Kaduk

2018-11-26 00:59:51 UTC

Post by s***@cydem.org
I have tried the web-based bugzilla, but was greeted with
a log-in page when I went to the bug report address & no
form to report bugs.

I quote from the first linked page:

% An account will need to be created before a bug can be submitted. Please
% note that messages sent to a mailing list are not tracked as official
% problem reports, and may get lost in the noise!

The system is behaving as expected. The old system (that allowed
unauthenticated bug submission) was a spam magnet, and I'm given to
understand that dealing with that spam inflow essentially causes burnout
for all humans subjected to it.

-Ben