Message-ID: <20220526145402.slc4ve5vrlewyutm@riteshh-domain>
Date:   Thu, 26 May 2022 20:24:02 +0530
From:   Ritesh Harjani <ritesh.list@...il.com>
To:     Borislav Petkov <bp@...en8.de>,
        linux-ext4 <linux-ext4@...r.kernel.org>
Cc:     lkml <linux-kernel@...r.kernel.org>,
        Vaibhav Jain <vaibhav@...ux.ibm.com>, Jan Kara <jack@...e.cz>,
        Theodore Ts'o <tytso@....edu>
Subject: Re: EXT4-fs error (device sda5) in ext4_update_backup_sb:165:
 Filesystem failed CRC

On 22/04/28 02:55PM, Borislav Petkov wrote:
> Hi,
>
> the errors at the end of this mail come from one of my test boxes booted
> with latest Linus:
>
> 8f4dd16603ce ("Merge branch 'akpm' (patches from Andrew)")
>
> + tip/master.
>
> A second boot into the same kernel says:
>
> [    5.427329] EXT4-fs (sda5): warning: mounting fs with errors, running e2fsck is recommended
> [    5.435681] EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.
> ...
>
> [  316.621377] EXT4-fs (sda5): error count since last fsck: 14
> [  316.621645] EXT4-fs (sda5): initial error at time 1651146136: ext4_update_backup_sb:165
> [  316.621948] EXT4-fs (sda5): last error at time 1651146136: ext4_update_backup_sb:165

Could you please help us understand a little more about your setup? Is this (sda5)
somehow a backup image saved/restored using e2image?

>
>
> And it used to work fine with rc3:
>
> EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.
>
> so before I go and fsck the partition, I thought I should report it
> first - maybe something new in ext4 land is not behaving as it should...
>
> And since rc3 I see:
>
> $ git log --oneline v5.18-rc3.. fs/ext4/
> c00c5e1d157b Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
> eb7054212eac ext4: update the cached overhead value in the superblock
> 85d825dbf489 ext4: force overhead calculation if the s_overhead_cluster makes no sense
> 10b01ee92df5 ext4: fix overhead calculation to account for the reserved gdt blocks

^^^ It looks like these patches might have triggered the check on the backup
superblocks, which happens if the on-disk s_overhead_cluster doesn't match the
in-kernel calculation.

> 2da376228a24 ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
> c186f0887fe7 ext4: fix use-after-free in ext4_search_dir
> b98535d09179 ext4: fix bug_on in start_this_handle during umount filesystem
> a2b0b205d125 ext4: fix symlink file size not match to file content
> ad5cd4f4ee4d ext4: fix fallocate to use file_modified to update permissions consistently
>
> so there is something which just got applied...
>
> [    4.742960] device-mapper: ioctl: 4.46.0-ioctl (2022-02-22) initialised: dm-devel@...hat.com
> [    4.766518] loop: module loaded
> [    4.836287] EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.
> [    4.840733] EXT4-fs (sda5): Invalid checksum for backup superblock 32768
>
> [    4.843142] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.844802] EXT4-fs (sda5): Invalid checksum for backup superblock 98304
>
> [    4.847239] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.848942] EXT4-fs (sda5): Invalid checksum for backup superblock 163840
>
> [    4.851344] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.852919] EXT4-fs (sda5): Invalid checksum for backup superblock 229376
>
> [    4.855270] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.856910] EXT4-fs (sda5): Invalid checksum for backup superblock 294912
>
> [    4.859279] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.860946] EXT4-fs (sda5): Invalid checksum for backup superblock 819200
>
> [    4.863429] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.865182] EXT4-fs (sda5): Invalid checksum for backup superblock 884736
>
> [    4.867793] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.869583] EXT4-fs (sda5): Invalid checksum for backup superblock 1605632
>
> [    4.872285] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.874109] EXT4-fs (sda5): Invalid checksum for backup superblock 2654208
>
> [    4.877056] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [    4.878751] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC

All of the messages above come from ext4_update_backup_sb(), which is called
during mount via ext4_fill_super() -> ext4_update_overhead().
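
As a side note, the block numbers in those messages are consistent with the
usual sparse_super layout, where backup superblocks live in group 1 and in
groups that are powers of 3, 5 and 7. A minimal Python sketch, assuming 4 KiB
blocks (so 32768 blocks per group, first data block 0) and a hypothetical group
count of 82; neither value is stated in this thread:

```python
def sparse_super_backup_groups(group_count):
    """Groups that hold a backup superblock under sparse_super:
    group 1 plus powers of 3, 5 and 7 (group 0 holds the primary)."""
    groups = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)


# Assumptions, not taken from the report: 4 KiB block size, hence
# 32768 blocks per group and a first data block of 0.
BLOCKS_PER_GROUP = 32768
GROUP_COUNT = 82  # hypothetical; just large enough to include group 81

backup_blocks = [g * BLOCKS_PER_GROUP
                 for g in sparse_super_backup_groups(GROUP_COUNT)]
print(backup_blocks)
```

Under those assumptions the list is 32768, 98304, 163840, 229376, 294912,
819200, 884736, 1605632, 2654208, i.e. the same blocks the kernel complains
about.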

I was recently told about a similar problem. A filesystem image was saved using
e2image on a v5.17 kernel (with e2fsck 1.45.5 (07-Jan-2020)). After upgrading the
kernel to v5.18, mounting this image via a loop device (or restoring it to a
block device) produced the error messages above.

My theory so far is that the s_overhead_cluster value saved on disk was not
correct (I guess the earlier e2fsprogs version, 1.45.5, might not store
s_overhead_cluster on disk during mkfs??).
After the kernel upgrade, the 3 patches mentioned above recalculate
sbi->s_overhead for a non-bigalloc filesystem during mount, and if it doesn't
match the on-disk es->s_overhead_cluster value, the kernel tries to update all
superblocks via ext4_update_overhead().

Why the CRC checksum failure -
	...Before updating a backup superblock, the kernel verifies its checksum to
	make sure that the backup copy is not corrupt.
	And I guess e2image doesn't store the backup superblocks while saving the
	image, so those blocks are all zeroed. Hence the superblock checksum problem
	is reported in the case I am seeing internally.
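
To illustrate why a zeroed block would trip this check, here is a rough Python
sketch. It assumes the superblock checksum is a CRC32C over the first 1020
bytes of the 1024-byte superblock, seeded with ~0 and without a final
inversion, stored little-endian in the last 4 bytes. That is my reading of
ext4_superblock_csum(); the helper names below are made up for the example:

```python
def crc32c_raw(crc, data):
    """Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78),
    starting from `crc` and without a final inversion."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc


# Assumed layout: the checksum occupies the last 4 bytes of the superblock.
CSUM_OFFSET = 1020


def superblock_csum_ok(sb):
    """Compare the stored checksum with one computed over sb[0:1020]."""
    stored = int.from_bytes(sb[CSUM_OFFSET:CSUM_OFFSET + 4], "little")
    computed = crc32c_raw(0xFFFFFFFF, sb[:CSUM_OFFSET])
    return stored == computed


# A backup slot that was never saved reads back as all zeroes.
zeroed = bytes(1024)
print(superblock_csum_ok(zeroed))  # prints False
```

The stored checksum of a zeroed block is 0, while a CRC seeded with ~0 over
zero bytes can never come out as 0, so the comparison always fails.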

So, putting down my thoughts here for discussion -

- First, is it considered a valid use case to save/restore a disk image with e2image?
  (Users could back up using the "-a" option, which also saves all the FS
  data + critical metadata.)

- Given that we might use this way of updating backup superblock copies in the
  kernel for other values too in future, and users could upgrade their kernels but
  might still use older e2fsprogs, does it make sense to provide an option in e2image
  to save copies of the backup superblocks too?

- I haven't yet spent much time on a solution for the above problem, i.e. what
  should we do for users who might still take a backup w/o this
  additional option to save backup superblocks? In that case the kernel thinks that
  the backup superblock is corrupt, since its checksum doesn't match.
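
One way to spot affected images from user space could look like the sketch
below. It is only an illustration: it assumes a raw image, the sparse_super
layout, and a filesystem smaller than 2^32 blocks (only s_blocks_count_lo is
read), and it merely checks for the ext4 magic rather than the checksum.
find_bad_backups() and the tiny synthetic image are hypothetical.

```python
import io
import struct

EXT4_MAGIC = 0xEF53


def backup_groups(group_count):
    """Group 1 plus powers of 3, 5 and 7 (sparse_super assumed)."""
    groups = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)


def find_bad_backups(img):
    """Return the block numbers of backup superblock slots that do not
    carry the ext4 magic (for example because they are zeroed)."""
    img.seek(1024)                      # primary superblock lives at byte 1024
    sb = img.read(1024)
    blocks_count, = struct.unpack_from("<I", sb, 4)
    first_data_block, log_block_size = struct.unpack_from("<II", sb, 20)
    blocks_per_group, = struct.unpack_from("<I", sb, 32)
    magic, = struct.unpack_from("<H", sb, 56)
    if magic != EXT4_MAGIC:
        raise ValueError("primary superblock has no ext4 magic")
    block_size = 1024 << log_block_size
    # Round up: a partial last group still counts as a group.
    group_count = -(-(blocks_count - first_data_block) // blocks_per_group)
    bad = []
    for g in backup_groups(group_count):
        block = first_data_block + g * blocks_per_group
        img.seek(block * block_size)    # backup sits at the start of the group
        slot = img.read(1024)
        if len(slot) < 58 or struct.unpack_from("<H", slot, 56)[0] != EXT4_MAGIC:
            bad.append(block)
    return bad


# Synthetic demo: 1 KiB blocks, 8 blocks per group, 4 groups.
# The backup in group 1 (block 9) is present, the one in group 3 is zeroed.
demo_sb = bytearray(1024)
struct.pack_into("<I", demo_sb, 4, 33)        # s_blocks_count_lo
struct.pack_into("<II", demo_sb, 20, 1, 0)    # s_first_data_block, s_log_block_size
struct.pack_into("<I", demo_sb, 32, 8)        # s_blocks_per_group
struct.pack_into("<H", demo_sb, 56, EXT4_MAGIC)
demo = io.BytesIO(bytes(33 * 1024))
for offset in (1024, 9 * 1024):
    demo.seek(offset)
    demo.write(demo_sb)
bad = find_bad_backups(demo)
print(bad)
```

On a real image one would pass a file opened in binary mode instead of the
BytesIO object.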

-ritesh
