Date: Wed, 28 Feb 2024 12:25:57 -0500
From: Patrick Plenefisch <simonpatp@...il.com>
To: stable@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Alasdair Kergon <agk@...hat.com>, Mike Snitzer <snitzer@...nel.org>, 
	Mikulas Patocka <mpatocka@...hat.com>, Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>, 
	David Sterba <dsterba@...e.com>, regressions@...ts.linux.dev, dm-devel@...ts.linux.dev, 
	linux-btrfs@...r.kernel.org
Subject: Re: [REGRESSION] LVM-on-LVM: error while submitting device barriers

I'm unsure whether this is just an LVM bug or a BTRFS+LVM interaction
bug, but LVM is definitely involved somehow.
After upgrading from 5.10 to 6.1, I noticed that one of my filesystems
had gone read-only. In dmesg, I found:

BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
BTRFS warning (device dm-75): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
BTRFS: error (device dm-75) in write_all_supers:4379: errno=-5 IO failure (errors while submitting device barriers.)
BTRFS info (device dm-75: state E): forced readonly
BTRFS warning (device dm-75: state E): Skipping commit of aborted transaction.
BTRFS: error (device dm-75: state EA) in cleanup_transaction:1992: errno=-5 IO failure

At first I suspected a btrfs error, but a scrub found no errors, and
the filesystem continued to be read-write on 5.10 kernels.
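For reference, the health checks amounted to something along these
lines (the mountpoint path is illustrative):

  # read-only metadata check on the unmounted device
  btrfs check --readonly /dev/lvm/brokenDisk

  # full scrub of the mounted filesystem, with per-device stats
  btrfs scrub start -B -d /mnt/brokenDisk

  # per-device error counters (wr/rd/flush/corrupt/gen, as in dmesg)
  btrfs device stats /mnt/brokenDisk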

Here is my setup:

/dev/lvm/brokenDisk is an LVM-on-LVM volume. I have /dev/sd{a,b,c,d}
(of varying sizes) in a lower VG, which has three LVs, all raid1
volumes. Two of those LVs are further used as PVs for upper VGs.
One of the upper VGs has no issues, and the non-PV LV has no issues.
The remaining LV, /dev/lowerVG/lvmPool, hosts nested LVM: it is used
as a PV for VG "lvm", which has 3 volumes inside. Two of those volumes
have no issues (and are btrfs), but the last one, /dev/lvm/brokenDisk,
is the only volume that exhibits this behavior, so something about it
is special.

Or described as layers:
/dev/sd{a,b,c,d} => PV => VG "lowerVG"
/dev/lowerVG/single (RAID1 LV) => BTRFS, works fine
/dev/lowerVG/works (RAID1 LV) => PV => VG "workingUpper"
/dev/workingUpper/{a,b,c} => BTRFS, works fine
/dev/lowerVG/lvmPool (RAID1 LV) => PV => VG "lvm"
/dev/lvm/{a,b} => BTRFS, works fine
/dev/lvm/brokenDisk => BTRFS, Exhibits errors
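
For anyone trying to reproduce the layering, something along these
lines should recreate the failing path (sizes are illustrative, and
the workingUpper side is omitted):

  # lower VG directly on the disks
  pvcreate /dev/sda /dev/sdb /dev/sdc /dev/sdd
  vgcreate lowerVG /dev/sda /dev/sdb /dev/sdc /dev/sdd

  # raid1 LVs in the lower VG
  lvcreate --type raid1 -m1 -L 100G -n single  lowerVG
  lvcreate --type raid1 -m1 -L 100G -n works   lowerVG
  lvcreate --type raid1 -m1 -L 100G -n lvmPool lowerVG

  # the lvmPool LV becomes the sole PV of the nested VG "lvm"
  # (newer LVM may need scan_lvs=1 in lvm.conf to see PVs on LVs)
  pvcreate /dev/lowerVG/lvmPool
  vgcreate lvm /dev/lowerVG/lvmPool
  lvcreate -L 50G -n brokenDisk lvm

  # btrfs on the nested LV
  mkfs.btrfs /dev/lvm/brokenDisk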

After some investigation, here is what I've found:

1. This regression was introduced in 5.19. On 5.18 and earlier kernels
I can keep this filesystem rw and everything works as expected, while
on 5.19.0 and later the filesystem immediately goes ro on any write
attempt. I couldn't build rc1, but I did confirm that rc2 already has
this regression.
2. Passing /dev/lvm/brokenDisk to a KVM VM as /dev/vdb, with an
unaffected kernel inside the VM, still exhibits the ro barrier problem,
even though the guest kernel is unaffected.
3. Passing /dev/lowerVG/lvmPool to a KVM VM as /dev/vdb, with an
affected kernel inside the VM and the nested LVM set up inside the VM,
exhibits correct behavior (I can keep the filesystem rw, and there are
no barrier errors on host or guest). A sketch of this passthrough
setup is below, after this list.
4. I discussed this on IRC with the BTRFS folks, and they think the
BTRFS filesystem is fine (btrfs check and btrfs scrub also agree).
5. The dmesg error can be delayed indefinitely by not writing to the
disk, or by only reading with noatime.
6. This affects Debian, Ubuntu, NixOS, and Solus, so I'm fairly
certain it's distro-agnostic, and purely a kernel issue.
7. I can't reproduce this with other LVM-on-LVM setups, so I think the
asymmetric nature of the raid1 volumes may be contributing.
8. There are no new SMART errors/failures on any of the disks; all
disks are healthy.
9. I previously had raidintegrity=y and caching enabled; neither
affected the issue.
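
For points 2 and 3 above, the passthrough amounts to handing the LV to
the guest as a virtio drive, roughly like this (the qemu invocation,
memory size, and guest image are illustrative; a libvirt block-device
<disk> entry pointing at the same path works the same way):

  # point 2: pass the nested LV to the guest as /dev/vdb
  qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=guest.qcow2,format=qcow2,if=virtio \
    -drive file=/dev/lvm/brokenDisk,format=raw,if=virtio

  # point 3: pass the lower-VG LV instead, and build the nested
  # VG/LV/btrfs inside the guest
  qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=guest.qcow2,format=qcow2,if=virtio \
    -drive file=/dev/lowerVG/lvmPool,format=raw,if=virtio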


#regzbot introduced v5.18..v5.19-rc2

Patrick
