linux-kernel - Re: [REGRESSION] LVM-on-LVM: error while submitting device barriers

Message-ID: <CAOCpoWexiuYLu0fpPr71+Uzxw_tw3q4HGF9tKgx5FM4xMx9fWA@mail.gmail.com>
Date: Wed, 28 Feb 2024 14:37:45 -0500
From: Patrick Plenefisch <simonpatp@...il.com>
To: kreijack@...ind.it
Cc: stable@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Alasdair Kergon <agk@...hat.com>, Mike Snitzer <snitzer@...nel.org>, 
	Mikulas Patocka <mpatocka@...hat.com>, Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>, 
	David Sterba <dsterba@...e.com>, regressions@...ts.linux.dev, dm-devel@...ts.linux.dev, 
	linux-btrfs@...r.kernel.org
Subject: Re: [REGRESSION] LVM-on-LVM: error while submitting device barriers

On Wed, Feb 28, 2024 at 2:19 PM Goffredo Baroncelli <kreijack@...ero.it> wrote:
>
> On 28/02/2024 18.25, Patrick Plenefisch wrote:
> > I'm unsure if this is just an LVM bug, or a BTRFS+LVM interaction bug,
> > but LVM is definitely involved somehow.
> > Upgrading from 5.10 to 6.1, I noticed one of my filesystems was
> > read-only. In dmesg, I found:
> >
> > BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk errs: wr
> > 0, rd 0, flush 1, corrupt 0, gen 0
> > BTRFS warning (device dm-75): chunk 13631488 missing 1 devices, max
> > tolerance is 0 for writable mount
> > BTRFS: error (device dm-75) in write_all_supers:4379: errno=-5 IO
> > failure (errors while submitting device barriers.)
> > BTRFS info (device dm-75: state E): forced readonly
> > BTRFS warning (device dm-75: state E): Skipping commit of aborted transaction.
> > BTRFS: error (device dm-75: state EA) in cleanup_transaction:1992:
> > errno=-5 IO failure
> >
> > At first I suspected a btrfs error, but a scrub found no errors, and
> > it continued to be read-write on 5.10 kernels.
> >
> > Here is my setup:
> >
> > /dev/lvm/brokenDisk is a lvm-on-lvm volume. I have /dev/sd{a,b,c,d}
> > (of varying sizes) in a lower VG, which has three LVs, all raid1
> > volumes. Two of the volumes are further used as PV's for an upper VGs.
> > One of the upper VGs has no issues. The non-PV LV has no issue. The
> > remaining one, /dev/lowerVG/lvmPool, hosting nested LVM, is used as a
> > PV for VG "lvm", and has 3 volumes inside. Two of those volumes have
> > no issues (and are btrfs), but the last one is /dev/lvm/brokenDisk.
> > This volume is the only one that exhibits this behavior, so something
> > is special.
> >
> > Or described as layers:
> > /dev/sd{a,b,c,d} => PV => VG "lowerVG"
> > /dev/lowerVG/single (RAID1 LV) => BTRFS, works fine
> > /dev/lowerVG/works (RAID1 LV) => PV => VG "workingUpper"
> > /dev/workingUpper/{a,b,c} => BTRFS, works fine
> > /dev/lowerVG/lvmPool (RAID1 LV) => PV => VG "lvm"
> > /dev/lvm/{a,b} => BTRFS, works fine
> > /dev/lvm/brokenDisk => BTRFS, Exhibits errors
>
> I am a bit curious about the reasons of this setup.

The lowerVG is supposed to be a pool of storage for several VMs and
containers. [workingUpper] is for one VM, and [lvm] is for another VM.
However, right now I'm still organizing the files directly on the host
because I don't have all the VMs fully set up yet.

> However I understood that:
>
> /dev/sda -+                +-- single (RAID1) -> ok             +-> a   ok
> /dev/sdb  |                |                                    |-> b   ok
> /dev/sdc  +--> [lowerVG]>--+-- works (RAID1) -> [workingUpper] -+-> c   ok
> /dev/sdd -+                |
>                             |                       +-> a          -> ok
>                             +-- lvmPool -> [lvm] ->-|
>                                                     +-> b          -> ok
>                                                     |
>                                                     +->brokenDisk  -> fail
>
> [xxx] means VG, the others are LVs that may act also as PV in
> an upper VG

Yes, with one correction: lvmPool is also RAID1.

>
> So, it seems that
>
> 1) lowerVG/lvmPool/lvm/a
> 2) lowerVG/lvmPool/lvm/a
> 3) lowerVG/lvmPool/lvm/brokenDisk
>
> are equivalent ... so I don't understand how 1) and 2) are fine but 3) is
> problematic.

I assume you meant lvm/b for 2)?

>
> Is my understanding of the LVM layouts correct ?

Your understanding is correct. The only thing that comes to mind as a
possible cause is the asymmetry of the SATA devices. I have one 8TB
drive, plus one 1.5TB and two 3TB drives. Doing the math on the actual
extents, lowerVG/single spans (3TB+3TB), and
lowerVG/lvmPool/lvm/brokenDisk spans (3TB+1.5TB). Both obviously have
the other leg of raid1 on the 8TB drive, but my thought was that the
jump across the 1.5TB/3TB drive boundary was at least "interesting".
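
To make the asymmetry concrete, here is a minimal Python sketch of the
spans described above. The drive sizes come from this message; the drive
labels are hypothetical, and which of the two 3TB drives brokenDisk
touches is an assumption. It only models the description and does not
query LVM.

```python
# Drive sizes are from the message; the labels are hypothetical.
DRIVE_TB = {"big": 8.0, "mid-a": 3.0, "mid-b": 3.0, "small": 1.5}

# Non-8TB leg of each raid1 LV, as described in the message. Which of the
# two 3TB drives brokenDisk touches is not stated; "mid-b" is an assumption.
SMALL_LEG = {
    "lowerVG/single": ["mid-a", "mid-b"],
    "lvm/brokenDisk": ["mid-b", "small"],
}


def crosses_unequal_drives(leg):
    """Return True if the leg spans drives of more than one size."""
    return len({DRIVE_TB[drive] for drive in leg}) > 1


for lv, leg in SMALL_LEG.items():
    print(lv, "crosses unequal drives:", crosses_unequal_drives(leg))
```

In this model only the brokenDisk leg crosses drives of different sizes,
which is the one structural difference I can point to so far.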

>
>
> >
> > After some investigation, here is what I've found:
> >
> > 1. This regression was introduced in 5.19. 5.18 and earlier kernels I
> > can keep this filesystem rw and everything works as expected, while
> > 5.19.0 and later the filesystem is immediately ro on any write
> > attempt. I couldn't build rc1, but I did confirm rc2 already has this
> > regression.
> > 2. Passing /dev/lvm/brokenDisk to a KVM VM as /dev/vdb with an
> > unaffected kernel inside the vm exhibits the ro barrier problem on
> > unaffected kernels.
>
> Is /dev/lvm/brokenDisk *always* problematic with affected ( >= 5.19 ) and
> UNaffected ( < 5.19 ) kernel ?

Yes. I didn't test it in as much depth, but 5.15 and 6.1 in the VM
(with 6.1 on the host) are identically problematic.

>
> > 3. Passing /dev/lowerVG/lvmPool to a KVM VM as /dev/vdb with an
> > affected kernel inside the VM and using LVM inside the VM exhibits
> > correct behavior (I can keep the filesystem rw, no barrier errors on
> > host or guest)
>
> Is /dev/lowerVG/lvmPool problematic with only "affected" kernel ?

No, passing lvmPool directly to the VM is never problematic. I tested
5.10 and 6.1 in the VM (with 6.1 on the host), and neither setup throws
barrier errors.
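
For reference, the results reported so far in this thread can be
tabulated as in the Python sketch below. The record layout and the
predicate are illustrative assumptions, not a diagnosis. The sketch only
checks that one simple rule is consistent with every observation: errors
appear when the host kernel is 5.19 or later and the host itself stacks
VG "lvm" on top of lvmPool.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    host_kernel: str
    guest_kernel: Optional[str]     # None: no VM involved
    passed_to_guest: Optional[str]  # device handed to the VM, if any
    barrier_errors: bool


# Transcribed from the thread; field names are hypothetical.
OBSERVATIONS = [
    Observation("5.10", None, None, False),
    Observation("5.18", None, None, False),
    Observation("5.19", None, None, True),
    Observation("6.1", None, None, True),
    Observation("6.1", "5.15", "/dev/lvm/brokenDisk", True),
    Observation("6.1", "6.1", "/dev/lvm/brokenDisk", True),
    Observation("6.1", "5.10", "/dev/lowerVG/lvmPool", False),
    Observation("6.1", "6.1", "/dev/lowerVG/lvmPool", False),
]


def version(kernel):
    """Turn "5.19" into (5, 19) so versions compare numerically."""
    return tuple(int(part) for part in kernel.split("."))


def host_stacks_upper_vg(obs):
    """The host activates VG "lvm" unless lvmPool itself goes to the guest."""
    return obs.passed_to_guest != "/dev/lowerVG/lvmPool"


def predicted_errors(obs):
    """Candidate rule: host >= 5.19 and both LVM layers live in the host."""
    return host_stacks_upper_vg(obs) and version(obs.host_kernel) >= (5, 19)


mismatches = [o for o in OBSERVATIONS if predicted_errors(o) != o.barrier_errors]
print("observations not explained by the rule:", len(mismatches))
```

Note that the guest kernel version does not appear in the rule at all,
which matches what I saw: only the host side seems to matter.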

> [...]
>
> --
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
>