Date: Mon, 11 Mar 2024 21:13:08 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Patrick Plenefisch <simonpatp@...il.com>
Cc: Mike Snitzer <snitzer@...nel.org>,
	Goffredo Baroncelli <kreijack@...ind.it>,
	linux-kernel@...r.kernel.org, Alasdair Kergon <agk@...hat.com>,
	Mikulas Patocka <mpatocka@...hat.com>, Chris Mason <clm@...com>,
	Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>,
	regressions@...ts.linux.dev, dm-devel@...ts.linux.dev,
	linux-btrfs@...r.kernel.org, ming.lei@...hat.com
Subject: Re: LVM-on-LVM: error while submitting device barriers

On Sun, Mar 10, 2024 at 02:11:11PM -0400, Patrick Plenefisch wrote:
> On Sun, Mar 10, 2024 at 11:27 AM Mike Snitzer <snitzer@...nel.org> wrote:
> >
> > On Sun, Mar 10 2024 at  7:34P -0400,
> > Ming Lei <ming.lei@...hat.com> wrote:
> >
> > > On Sat, Mar 09, 2024 at 03:39:02PM -0500, Patrick Plenefisch wrote:
> > > > On Wed, Mar 6, 2024 at 11:00 AM Ming Lei <ming.lei@...hat.com> wrote:
> > > > >
> > > > > #!/usr/bin/bpftrace
> > > > >
> > > > > #ifndef BPFTRACE_HAVE_BTF
> > > > > #include <linux/blkdev.h>
> > > > > #endif
> > > > >
> > > > > kprobe:submit_bio_noacct,
> > > > > kprobe:submit_bio
> > > > > / (((struct bio *)arg0)->bi_opf & (1 << __REQ_PREFLUSH)) != 0 /
> > > > > {
> > > > >         $bio = (struct bio *)arg0;
> > > > >         @submit_stack[arg0] = kstack;
> > > > >         @tracked[arg0] = 1;
> > > > > }
> > > > >
> > > > > kprobe:bio_endio
> > > > > /@tracked[arg0] != 0/
> > > > > {
> > > > >         $bio = (struct bio *)arg0;
> > > > >
> > > > >         if (($bio->bi_flags & (1 << BIO_CHAIN)) && $bio->__bi_remaining.counter > 1) {
> > > > >                 return;
> > > > >         }
> > > > >
> > > > >         if ($bio->bi_status != 0) {
> > > > >                 printf("dev %s bio failed %d, submitter %s completion %s\n",
> > > > >                         $bio->bi_bdev->bd_disk->disk_name,
> > > > >                         $bio->bi_status, @submit_stack[arg0], kstack);
> > > > >         }
> > > > >         delete(@submit_stack[arg0]);
> > > > >         delete(@tracked[arg0]);
> > > > > }
> > > > >
> > > > > END {
> > > > >         clear(@submit_stack);
> > > > >         clear(@tracked);
> > > > > }
> > > > >
> > > >
> > > > Attaching 4 probes...
> > > > dev dm-77 bio failed 10, submitter
> > > >        submit_bio_noacct+5
> > > >        __send_duplicate_bios+358
> > > >        __send_empty_flush+179
> > > >        dm_submit_bio+857
> > > >        __submit_bio+132
> > > >        submit_bio_noacct_nocheck+345
> > > >        write_all_supers+1718
> > > >        btrfs_commit_transaction+2342
> > > >        transaction_kthread+345
> > > >        kthread+229
> > > >        ret_from_fork+49
> > > >        ret_from_fork_asm+27
> > > > completion
> > > >        bio_endio+5
> > > >        dm_submit_bio+955
> > > >        __submit_bio+132
> > > >        submit_bio_noacct_nocheck+345
> > > >        write_all_supers+1718
> > > >        btrfs_commit_transaction+2342
> > > >        transaction_kthread+345
> > > >        kthread+229
> > > >        ret_from_fork+49
> > > >        ret_from_fork_asm+27
> > > >
> > > > dev dm-86 bio failed 10, submitter
> > > >        submit_bio_noacct+5
> > > >        write_all_supers+1718
> > > >        btrfs_commit_transaction+2342
> > > >        transaction_kthread+345
> > > >        kthread+229
> > > >        ret_from_fork+49
> > > >        ret_from_fork_asm+27
> > > > completion
> > > >        bio_endio+5
> > > >        clone_endio+295
> > > >        clone_endio+295
> > > >        process_one_work+369
> > > >        worker_thread+635
> > > >        kthread+229
> > > >        ret_from_fork+49
> > > >        ret_from_fork_asm+27
> > > >
> > > >
> > > > For context, dm-86 is /dev/lvm/brokenDisk and dm-77 is /dev/lowerVG/lvmPool
> > >
> > > bi_status is 10 (BLK_STS_IOERR), and it is first produced in the submission
> > > code path on /dev/dm-77 (/dev/lowerVG/lvmPool), so it looks like a device
> > > mapper issue.
> > >
> > > The error should be from the following code only:
> > >
> > > static void __map_bio(struct bio *clone)
> > >
> > >       ...
> > >       if (r == DM_MAPIO_KILL)
> > >               dm_io_dec_pending(io, BLK_STS_IOERR);
> > >       else
> > >               dm_io_dec_pending(io, BLK_STS_DM_REQUEUE);
> > >     break;
> >
> > I agree that the above bpf stack traces for dm-77 indicate that
> > dm_submit_bio failed, which would end up in the above branch if the
> > target's ->map() returned DM_MAPIO_KILL or DM_MAPIO_REQUEUE.
> >
> > But such an early failure speaks to the flush bio never being
> > submitted to the underlying storage. No?
> >
> > dm-raid.c:raid_map does return DM_MAPIO_REQUEUE with:
> >
> >         /*
> >          * If we're reshaping to add disk(s)), ti->len and
> >          * mddev->array_sectors will differ during the process
> >          * (ti->len > mddev->array_sectors), so we have to requeue
> >          * bios with addresses > mddev->array_sectors here or
> >          * there will occur accesses past EOD of the component
> >          * data images thus erroring the raid set.
> >          */
> >         if (unlikely(bio_end_sector(bio) > mddev->array_sectors))
> >                 return DM_MAPIO_REQUEUE;
> >
> > But a flush doesn't have an end_sector (it'd be 0 afaik).. so it seems
> > weird relative to a flush.
> >
> > > Patrick, you mentioned lvmPool is raid1, can you explain how lvmPool is
> > > built? It is dm-raid1 target or over plain raid1 device which is
> > > build over /dev/lowerVG?
> 
> LVM raid1:
> lvcreate --type raid1 -m 1 ...

OK, that is the reason, as Mike mentioned.

dm-raid.c:raid_map returns DM_MAPIO_REQUEUE, which is translated into
BLK_STS_IOERR in dm_io_complete().

btrfs sends an empty flush bio with both .bi_size and .bi_sector set to
zero. The top dm device is linear, though, and linear_map() assigns a new
bio->bi_iter.bi_sector. The remapped bio is then sent to dm-raid, where
raid_map() returns DM_MAPIO_REQUEUE via the bio_end_sector() check Mike
quoted above.

The one-line patch I sent in my last email should solve this issue.

https://lore.kernel.org/dm-devel/a783e5ed-db56-4100-956a-353170b1b7ed@inwind.it/T/#m8fce3ecb2f98370b7d7ce8db6714bbf644af5459

But the misuse of DM_MAPIO_REQUEUE needs a closer look, and I believe Mike is
working on that bigger problem.

I guess most dm targets don't handle empty bios well, at least linear and
dm-raid; I haven't looked into the others yet, :-(


Thanks,
Ming

