Date:   Thu, 6 Oct 2016 07:57:59 +0100
From:   Sitsofe Wheeler <sitsofe@...il.com>
To:     Shaohua Li <shli@...nel.org>
Cc:     Jens Axboe <axboe@...nel.dk>, linux-raid@...r.kernel.org,
        linux-block@...r.kernel.org,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: kernel BUG at block/bio.c:1785 while trying to issue a discard to
 LVM on RAID1 md

On 5 October 2016 at 22:39, Shaohua Li <shli@...nel.org> wrote:
> On Wed, Oct 05, 2016 at 10:31:11PM +0100, Sitsofe Wheeler wrote:
>> On 3 October 2016 at 17:47, Sitsofe Wheeler <sitsofe@...il.com> wrote:
>> >
>> > While trying to do a discard (via blkdiscard --length 1048576
>> > /dev/<pathtodevice>) to an LVM device atop a two disk md RAID1 the
>> > following oops was generated:
>> >
>> > [  103.306243] md: resync of RAID array md127
>> > [  103.306246] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> > [  103.306248] md: using maximum available idle IO bandwidth (but not
>> > more than 200000 KB/sec) for resync.
>> > [  103.306251] md: using 128k window, over a total of 244194432k.
>> > [  103.308158] ------------[ cut here ]------------
>> > [  103.308205] kernel BUG at block/bio.c:1785!
>>
>> This still seems to be here but slightly modified with a 4.8.0 kernel:
>
> Does this fix the issue? Looks there is IO error
>
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 21dc00e..349eb11 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2196,7 +2196,6 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
>                         wbio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);
>                 }
>
> -               bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
>                 wbio->bi_iter.bi_sector = r1_bio->sector;
>                 wbio->bi_iter.bi_size = r1_bio->sectors << 9;
>

Yes, the patch above fixes the issue and makes blkdiscard just report
that the BLKDISCARD ioctl failed. With this patch applied, the issue
seen in
http://www.gossamer-threads.com/lists/linux/kernel/2538757?do=post_view_threaded#2538757
(BUG at arch/x86/kernel/pci-nommu.c:66 / BUG at
./include/linux/scatterlist.h:90) can no longer be reached. Does that
mean whatever was seen there is also spurious?

Additionally, as this issue seems to have been a problem going back to
at least the 3.18 kernels, would a fix similar to this be eligible for
stable kernels?
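
For anyone who wants to reproduce the failure without blkdiscard, here is
a rough sketch of the userspace side. It assumes the usual BLKDISCARD
interface from <linux/fs.h>, which takes a pair of 64-bit byte values
(offset, length). The helper names fill_discard_range() and
discard_range() are made up for illustration and are not taken from
util-linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD */

/* Fill the {offset, length} pair (both in bytes) that BLKDISCARD expects. */
static void fill_discard_range(uint64_t range[2], uint64_t offset,
                               uint64_t length)
{
    assert(range != NULL);
    range[0] = offset;
    range[1] = length;
}

/*
 * Hypothetical helper: discard `length` bytes starting at `offset` on an
 * already-open block device fd. Returns 0 on success, or -1 with errno
 * preserved so the caller can inspect the failure.
 */
static int discard_range(int fd, uint64_t offset, uint64_t length)
{
    uint64_t range[2];

    fill_discard_range(range, offset, length);
    if (ioctl(fd, BLKDISCARD, &range) < 0) {
        int saved = errno;   /* fprintf may clobber errno */

        fprintf(stderr, "BLKDISCARD ioctl failed: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

Calling discard_range(fd, 0, 1048576) on a descriptor opened O_WRONLY on
the LVM device should be roughly equivalent to the blkdiscard --length
1048576 invocation quoted above. Note that it destroys data in that range.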

-- 
Sitsofe | http://sucs.org/~sits/
