linux-kernel - Re: [PATCH] block: make sure big bio is splitted into at most 256 bvecs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACVXFVP8ACO75SZvmK-u_rx+_e8xYoA7GWiG1oqaOQ7+xHooEQ@mail.gmail.com>
Date:	Wed, 6 Apr 2016 09:11:10 +0800
From:	Ming Lei <ming.lei@...onical.com>
To:	Shaohua Li <shli@...com>
Cc:	Jens Axboe <axboe@...com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-block@...r.kernel.org,
	Kent Overstreet <kent.overstreet@...il.com>,
	Christoph Hellwig <hch@...radead.org>,
	Eric Wheeler <bcache@...ts.ewheeler.net>,
	Sebastian Roesner <sroesner-kernelorg@...sner-online.de>,
	"4.2+" <stable@...r.kernel.org>
Subject: Re: [PATCH] block: make sure big bio is splitted into at most 256 bvecs

On Wed, Apr 6, 2016 at 9:04 AM, Shaohua Li <shli@...com> wrote:
> On Wed, Apr 06, 2016 at 08:47:56AM +0800, Ming Lei wrote:
>> On Wed, Apr 6, 2016 at 2:27 AM, Shaohua Li <shli@...com> wrote:
>> > On Wed, Apr 06, 2016 at 01:44:06AM +0800, Ming Lei wrote:
>> >> After arbitrary bio size is supported, the incoming bio may
>> >> be very big. We have to split the bio into small bios so that
>> >> each holds at most BIO_MAX_PAGES bvecs for safety reason, such
>> >> as bio_clone().
>> >>
>> >> This patch fixes the following kernel crash:
>> >>
>> >> > [  172.660142] BUG: unable to handle kernel NULL pointer dereference at
>> >> > 0000000000000028
>> >> > [  172.660229] IP: [<ffffffff811e53b4>] bio_trim+0xf/0x2a
>> >> > [  172.660289] PGD 7faf3e067 PUD 7f9279067 PMD 0
>> >> > [  172.660399] Oops: 0000 [#1] SMP
>> >> > [...]
>> >> > [  172.664780] Call Trace:
>> >> > [  172.664813]  [<ffffffffa007f3be>] ? raid1_make_request+0x2e8/0xad7 [raid1]
>> >> > [  172.664846]  [<ffffffff811f07da>] ? blk_queue_split+0x377/0x3d4
>> >> > [  172.664880]  [<ffffffffa005fb5f>] ? md_make_request+0xf6/0x1e9 [md_mod]
>> >> > [  172.664912]  [<ffffffff811eb860>] ? generic_make_request+0xb5/0x155
>> >> > [  172.664947]  [<ffffffffa0445c89>] ? prio_io+0x85/0x95 [bcache]
>> >> > [  172.664981]  [<ffffffffa0448252>] ? register_cache_set+0x355/0x8d0 [bcache]
>> >> > [  172.665016]  [<ffffffffa04497d3>] ? register_bcache+0x1006/0x1174 [bcache]
>> >>
>> >> Fixes: 54efd50(block: make generic_make_request handle arbitrarily sized bios)
>> >
>> > this bug is introduced by d2be537c3ba
>> >> Reported-by: Sebastian Roesner <sroesner-kernelorg@...sner-online.de>
>> >> Reported-by: Eric Wheeler <bcache@...ts.ewheeler.net>
>> >> Cc: stable@...r.kernel.org (4.2+)
>> >> Cc: Shaohua Li <shli@...com>
>> >> Signed-off-by: Ming Lei <ming.lei@...onical.com>
>> >> ---
>> >> I can reproduce the issue and verify the fix by the following approach:
>> >>       - create one raid1 over two virtio-blk
>> >>       - build bcache device over the above raid1 and another cache device.
>> >>       - set cache mode as writeback
>> >>       - run random write over ext4 on the bcache device
>> >>       - then the crash can be triggered
>> >
>> > can you explain why this is better than my original patch?
>>
>> Shaohua, your patch is wrong, please see the following link:
>>
>>     https://lkml.org/lkml/2016/3/30/893
>
> I don't see why, really, except it declares you are right :)

Never mind, I post it again, and maybe cause my poor english, :-)

blk_rq_get_max_sectors() which uses max sectors limit is used for
merging bios/reqs, and that means limits.max_sectors is for limitting
max sectors in one request or transfer. One request may include lots of
bios. Now this patch decreases the limit just for single bio's 256
bvec's limitation.
Is that correct? That is the reason why I suggest to change get_max_io_size()
for bio's 256 bvecs limit.

On the contrary, the default max sectors should have been increased
since hardware is becoming quicker, and we should send more to drive
in one request, IMO.

>
> why it's 2560 instead of 2048?

I don't know the exact reason why Jeff takes 2560, but I feel it can be
bigger because the hardware is becoming quicker.

Thanks,
Ming

> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html