Message-ID: <850263b4-fbc7-379e-2b9f-80f602ee0e72@huaweicloud.com>
Date: Tue, 2 Sep 2025 16:04:37 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Yu Kuai <yukuai1@...weicloud.com>, Bart Van Assche <bvanassche@....org>,
hch@...radead.org, colyli@...nel.org, hare@...e.de, dlemoal@...nel.org,
tieren@...as.com, axboe@...nel.dk, tj@...nel.org, josef@...icpanda.com,
song@...nel.org, kmo@...erainc.com, satyat@...gle.com, ebiggers@...gle.com,
neil@...wn.name, akpm@...ux-foundation.org
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, linux-raid@...r.kernel.org, yi.zhang@...wei.com,
yangerkun@...wei.com, johnny.chenyi@...wei.com,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH RFC v3 00/15] block: fix disordered IO in the case
recursive split
Hi,
On 2025/09/02 9:50, Yu Kuai wrote:
> Hi,
>
> On 2025/09/01 22:09, Bart Van Assche wrote:
>> On 8/31/25 8:32 PM, Yu Kuai wrote:
>>> This set has only been tested for raid5 so far; see details in patch 9.
>>
>> Does this mean that this patch series doesn't fix reordering caused by
>> recursive splitting for zoned block devices? A test case that triggers
>> an I/O error is available here:
>> https://lore.kernel.org/linux-block/a8a714c7-de3d-4cc9-8c23-38b8dc06f5bb@acm.org/
>>
> I'll try this test.
>
The test can't run directly in my VM, so I debugged it and modified it a
little. The following is the output of the block_io_start trace event.
Before this set:
dd-3014 [000] .N... 1918.939253: block_io_start: 252,2 WS 524288 () 0 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1918.952434: block_io_start: 252,2 WS 524288 () 1024 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] ..... 1918.973499: block_io_start: 252,2 WS 524288 () 8192 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1918.984805: block_io_start: 252,2 WS 524288 () 9216 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] .N... 1919.010224: block_io_start: 252,2 WS 524288 () 16384 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.021667: block_io_start: 252,2 WS 524288 () 17408 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] ..... 1919.053072: block_io_start: 252,2 WS 524288 () 24576 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.064781: block_io_start: 252,2 WS 524288 () 25600 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] .N... 1919.100657: block_io_start: 252,2 WS 524288 () 32768 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.112999: block_io_start: 252,2 WS 524288 () 33792 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] ..... 1919.145032: block_io_start: 252,2 WS 524288 () 40960 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.156677: block_io_start: 252,2 WS 524288 () 41984 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] .N... 1919.188287: block_io_start: 252,2 WS 524288 () 49152 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.199869: block_io_start: 252,2 WS 524288 () 50176 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] .N... 1919.233467: block_io_start: 252,2 WS 524288 () 57344 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.245487: block_io_start: 252,2 WS 524288 () 58368 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] .N... 1919.281146: block_io_start: 252,2 WS 524288 () 65536 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.292812: block_io_start: 252,2 WS 524288 () 66560 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] .N... 1919.326543: block_io_start: 252,2 WS 524288 () 73728 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.338412: block_io_start: 252,2 WS 524288 () 74752 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] .N... 1919.374312: block_io_start: 252,2 WS 524288 () 81920 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.386481: block_io_start: 252,2 WS 524288 () 82944 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] ..... 1919.419795: block_io_start: 252,2 WS 524288 () 90112 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 1919.431454: block_io_start: 252,2 WS 524288 () 91136 + 1024 be,0,4 [kworker/0:1H]
dd-3014 [000] .N... 1919.466208: block_io_start: 252,2 WS 524288 () 98304 + 1024 be,0,4 [dd]
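The block_io_start events are not sequential: after sector 1024 the next
request starts at sector 8192, then 9216 is followed by 16384, and so on.
The test then reports an out-of-space failure.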
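For anyone who wants to check a trace like this mechanically rather than by
eye, here is a minimal sketch in C. It is not part of the test case or the
patch set. The helper names parse_io_start() and count_gaps() are made up for
illustration, and the parser assumes each event has already been joined onto
one line and that the field before the start sector is the empty "()" seen in
the output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract "<start> + <len>" from one block_io_start line.
 * Assumption: the field right before the start sector is "()".
 * Returns 1 on success, 0 if the line does not match.
 */
static int parse_io_start(const char *line, unsigned long long *sector,
			  unsigned long long *nr)
{
	const char *p = strstr(line, "() ");

	if (!p)
		return 0;
	return sscanf(p + 3, "%llu + %llu", sector, nr) == 2;
}

/*
 * Count the events whose start sector is not the previous event's
 * start + length. Lines that do not parse are skipped.
 */
static int count_gaps(const char *const *lines, int n)
{
	unsigned long long sector, nr, expected = 0;
	int have_prev = 0, gaps = 0;

	assert(lines != NULL);
	for (int i = 0; i < n; i++) {
		if (!parse_io_start(lines[i], &sector, &nr))
			continue;
		if (have_prev && sector != expected)
			gaps++;
		expected = sector + nr;
		have_prev = 1;
	}
	return gaps;
}
```

A gap count of zero means every request starts where the previous one ended.
Any non-zero count corresponds to a jump such as 1024 + 1024 being followed
by 8192 + 1024.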
With this set applied and the zoned device check removed:
diff:
diff --git a/block/blk-core.c b/block/blk-core.c
index 6ca3c45f421c..37b5dd396e22 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -746,7 +746,7 @@ void submit_bio_noacct_nocheck(struct bio *bio, bool split)
 	 * it is active, and then process them after it returned.
 	 */
 	if (current->bio_list) {
-		if (split && !bdev_is_zoned(bio->bi_bdev))
+		if (split)
 			bio_list_add_head(&current->bio_list[0], bio);
 		else
 			bio_list_add(&current->bio_list[0], bio);
result:
dd-612 [000] .N... 52.856395: block_io_start: 252,2 WS 524288 () 0 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 52.869947: block_io_start: 252,2 WS 524288 () 1024 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 52.880295: block_io_start: 252,2 WS 524288 () 2048 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 52.890541: block_io_start: 252,2 WS 524288 () 3072 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 52.900951: block_io_start: 252,2 WS 524288 () 4096 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 52.911370: block_io_start: 252,2 WS 524288 () 5120 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 52.922160: block_io_start: 252,2 WS 524288 () 6144 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 52.932823: block_io_start: 252,2 WS 524288 () 7168 + 1024 be,0,4 [kworker/0:1H]
dd-612 [000] .N... 52.968469: block_io_start: 252,2 WS 524288 () 8192 + 1024 be,0,4 [dd]
kworker/0:1H-37 [000] ..... 52.980892: block_io_start: 252,2 WS 524288 () 9216 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 52.991500: block_io_start: 252,2 WS 524288 () 10240 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 53.002088: block_io_start: 252,2 WS 524288 () 11264 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 53.012879: block_io_start: 252,2 WS 524288 () 12288 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 53.023518: block_io_start: 252,2 WS 524288 () 13312 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 53.034365: block_io_start: 252,2 WS 524288 () 14336 + 1024 be,0,4 [kworker/0:1H]
kworker/0:1H-37 [000] ..... 53.045077: block_io_start: 252,2 WS 524288 () 15360 + 1024 be,0,4 [kworker/0:1H]
dd-612 [000] .N... 53.082148: block_io_start: 252,2 WS 524288 () 16384 + 1024 be,0,4 [dd]
We can see that block_io_start is sequential now.
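To illustrate why queueing the remainder of a split at the head rather than
the tail of a pending list can matter, here is a toy user-space model in C.
It is only a sketch and is not the kernel code path: it ignores stacking
levels, zone write plugging and everything else submit_bio_noacct_nocheck()
and its callers do, and it is not expected to reproduce the traces above
exactly. All toy_* names are hypothetical, and the two split limits (8192 and
1024 sectors) are assumed values chosen only to resemble the sizes visible in
the trace. Each pending bio is split twice in one pass, and the remainder of
each split is queued either at the tail or at the head.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PENDING  64
#define TOY_STAGE1_LIMIT 8192UL	/* assumed first split size, in sectors */
#define TOY_STAGE2_LIMIT 1024UL	/* assumed second split size, in sectors */

struct toy_bio { unsigned long sector, nr; };
struct toy_list { struct toy_bio items[TOY_MAX_PENDING]; int count; };

static void toy_add_tail(struct toy_list *l, struct toy_bio b)
{
	assert(l->count < TOY_MAX_PENDING);
	l->items[l->count++] = b;
}

static void toy_add_head(struct toy_list *l, struct toy_bio b)
{
	assert(l->count < TOY_MAX_PENDING);
	for (int i = l->count; i > 0; i--)
		l->items[i] = l->items[i - 1];
	l->items[0] = b;
	l->count++;
}

static struct toy_bio toy_pop(struct toy_list *l)
{
	struct toy_bio b = l->items[0];

	l->count--;
	for (int i = 0; i < l->count; i++)
		l->items[i] = l->items[i + 1];
	return b;
}

/* Keep the first 'limit' sectors in *b and queue the rest, if any. */
static void toy_split(struct toy_list *l, struct toy_bio *b,
		      unsigned long limit, bool head)
{
	struct toy_bio rest;

	if (b->nr <= limit)
		return;
	rest.sector = b->sector + limit;
	rest.nr = b->nr - limit;
	if (head)
		toy_add_head(l, rest);
	else
		toy_add_tail(l, rest);
	b->nr = limit;
}

/* Record the start sector of each issued piece; returns how many. */
static int toy_submit(struct toy_bio first, bool head,
		      unsigned long *order, int max)
{
	struct toy_list pending = { .count = 0 };
	int n = 0;

	toy_add_tail(&pending, first);
	while (pending.count > 0 && n < max) {
		struct toy_bio b = toy_pop(&pending);

		toy_split(&pending, &b, TOY_STAGE1_LIMIT, head);
		toy_split(&pending, &b, TOY_STAGE2_LIMIT, head);
		order[n++] = b.sector;
	}
	return n;
}

/* True if every piece starts right after the previous one. */
static bool toy_is_sequential(const unsigned long *order, int n)
{
	for (int i = 1; i < n; i++)
		if (order[i] != order[i - 1] + TOY_STAGE2_LIMIT)
			return false;
	return true;
}
```

For a single 16384-sector bio starting at sector 0, the tail policy in this
model issues pieces starting at 0, 8192, 1024, 9216 and so on, while the head
policy issues 0, 1024, 2048 and onward in order. That matches the direction
of the change in the diff, although the real ordering also depends on the
parts the model leaves out.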
Thanks,
Kuai
> Zoned block devices are bypassed in patch 14 by:
>
> +	if (split && !bdev_is_zoned(bio->bi_bdev))
> +		bio_list_add_head(&current->bio_list[0], bio);
>
> If I can find a reproducer for zoned block devices and verify that the
> recursive split problem is fixed there as well, I can remove the check
> for zoned devices in the next version.
>
> Thanks,
> Kuai
>
>>
>> I have not yet had the time to review this patch series but plan to take
>> a look soon.
>>
>> Thanks,
>>
>> Bart.