[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a0423bf6-c924-0012-172b-1217069ff3ec@huaweicloud.com>
Date: Thu, 22 Jan 2026 10:12:03 +0800
From: Li Nan <linan666@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, linan666@...weicloud.com
Cc: song@...nel.org, yukuai@...as.com, linux-raid@...r.kernel.org,
linux-kernel@...r.kernel.org, yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH 06/15] md/raid1,raid10: use folio for sync path IO
在 2026/1/20 23:53, Xiao Ni 写道:
> On Wed, Dec 17, 2025 at 8:11 PM <linan666@...weicloud.com> wrote:
>>
>> From: Li Nan <linan122@...wei.com>
>>
>> Convert all IO on the sync path to use folios. Rename page-related
>> identifiers to match folio.
>>
>> Retain some now-unnecessary while and for loops to minimize code
>> changes, clean them up in a subsequent patch.
>>
>> Signed-off-by: Li Nan <linan122@...wei.com >> -static inline int resync_alloc_pages(struct resync_pages *rp,
>> +static inline int resync_alloc_folio(struct resync_folio *rf,
>> gfp_t gfp_flags)
>> {
>> - int i;
>> -
>> - for (i = 0; i < RESYNC_PAGES; i++) {
>> - rp->pages[i] = alloc_page(gfp_flags);
>> - if (!rp->pages[i])
>> - goto out_free;
>> - }
>> + rf->folio = folio_alloc(gfp_flags, get_order(RESYNC_BLOCK_SIZE));
>> + if (!rf->folio)
>> + return -ENOMEM;
>
> Is it ok to add an error log here? Compare with the multipage
> situation, the possibility of failure will be somewhat higher because
> it needs to alloc a contiguous block of physical memory.
>
Hi, Xiao
Thanks for your review.
In patch 15 we fall back to a smaller order if sync folio alloc fails.
After that the alloc usually succeeds, so an error log seems noisy. Should
I add a log before the fallback and keep it in patch 15?
>> -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
>> +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
>> int size)
>> {
>> - int idx = 0;
>> -
>> /* initialize bvec table again */
>> do {
>> - struct page *page = resync_fetch_page(rp, idx);
>> - int len = min_t(int, size, PAGE_SIZE);
>> + struct folio *folio = resync_fetch_folio(rf);
>> + int len = min_t(int, size, RESYNC_BLOCK_SIZE);
>>
>> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
>> + if (WARN_ON(!bio_add_folio(bio, folio, len, 0))) {
>
> Is it ok to use bio_add_folio(bio, folio, RESYNC_BLOCK_SIZE, 0)
> directly here? It removes `size -= len` below, so it's not useless to
> compare size and RESYNC_BLOCK_SIZE above?
>
Same as the previous one, the size is no longer a fixed value after
patch 15. I think keeping it here gives better compatibility.
>> /*
>> - * Allocate RESYNC_PAGES data pages and attach them to
>> - * the first bio.
>> + * Allocate data folio and attach them to the first bio.
>
> typo error
> s/attach them/attach it/g
>
I will fix it later. Thanks.
>> @@ -2284,44 +2278,30 @@ static void process_checks(struct r1bio *r1_bio)
>> }
>> r1_bio->read_disk = primary;
>> for (i = 0; i < conf->raid_disks * 2; i++) {
>> - int j = 0;
>> struct bio *pbio = r1_bio->bios[primary];
>> struct bio *sbio = r1_bio->bios[i];
>> blk_status_t status = sbio->bi_status;
>> - struct page **ppages = get_resync_pages(pbio)->pages;
>> - struct page **spages = get_resync_pages(sbio)->pages;
>> - struct bio_vec *bi;
>> - int page_len[RESYNC_PAGES] = { 0 };
>> - struct bvec_iter_all iter_all;
>> + struct folio *pfolio = get_resync_folio(pbio)->folio;
>> + struct folio *sfolio = get_resync_folio(sbio)->folio;
>>
>> if (sbio->bi_end_io != end_sync_read)
>> continue;
>> /* Now we can 'fixup' the error value */
>> sbio->bi_status = 0;
>>
>> - bio_for_each_segment_all(bi, sbio, iter_all)
>> - page_len[j++] = bi->bv_len;
>> -
>> - if (!status) {
>> - for (j = vcnt; j-- ; ) {
>> - if (memcmp(page_address(ppages[j]),
>> - page_address(spages[j]),
>> - page_len[j]))
>> - break;
>> - }
>> - } else
>> - j = 0;
>> - if (j >= 0)
>> + if (status || memcmp(folio_address(pfolio),
>> + folio_address(sfolio),
>> + r1_bio->sectors << 9)) {
>> atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
>> - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
>> - && !status)) {
>> - /* No need to write to this device. */
>> - sbio->bi_end_io = NULL;
>> - rdev_dec_pending(conf->mirrors[i].rdev, mddev);
>> - continue;
>> + if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
>> + bio_copy_data(sbio, pbio);
>> + continue;
>> + }
>
> The logic is changed here. The original logic:
> 1. read ok, no mismatch: no bio_copy_data
> 2. read ok, mismatch, check: no bio_copy_data
> 3. read ok, mismatch, no check: need bio_copy_data
> 4. read fail: need bio_copy_data
>
> The 4 is broken.
>
> How about adding a temporary need_write to make logic more clear?
>
> something like:
> if (!status) {
> int ret = 0;
> ret = memcpy(folio_address(pfolio),
> folio_address(sfolio),
> r1_bio->sectors << 9);
> if (ret) {
> atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
> if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
> need_write = true;
> }
> } else
> need_write = true;
>
> if (need_write)
> bio_copy_data(sbio, pbio);
> else {
> /* No need to write to this device. */
> sbio->bi_end_io = NULL;
> rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> }
>
Nice catch, read failis is indeed broken. I’ll fix it in v2.
>> }
>>
>> - bio_copy_data(sbio, pbio);
>> + /* No need to write to this device. */
>> + sbio->bi_end_io = NULL;
>> + rdev_dec_pending(conf->mirrors[i].rdev, mddev);
>> }
>> }
>> >> @@ -3020,24 +2997,24 @@ static sector_t (struct mddev *mddev, sector_t
sector_nr,
>> }
>>
>> for (i = 0 ; i < conf->raid_disks * 2; i++) {
>> - struct resync_pages *rp;
>> + struct resync_folio *rf;
>>
>> bio = r1_bio->bios[i];
>> - rp = get_resync_pages(bio);
>> + rf = get_resync_folio(bio);
>> if (bio->bi_end_io) {
>> - page = resync_fetch_page(rp, page_idx);
>> + folio = resync_fetch_folio(rf);
>>
>> /*
>> * won't fail because the vec table is big
>> * enough to hold all these pages
>> */
>
> The comments above may not be needed anymore. Because there is only
> one vec in the bio.
>
I will clean it up.
>> - __bio_add_page(bio, page, len, 0);
>> + bio_add_folio_nofail(bio, folio, len, 0);
>> }
>> }
>> nr_sectors += len>>9;
>> sector_nr += len>>9;
>> sync_blocks -= (len>>9);
>
> These three lines are not needed anymore.
>
It is cleaned up in later patches. In this patch I only want minimal
changes, just folio API and naming replacements. Do you think I should move
those cleanups into this patch?
>> /*
>> - * Allocate RESYNC_PAGES data pages and attach them
>> - * where needed.
>> + * Allocate data folio and attach them where needed.
>
> typo error
> s/attach them/attach it/g
>
I will fix it in v2.
>> */
>> for (j = 0; j < nalloc; j++) {
>> struct bio *rbio = r10_bio->devs[j].repl_bio;
>> - struct resync_pages *rp, *rp_repl;
>> + struct resync_folio *rf, *rf_repl;
>>
>> - rp = &rps[j];
>> + rf = &rfs[j];
>> if (rbio)
>> - rp_repl = &rps[nalloc + j];
>> + rf_repl = &rfs[nalloc + j];
>>
>> bio = r10_bio->devs[j].bio;
>>
>> if (!j || test_bit(MD_RECOVERY_SYNC,
>> &conf->mddev->recovery)) {
>> - if (resync_alloc_pages(rp, gfp_flags))
>> + if (resync_alloc_folio(rf, gfp_flags))
>> goto out_free_pages;
>
> s/out_free_pages/out_free_folio/g
>
I will fix it in v2.
>> } else {
>> - memcpy(rp, &rps[0], sizeof(*rp));
>> - resync_get_all_pages(rp);
>> + memcpy(rf, &rfs[0], sizeof(*rf));
>> + resync_get_all_folio(rf);
>
> Maybe the name resync_get_folio is better?
Agree, I will rename it in v2.
>> @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>>
>> rdev = conf->mirrors[dr].rdev;
>> addr = r10_bio->devs[0].addr + sect;
>> - ok = sync_page_io(rdev,
>> - addr,
>> - s << 9,
>> - pages[idx],
>> - REQ_OP_READ, false);
>> + ok = sync_folio_io(rdev,
>> + addr,
>> + s << 9,
>> + sect << 9,
>> + folio,
>> + REQ_OP_READ, false);
>
> By the comments at the beginning of fix_recovery_read_error, it needs
> to submit io with a page size unit, right? If so, it still needs to
> use sync_page_io here.
>
Here 's' is PAGE_SIZE. We just use a 'page' from folio. In patch 10,
I will change it to use logical block size instead, which should be more
reasonable.
>> /*
>> * Allow skipping a full rebuild for incremental assembly
>> @@ -3277,7 +3265,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>> * with 2 bios in each, that correspond to the bios in the main one.
>> * In this case, the subordinate r10bios link back through a
>> * borrowed master_bio pointer, and the counter in the master
>> - * includes a ref from each subordinate.
>> + * bio_add_folio includes a ref from each subordinate.
>
> What's the reason change this? And I don't understand the new version.
>
It looks like a typo. I will remove it.
> Best Regards
> Xiao
Thanks again for your careful review.
--
Thanks,
Nan
Powered by blists - more mailing lists