Message-ID: <CALTww2-kpMGb8cYTtFcg5=5Y+8W_eBSHv=SdEExOivTK7OisLQ@mail.gmail.com>
Date: Thu, 22 Jan 2026 15:01:02 +0800
From: Xiao Ni <xni@...hat.com>
To: Li Nan <linan666@...weicloud.com>
Cc: song@...nel.org, yukuai@...as.com, linux-raid@...r.kernel.org,
linux-kernel@...r.kernel.org, yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH 06/15] md/raid1,raid10: use folio for sync path IO
On Thu, Jan 22, 2026 at 10:12 AM Li Nan <linan666@...weicloud.com> wrote:
>
>
>
> On 2026/1/20 23:53, Xiao Ni wrote:
> > On Wed, Dec 17, 2025 at 8:11 PM <linan666@...weicloud.com> wrote:
> >>
> >> From: Li Nan <linan122@...wei.com>
> >>
> >> Convert all IO on the sync path to use folios. Rename page-related
> >> identifiers to match folio.
> >>
> >> Retain some now-unnecessary while and for loops to minimize code
> >> changes, clean them up in a subsequent patch.
> >>
> >> Signed-off-by: Li Nan <linan122@...wei.com>
> >> -static inline int resync_alloc_pages(struct resync_pages *rp,
> >> +static inline int resync_alloc_folio(struct resync_folio *rf,
> >> gfp_t gfp_flags)
> >> {
> >> - int i;
> >> -
> >> - for (i = 0; i < RESYNC_PAGES; i++) {
> >> - rp->pages[i] = alloc_page(gfp_flags);
> >> - if (!rp->pages[i])
> >> - goto out_free;
> >> - }
> >> + rf->folio = folio_alloc(gfp_flags, get_order(RESYNC_BLOCK_SIZE));
> >> + if (!rf->folio)
> >> + return -ENOMEM;
> >
> > Is it ok to add an error log here? Compared with the multi-page
> > approach, the chance of failure is somewhat higher because it needs
> > to allocate a contiguous block of physical memory.
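> > e.g. something like this (just to illustrate; the exact message is
> > up to you):
> >
> > 	rf->folio = folio_alloc(gfp_flags, get_order(RESYNC_BLOCK_SIZE));
> > 	if (!rf->folio) {
> > 		pr_err("md: failed to alloc order-%d resync folio\n",
> > 		       get_order(RESYNC_BLOCK_SIZE));
> > 		return -ENOMEM;
> > 	}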
> >
>
> Hi, Xiao
>
> Thanks for your review.
>
> In patch 15 we fall back to a smaller order if the sync folio allocation
> fails. After that, the allocation usually succeeds, so an error log seems
> noisy. Should I add a log right before the fallback and keep it in
> patch 15?
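>
> Roughly, the fallback there looks like this (simplified sketch, not the
> exact patch 15 code):
>
> 	unsigned int order = get_order(RESYNC_BLOCK_SIZE);
>
> 	do {
> 		rf->folio = folio_alloc(gfp_flags, order);
> 		if (rf->folio)
> 			return 0;
> 	} while (order--);
> 	return -ENOMEM;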
Thanks for the explanation. No change is needed here :)
>
> >> -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
> >> +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
> >> int size)
> >> {
> >> - int idx = 0;
> >> -
> >> /* initialize bvec table again */
> >> do {
> >> - struct page *page = resync_fetch_page(rp, idx);
> >> - int len = min_t(int, size, PAGE_SIZE);
> >> + struct folio *folio = resync_fetch_folio(rf);
> >> + int len = min_t(int, size, RESYNC_BLOCK_SIZE);
> >>
> >> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> >> + if (WARN_ON(!bio_add_folio(bio, folio, len, 0))) {
> >
> > Is it ok to use bio_add_folio(bio, folio, RESYNC_BLOCK_SIZE, 0)
> > directly here? That would let us remove the `size -= len` below, and
> > then comparing size with RESYNC_BLOCK_SIZE above becomes useless too,
> > right?
> >
>
> As with the previous comment: size is no longer a fixed value after
> patch 15, so I think keeping the min_t() here gives better compatibility.
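>
> (After the fallback, the folio can be smaller than RESYNC_BLOCK_SIZE,
> so the clamp would presumably end up as something like
> min_t(int, size, folio_size(folio)).)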
Thanks for the explanation.
>
> >> /*
> >> - * Allocate RESYNC_PAGES data pages and attach them to
> >> - * the first bio.
> >> + * Allocate data folio and attach them to the first bio.
> >
> > Typo:
> > s/attach them/attach it/g
> >
>
> I will fix it later. Thanks.
>
> >> @@ -2284,44 +2278,30 @@ static void process_checks(struct r1bio *r1_bio)
> >> }
> >> r1_bio->read_disk = primary;
> >> for (i = 0; i < conf->raid_disks * 2; i++) {
> >> - int j = 0;
> >> struct bio *pbio = r1_bio->bios[primary];
> >> struct bio *sbio = r1_bio->bios[i];
> >> blk_status_t status = sbio->bi_status;
> >> - struct page **ppages = get_resync_pages(pbio)->pages;
> >> - struct page **spages = get_resync_pages(sbio)->pages;
> >> - struct bio_vec *bi;
> >> - int page_len[RESYNC_PAGES] = { 0 };
> >> - struct bvec_iter_all iter_all;
> >> + struct folio *pfolio = get_resync_folio(pbio)->folio;
> >> + struct folio *sfolio = get_resync_folio(sbio)->folio;
> >>
> >> if (sbio->bi_end_io != end_sync_read)
> >> continue;
> >> /* Now we can 'fixup' the error value */
> >> sbio->bi_status = 0;
> >>
> >> - bio_for_each_segment_all(bi, sbio, iter_all)
> >> - page_len[j++] = bi->bv_len;
> >> -
> >> - if (!status) {
> >> - for (j = vcnt; j-- ; ) {
> >> - if (memcmp(page_address(ppages[j]),
> >> - page_address(spages[j]),
> >> - page_len[j]))
> >> - break;
> >> - }
> >> - } else
> >> - j = 0;
> >> - if (j >= 0)
> >> + if (status || memcmp(folio_address(pfolio),
> >> + folio_address(sfolio),
> >> + r1_bio->sectors << 9)) {
> >> atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
> >> - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
> >> - && !status)) {
> >> - /* No need to write to this device. */
> >> - sbio->bi_end_io = NULL;
> >> - rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> >> - continue;
> >> + if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
> >> + bio_copy_data(sbio, pbio);
> >> + continue;
> >> + }
> >
> > The logic is changed here. The original logic:
> > 1. read ok, no mismatch: no bio_copy_data
> > 2. read ok, mismatch, check: no bio_copy_data
> > 3. read ok, mismatch, no check: need bio_copy_data
> > 4. read fail: need bio_copy_data
> >
> > Case 4 is broken.
> >
> > How about adding a temporary need_write to make the logic clearer?
> >
> > something like:
> > 	bool need_write = false;
> >
> > 	if (!status) {
> > 		int ret = memcmp(folio_address(pfolio),
> > 				 folio_address(sfolio),
> > 				 r1_bio->sectors << 9);
> > 		if (ret) {
> > 			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
> > 			if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
> > 				need_write = true;
> > 		}
> > 	} else
> > 		need_write = true;
> >
> > 	if (need_write)
> > 		bio_copy_data(sbio, pbio);
> > 	else {
> > 		/* No need to write to this device. */
> > 		sbio->bi_end_io = NULL;
> > 		rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> > 	}
> >
>
> Nice catch, the read-failure case is indeed broken. I'll fix it in v2.
>
> >> }
> >>
> >> - bio_copy_data(sbio, pbio);
> >> + /* No need to write to this device. */
> >> + sbio->bi_end_io = NULL;
> >> + rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> >> }
> >> }
> >> @@ -3020,24 +2997,24 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
> >> }
> >>
> >> for (i = 0 ; i < conf->raid_disks * 2; i++) {
> >> - struct resync_pages *rp;
> >> + struct resync_folio *rf;
> >>
> >> bio = r1_bio->bios[i];
> >> - rp = get_resync_pages(bio);
> >> + rf = get_resync_folio(bio);
> >> if (bio->bi_end_io) {
> >> - page = resync_fetch_page(rp, page_idx);
> >> + folio = resync_fetch_folio(rf);
> >>
> >> /*
> >> * won't fail because the vec table is big
> >> * enough to hold all these pages
> >> */
> >
> > The comment above may not be needed anymore, because there is only
> > one vec in the bio.
> >
>
> I will clean it up.
>
> >> - __bio_add_page(bio, page, len, 0);
> >> + bio_add_folio_nofail(bio, folio, len, 0);
> >> }
> >> }
> >> nr_sectors += len>>9;
> >> sector_nr += len>>9;
> >> sync_blocks -= (len>>9);
> >
> > These three lines are not needed anymore.
> >
>
> They are cleaned up in later patches. In this patch I only want minimal
> changes: just folio API and naming replacements. Do you think I should
> move those cleanups into this patch?
I think that's the better choice, but it really depends on you. I'm ok
if you prefer your patch sequence.
>
>
> >> /*
> >> - * Allocate RESYNC_PAGES data pages and attach them
> >> - * where needed.
> >> + * Allocate data folio and attach them where needed.
> >
> > Typo:
> > s/attach them/attach it/g
> >
>
> I will fix it in v2.
>
> >> */
> >> for (j = 0; j < nalloc; j++) {
> >> struct bio *rbio = r10_bio->devs[j].repl_bio;
> >> - struct resync_pages *rp, *rp_repl;
> >> + struct resync_folio *rf, *rf_repl;
> >>
> >> - rp = &rps[j];
> >> + rf = &rfs[j];
> >> if (rbio)
> >> - rp_repl = &rps[nalloc + j];
> >> + rf_repl = &rfs[nalloc + j];
> >>
> >> bio = r10_bio->devs[j].bio;
> >>
> >> if (!j || test_bit(MD_RECOVERY_SYNC,
> >> &conf->mddev->recovery)) {
> >> - if (resync_alloc_pages(rp, gfp_flags))
> >> + if (resync_alloc_folio(rf, gfp_flags))
> >> goto out_free_pages;
> >
> > s/out_free_pages/out_free_folio/g
> >
>
> I will fix it in v2.
>
> >> } else {
> >> - memcpy(rp, &rps[0], sizeof(*rp));
> >> - resync_get_all_pages(rp);
> >> + memcpy(rf, &rfs[0], sizeof(*rf));
> >> + resync_get_all_folio(rf);
> >
> > Maybe the name resync_get_folio is better?
>
> Agree, I will rename it in v2.
>
> >> @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
> >>
> >> rdev = conf->mirrors[dr].rdev;
> >> addr = r10_bio->devs[0].addr + sect;
> >> - ok = sync_page_io(rdev,
> >> - addr,
> >> - s << 9,
> >> - pages[idx],
> >> - REQ_OP_READ, false);
> >> + ok = sync_folio_io(rdev,
> >> + addr,
> >> + s << 9,
> >> + sect << 9,
> >> + folio,
> >> + REQ_OP_READ, false);
> >
> > According to the comments at the beginning of fix_recovery_read_error,
> > it needs to submit IO in page-size units, right? If so, it still needs
> > to use sync_page_io here.
> >
>
> Here 's' is still a page worth of sectors, so we just read a page-sized
> chunk from the folio. In patch 10, I will change it to use the logical
> block size instead, which should be more reasonable.
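>
> To be concrete, the surrounding loop still clamps 's' to PAGE_SIZE >> 9,
> so each call reads one page-sized chunk at byte offset sect << 9 within
> the folio (sketch):
>
> 	int s = sectors;
>
> 	if (s > (PAGE_SIZE >> 9))
> 		s = PAGE_SIZE >> 9;
> 	ok = sync_folio_io(rdev, addr, s << 9, sect << 9,
> 			   folio, REQ_OP_READ, false);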
Ok.
>
> >> /*
> >> * Allow skipping a full rebuild for incremental assembly
> >> @@ -3277,7 +3265,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> >> * with 2 bios in each, that correspond to the bios in the main one.
> >> * In this case, the subordinate r10bios link back through a
> >> * borrowed master_bio pointer, and the counter in the master
> >> - * includes a ref from each subordinate.
> >> + * bio_add_folio includes a ref from each subordinate.
> >
> > What's the reason for this change? I don't understand the new version.
> >
>
> It looks like a typo. I will remove it.
>
> > Best Regards
> > Xiao
>
> Thanks again for your careful review.
You're welcome.
Best Regards
Xiao
>
> --
> Thanks,
> Nan
>