Message-ID: <CALTww2-kpMGb8cYTtFcg5=5Y+8W_eBSHv=SdEExOivTK7OisLQ@mail.gmail.com>
Date: Thu, 22 Jan 2026 15:01:02 +0800
From: Xiao Ni <xni@...hat.com>
To: Li Nan <linan666@...weicloud.com>
Cc: song@...nel.org, yukuai@...as.com, linux-raid@...r.kernel.org, 
	linux-kernel@...r.kernel.org, yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH 06/15] md/raid1,raid10: use folio for sync path IO

On Thu, Jan 22, 2026 at 10:12 AM Li Nan <linan666@...weicloud.com> wrote:
>
>
>
> On 2026/1/20 23:53, Xiao Ni wrote:
> > On Wed, Dec 17, 2025 at 8:11 PM <linan666@...weicloud.com> wrote:
> >>
> >> From: Li Nan <linan122@...wei.com>
> >>
> >> Convert all IO on the sync path to use folios. Rename page-related
> >> identifiers to match folio.
> >>
> >> Retain some now-unnecessary while and for loops to minimize code
> >> changes, clean them up in a subsequent patch.
> >>
> >> Signed-off-by: Li Nan <linan122@...wei.com>
> >> -static inline int resync_alloc_pages(struct resync_pages *rp,
> >> +static inline int resync_alloc_folio(struct resync_folio *rf,
> >>                                       gfp_t gfp_flags)
> >>   {
> >> -       int i;
> >> -
> >> -       for (i = 0; i < RESYNC_PAGES; i++) {
> >> -               rp->pages[i] = alloc_page(gfp_flags);
> >> -               if (!rp->pages[i])
> >> -                       goto out_free;
> >> -       }
> >> +       rf->folio = folio_alloc(gfp_flags, get_order(RESYNC_BLOCK_SIZE));
> >> +       if (!rf->folio)
> >> +               return -ENOMEM;
> >
> > Is it ok to add an error log here? Compared with the multi-page
> > situation, the possibility of failure is somewhat higher because
> > the folio needs a contiguous block of physical memory.
> >
>
> Hi, Xiao
>
> Thanks for your review.
>
> In patch 15 we fall back to a smaller order if sync folio alloc fails.
> After that the alloc usually succeeds, so an error log seems noisy. Should
> I add a log before the fallback and keep it in patch 15?

Thanks for the explanation. No change is needed here :)
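
For what it's worth, the fallback you describe might look roughly like
this (just my sketch, not the actual patch 15 code; struct resync_folio
and RESYNC_BLOCK_SIZE are from this series, the retry loop is my guess):

static inline int resync_alloc_folio(struct resync_folio *rf,
                                     gfp_t gfp_flags)
{
        unsigned int order = get_order(RESYNC_BLOCK_SIZE);

        /*
         * High-order allocations can fail on fragmented memory,
         * so retry with progressively smaller orders.
         */
        do {
                rf->folio = folio_alloc(gfp_flags, order);
                if (rf->folio)
                        return 0;
        } while (order--);

        return -ENOMEM;
}

Callers would then need folio_size() rather than assuming
RESYNC_BLOCK_SIZE, which I guess is why the size stops being a fixed
value after patch 15.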

>
> >> -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
> >> +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
> >>                                 int size)
> >>   {
> >> -       int idx = 0;
> >> -
> >>          /* initialize bvec table again */
> >>          do {
> >> -               struct page *page = resync_fetch_page(rp, idx);
> >> -               int len = min_t(int, size, PAGE_SIZE);
> >> +               struct folio *folio = resync_fetch_folio(rf);
> >> +               int len = min_t(int, size, RESYNC_BLOCK_SIZE);
> >>
> >> -               if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> >> +               if (WARN_ON(!bio_add_folio(bio, folio, len, 0))) {
> >
> > Is it ok to use bio_add_folio(bio, folio, RESYNC_BLOCK_SIZE, 0)
> > directly here? That would remove the `size -= len` below, and isn't
> > the comparison of size with RESYNC_BLOCK_SIZE above useless then?
> >
>
> Same as the previous point: the size is no longer a fixed value after
> patch 15, so I think keeping the clamp here is safer.

Thanks for the explanation.
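
That also means the reset loop has to clamp each chunk to both the
remaining size and the folio once patch 15 lands, something like this
(a sketch only; the folio_size() clamp is my assumption about patch 15):

        /* initialize bvec table again */
        do {
                struct folio *folio = resync_fetch_folio(rf);
                /* the folio may be smaller than RESYNC_BLOCK_SIZE
                 * after an order fallback */
                int len = min_t(int, size, folio_size(folio));

                if (WARN_ON(!bio_add_folio(bio, folio, len, 0))) {
                        bio->bi_status = BLK_STS_RESOURCE;
                        bio_endio(bio);
                        return;
                }

                size -= len;
        } while (size > 0);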

>
> >>          /*
> >> -        * Allocate RESYNC_PAGES data pages and attach them to
> >> -        * the first bio.
> >> +        * Allocate data folio and attach them to the first bio.
> >
> > Typo:
> > s/attach them/attach it/g
> >
>
> I will fix it later. Thanks.
>
> >> @@ -2284,44 +2278,30 @@ static void process_checks(struct r1bio *r1_bio)
> >>                  }
> >>          r1_bio->read_disk = primary;
> >>          for (i = 0; i < conf->raid_disks * 2; i++) {
> >> -               int j = 0;
> >>                  struct bio *pbio = r1_bio->bios[primary];
> >>                  struct bio *sbio = r1_bio->bios[i];
> >>                  blk_status_t status = sbio->bi_status;
> >> -               struct page **ppages = get_resync_pages(pbio)->pages;
> >> -               struct page **spages = get_resync_pages(sbio)->pages;
> >> -               struct bio_vec *bi;
> >> -               int page_len[RESYNC_PAGES] = { 0 };
> >> -               struct bvec_iter_all iter_all;
> >> +               struct folio *pfolio = get_resync_folio(pbio)->folio;
> >> +               struct folio *sfolio = get_resync_folio(sbio)->folio;
> >>
> >>                  if (sbio->bi_end_io != end_sync_read)
> >>                          continue;
> >>                  /* Now we can 'fixup' the error value */
> >>                  sbio->bi_status = 0;
> >>
> >> -               bio_for_each_segment_all(bi, sbio, iter_all)
> >> -                       page_len[j++] = bi->bv_len;
> >> -
> >> -               if (!status) {
> >> -                       for (j = vcnt; j-- ; ) {
> >> -                               if (memcmp(page_address(ppages[j]),
> >> -                                          page_address(spages[j]),
> >> -                                          page_len[j]))
> >> -                                       break;
> >> -                       }
> >> -               } else
> >> -                       j = 0;
> >> -               if (j >= 0)
> >> +               if (status || memcmp(folio_address(pfolio),
> >> +                                    folio_address(sfolio),
> >> +                                    r1_bio->sectors << 9)) {
> >>                          atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
> >> -               if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
> >> -                             && !status)) {
> >> -                       /* No need to write to this device. */
> >> -                       sbio->bi_end_io = NULL;
> >> -                       rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> >> -                       continue;
> >> +                       if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
> >> +                               bio_copy_data(sbio, pbio);
> >> +                               continue;
> >> +                       }
> >
> > The logic is changed here. The original logic:
> > 1. read ok, no mismatch: no bio_copy_data
> > 2. read ok, mismatch, check: no bio_copy_data
> > 3. read ok, mismatch, no check: need bio_copy_data
> > 4. read fail: need bio_copy_data
> >
> > Case 4 is broken.
> >
> > How about adding a temporary need_write variable to make the logic clearer?
> >
> > something like:
> >          bool need_write = false;
> >
> >          if (!status) {
> >              int ret = memcmp(folio_address(pfolio),
> >                               folio_address(sfolio),
> >                               r1_bio->sectors << 9);
> >              if (ret) {
> >                  atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
> >                  if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
> >                      need_write = true;
> >              }
> >          } else
> >              need_write = true;
> >
> >          if (need_write)
> >              bio_copy_data(sbio, pbio);
> >          else {
> >              /* No need to write to this device. */
> >              sbio->bi_end_io = NULL;
> >              rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> >          }
> >
>
> Nice catch, the read-failure case is indeed broken. I'll fix it in v2.
>
> >>                  }
> >>
> >> -               bio_copy_data(sbio, pbio);
> >> +               /* No need to write to this device. */
> >> +               sbio->bi_end_io = NULL;
> >> +               rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> >>          }
> >>   }
> >> @@ -3020,24 +2997,24 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
> >>                  }
> >>
> >>                  for (i = 0 ; i < conf->raid_disks * 2; i++) {
> >> -                       struct resync_pages *rp;
> >> +                       struct resync_folio *rf;
> >>
> >>                          bio = r1_bio->bios[i];
> >> -                       rp = get_resync_pages(bio);
> >> +                       rf = get_resync_folio(bio);
> >>                          if (bio->bi_end_io) {
> >> -                               page = resync_fetch_page(rp, page_idx);
> >> +                               folio = resync_fetch_folio(rf);
> >>
> >>                                  /*
> >>                                   * won't fail because the vec table is big
> >>                                   * enough to hold all these pages
> >>                                   */
> >
> > The comment above may not be needed anymore, because there is only
> > one vec in the bio.
> >
>
> I will clean it up.
>
> >> -                               __bio_add_page(bio, page, len, 0);
> >> +                               bio_add_folio_nofail(bio, folio, len, 0);
> >>                          }
> >>                  }
> >>                  nr_sectors += len>>9;
> >>                  sector_nr += len>>9;
> >>                  sync_blocks -= (len>>9);
> >
> > These three lines are not needed anymore.
> >
>
> These are cleaned up in later patches. In this patch I only want minimal
> changes, just folio API and naming replacements. Do you think I should move
> those cleanups into this patch?

I think it's a better choice. But it really depends on you. I'm ok if
you prefer your patch sequence.

>
>
> >>          /*
> >> -        * Allocate RESYNC_PAGES data pages and attach them
> >> -        * where needed.
> >> +        * Allocate data folio and attach them where needed.
> >
> > Typo:
> > s/attach them/attach it/g
> >
>
> I will fix it in v2.
>
> >>           */
> >>          for (j = 0; j < nalloc; j++) {
> >>                  struct bio *rbio = r10_bio->devs[j].repl_bio;
> >> -               struct resync_pages *rp, *rp_repl;
> >> +               struct resync_folio *rf, *rf_repl;
> >>
> >> -               rp = &rps[j];
> >> +               rf = &rfs[j];
> >>                  if (rbio)
> >> -                       rp_repl = &rps[nalloc + j];
> >> +                       rf_repl = &rfs[nalloc + j];
> >>
> >>                  bio = r10_bio->devs[j].bio;
> >>
> >>                  if (!j || test_bit(MD_RECOVERY_SYNC,
> >>                                     &conf->mddev->recovery)) {
> >> -                       if (resync_alloc_pages(rp, gfp_flags))
> >> +                       if (resync_alloc_folio(rf, gfp_flags))
> >>                                  goto out_free_pages;
> >
> > s/out_free_pages/out_free_folio/g
> >
>
> I will fix it in v2.
>
> >>                  } else {
> >> -                       memcpy(rp, &rps[0], sizeof(*rp));
> >> -                       resync_get_all_pages(rp);
> >> +                       memcpy(rf, &rfs[0], sizeof(*rf));
> >> +                       resync_get_all_folio(rf);
> >
> > Maybe the name resync_get_folio is better?
>
> Agree, I will rename it in v2.
>
> >> @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
> >>
> >>                  rdev = conf->mirrors[dr].rdev;
> >>                  addr = r10_bio->devs[0].addr + sect;
> >> -               ok = sync_page_io(rdev,
> >> -                                 addr,
> >> -                                 s << 9,
> >> -                                 pages[idx],
> >> -                                 REQ_OP_READ, false);
> >> +               ok = sync_folio_io(rdev,
> >> +                                  addr,
> >> +                                  s << 9,
> >> +                                  sect << 9,
> >> +                                  folio,
> >> +                                  REQ_OP_READ, false);
> >
> > By the comment at the beginning of fix_recovery_read_error, it needs
> > to submit IO in page-size units, right? If so, it still needs to
> > use sync_page_io here.
> >
>
> Here 's' is PAGE_SIZE, so we are effectively using one page of the folio.
> In patch 10 I will change it to use the logical block size instead, which
> should be more reasonable.

Ok.
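
For the archive: as I understand it, the caller still walks the range
in s-sized steps, and the new sect << 9 argument selects the chunk
inside the single folio. A sketch of the pattern, with the
sync_folio_io signature taken from the quoted hunk and the surrounding
loop bookkeeping being my reconstruction of fix_recovery_read_error:

        while (sectors) {
                int s = sectors;

                if (s > (PAGE_SIZE >> 9))
                        s = PAGE_SIZE >> 9;

                addr = r10_bio->devs[0].addr + sect;
                /* sect << 9: byte offset of this chunk inside the folio */
                ok = sync_folio_io(rdev, addr, s << 9, sect << 9,
                                   folio, REQ_OP_READ, false);
                if (!ok)
                        break;

                sect += s;
                sectors -= s;
        }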

>
> >>          /*
> >>           * Allow skipping a full rebuild for incremental assembly
> >> @@ -3277,7 +3265,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> >>           * with 2 bios in each, that correspond to the bios in the main one.
> >>           * In this case, the subordinate r10bios link back through a
> >>           * borrowed master_bio pointer, and the counter in the master
> >> -        * includes a ref from each subordinate.
> >> +        * bio_add_folio includes a ref from each subordinate.
> >
> > What's the reason for this change? I don't understand the new version.
> >
>
> It looks like a typo. I will remove it.
>
> > Best Regards
> > Xiao
>
> Thanks again for your careful review.

You're welcome.

Best Regards
Xiao
>
> --
> Thanks,
> Nan
>

