linux-kernel - Re: [PATCH 06/15] md/raid1,raid10: use folio for sync path IO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a0423bf6-c924-0012-172b-1217069ff3ec@huaweicloud.com>
Date: Thu, 22 Jan 2026 10:12:03 +0800
From: Li Nan <linan666@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, linan666@...weicloud.com
Cc: song@...nel.org, yukuai@...as.com, linux-raid@...r.kernel.org,
 linux-kernel@...r.kernel.org, yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH 06/15] md/raid1,raid10: use folio for sync path IO



在 2026/1/20 23:53, Xiao Ni 写道:
> On Wed, Dec 17, 2025 at 8:11 PM <linan666@...weicloud.com> wrote:
>>
>> From: Li Nan <linan122@...wei.com>
>>
>> Convert all IO on the sync path to use folios. Rename page-related
>> identifiers to match folio.
>>
>> Retain some now-unnecessary while and for loops to minimize code
>> changes, clean them up in a subsequent patch.
>>
>> Signed-off-by: Li Nan <linan122@...wei.com >> -static inline int resync_alloc_pages(struct resync_pages *rp,
>> +static inline int resync_alloc_folio(struct resync_folio *rf,
>>                                       gfp_t gfp_flags)
>>   {
>> -       int i;
>> -
>> -       for (i = 0; i < RESYNC_PAGES; i++) {
>> -               rp->pages[i] = alloc_page(gfp_flags);
>> -               if (!rp->pages[i])
>> -                       goto out_free;
>> -       }
>> +       rf->folio = folio_alloc(gfp_flags, get_order(RESYNC_BLOCK_SIZE));
>> +       if (!rf->folio)
>> +               return -ENOMEM;
> 
> Is it ok to add an error log here? Compare with the multipage
> situation, the possibility of failure will be somewhat higher because
> it needs to alloc a contiguous block of physical memory.
> 

Hi, Xiao

Thanks for your review.

In patch 15 we fall back to a smaller order if sync folio alloc fails.
After that the alloc usually succeeds, so an error log seems noisy. Should
I add a log before the fallback and keep it in patch 15?

>> -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
>> +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
>>                                 int size)
>>   {
>> -       int idx = 0;
>> -
>>          /* initialize bvec table again */
>>          do {
>> -               struct page *page = resync_fetch_page(rp, idx);
>> -               int len = min_t(int, size, PAGE_SIZE);
>> +               struct folio *folio = resync_fetch_folio(rf);
>> +               int len = min_t(int, size, RESYNC_BLOCK_SIZE);
>>
>> -               if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
>> +               if (WARN_ON(!bio_add_folio(bio, folio, len, 0))) {
> 
> Is it ok to use bio_add_folio(bio, folio, RESYNC_BLOCK_SIZE, 0)
> directly here? It removes `size -= len` below, so it's not useless to
> compare size and RESYNC_BLOCK_SIZE above?
> 

Same as the previous one, the size is no longer a fixed value after
patch 15. I think keeping it here gives better compatibility.

>>          /*
>> -        * Allocate RESYNC_PAGES data pages and attach them to
>> -        * the first bio.
>> +        * Allocate data folio and attach them to the first bio.
> 
> typo error
> s/attach them/attach it/g
> 

I will fix it later. Thanks.

>> @@ -2284,44 +2278,30 @@ static void process_checks(struct r1bio *r1_bio)
>>                  }
>>          r1_bio->read_disk = primary;
>>          for (i = 0; i < conf->raid_disks * 2; i++) {
>> -               int j = 0;
>>                  struct bio *pbio = r1_bio->bios[primary];
>>                  struct bio *sbio = r1_bio->bios[i];
>>                  blk_status_t status = sbio->bi_status;
>> -               struct page **ppages = get_resync_pages(pbio)->pages;
>> -               struct page **spages = get_resync_pages(sbio)->pages;
>> -               struct bio_vec *bi;
>> -               int page_len[RESYNC_PAGES] = { 0 };
>> -               struct bvec_iter_all iter_all;
>> +               struct folio *pfolio = get_resync_folio(pbio)->folio;
>> +               struct folio *sfolio = get_resync_folio(sbio)->folio;
>>
>>                  if (sbio->bi_end_io != end_sync_read)
>>                          continue;
>>                  /* Now we can 'fixup' the error value */
>>                  sbio->bi_status = 0;
>>
>> -               bio_for_each_segment_all(bi, sbio, iter_all)
>> -                       page_len[j++] = bi->bv_len;
>> -
>> -               if (!status) {
>> -                       for (j = vcnt; j-- ; ) {
>> -                               if (memcmp(page_address(ppages[j]),
>> -                                          page_address(spages[j]),
>> -                                          page_len[j]))
>> -                                       break;
>> -                       }
>> -               } else
>> -                       j = 0;
>> -               if (j >= 0)
>> +               if (status || memcmp(folio_address(pfolio),
>> +                                    folio_address(sfolio),
>> +                                    r1_bio->sectors << 9)) {
>>                          atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
>> -               if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
>> -                             && !status)) {
>> -                       /* No need to write to this device. */
>> -                       sbio->bi_end_io = NULL;
>> -                       rdev_dec_pending(conf->mirrors[i].rdev, mddev);
>> -                       continue;
>> +                       if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
>> +                               bio_copy_data(sbio, pbio);
>> +                               continue;
>> +                       }
> 
> The logic is changed here. The original logic:
> 1. read ok, no mismatch: no bio_copy_data
> 2. read ok, mismatch, check: no bio_copy_data
> 3. read ok, mismatch, no check: need bio_copy_data
> 4. read fail: need bio_copy_data
> 
> The 4 is broken.
> 
> How about adding a temporary need_write to make logic more clear?
> 
> something like:
>          if (!status) {
>              int ret = 0;
>              ret = memcpy(folio_address(pfolio),
>                               folio_address(sfolio),
>                               r1_bio->sectors << 9);
>              if (ret) {
>                  atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
>                  if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
>                      need_write = true;
>              }
>          } else
>              need_write = true;
> 
>          if (need_write)
>              bio_copy_data(sbio, pbio);
>          else {
>              /* No need to write to this device. */
>              sbio->bi_end_io = NULL;
>              rdev_dec_pending(conf->mirrors[i].rdev, mddev);
>          }
> 

Nice catch, read failis is indeed broken. I’ll fix it in v2.

>>                  }
>>
>> -               bio_copy_data(sbio, pbio);
>> +               /* No need to write to this device. */
>> +               sbio->bi_end_io = NULL;
>> +               rdev_dec_pending(conf->mirrors[i].rdev, mddev);
>>          }
>>   }
>> >> @@ -3020,24 +2997,24 @@ static sector_t 	(struct mddev *mddev, sector_t 
sector_nr,
>>                  }
>>
>>                  for (i = 0 ; i < conf->raid_disks * 2; i++) {
>> -                       struct resync_pages *rp;
>> +                       struct resync_folio *rf;
>>
>>                          bio = r1_bio->bios[i];
>> -                       rp = get_resync_pages(bio);
>> +                       rf = get_resync_folio(bio);
>>                          if (bio->bi_end_io) {
>> -                               page = resync_fetch_page(rp, page_idx);
>> +                               folio = resync_fetch_folio(rf);
>>
>>                                  /*
>>                                   * won't fail because the vec table is big
>>                                   * enough to hold all these pages
>>                                   */
> 
> The comments above may not be needed anymore. Because there is only
> one vec in the bio.
> 

I will clean it up.

>> -                               __bio_add_page(bio, page, len, 0);
>> +                               bio_add_folio_nofail(bio, folio, len, 0);
>>                          }
>>                  }
>>                  nr_sectors += len>>9;
>>                  sector_nr += len>>9;
>>                  sync_blocks -= (len>>9);
> 
> These three lines are not needed anymore.
> 

It is cleaned up in later patches. In this patch I only want minimal
changes, just folio API and naming replacements. Do you think I should move
those cleanups into this patch?


>>          /*
>> -        * Allocate RESYNC_PAGES data pages and attach them
>> -        * where needed.
>> +        * Allocate data folio and attach them where needed.
> 
> typo error
> s/attach them/attach it/g
> 

I will fix it in v2.

>>           */
>>          for (j = 0; j < nalloc; j++) {
>>                  struct bio *rbio = r10_bio->devs[j].repl_bio;
>> -               struct resync_pages *rp, *rp_repl;
>> +               struct resync_folio *rf, *rf_repl;
>>
>> -               rp = &rps[j];
>> +               rf = &rfs[j];
>>                  if (rbio)
>> -                       rp_repl = &rps[nalloc + j];
>> +                       rf_repl = &rfs[nalloc + j];
>>
>>                  bio = r10_bio->devs[j].bio;
>>
>>                  if (!j || test_bit(MD_RECOVERY_SYNC,
>>                                     &conf->mddev->recovery)) {
>> -                       if (resync_alloc_pages(rp, gfp_flags))
>> +                       if (resync_alloc_folio(rf, gfp_flags))
>>                                  goto out_free_pages;
> 
> s/out_free_pages/out_free_folio/g
> 

I will fix it in v2.

>>                  } else {
>> -                       memcpy(rp, &rps[0], sizeof(*rp));
>> -                       resync_get_all_pages(rp);
>> +                       memcpy(rf, &rfs[0], sizeof(*rf));
>> +                       resync_get_all_folio(rf);
> 
> Maybe the name resync_get_folio is better?

Agree, I will rename it in v2.

>> @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>>
>>                  rdev = conf->mirrors[dr].rdev;
>>                  addr = r10_bio->devs[0].addr + sect;
>> -               ok = sync_page_io(rdev,
>> -                                 addr,
>> -                                 s << 9,
>> -                                 pages[idx],
>> -                                 REQ_OP_READ, false);
>> +               ok = sync_folio_io(rdev,
>> +                                  addr,
>> +                                  s << 9,
>> +                                  sect << 9,
>> +                                  folio,
>> +                                  REQ_OP_READ, false);
> 
> By the comments at the beginning of fix_recovery_read_error, it needs
> to submit io with a page size unit, right? If so, it still needs to
> use sync_page_io here.
> 

Here 's' is PAGE_SIZE. We just use a 'page' from folio. In patch 10,
I will change it to use logical block size instead, which should be more
reasonable.

>>          /*
>>           * Allow skipping a full rebuild for incremental assembly
>> @@ -3277,7 +3265,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>           * with 2 bios in each, that correspond to the bios in the main one.
>>           * In this case, the subordinate r10bios link back through a
>>           * borrowed master_bio pointer, and the counter in the master
>> -        * includes a ref from each subordinate.
>> +        * bio_add_folio includes a ref from each subordinate.
> 
> What's the reason change this? And I don't understand the new version.
> 

It looks like a typo. I will remove it.

> Best Regards
> Xiao

Thanks again for your careful review.

-- 
Thanks,
Nan