linux-kernel - Re: [PATCH 6/6] shmem: add large folios support to the write path

Message-ID: <20230918075758.vlufrhq22es2dhuu@sarkhan>
Date:   Mon, 18 Sep 2023 08:00:12 +0000
From:   Daniel Gomez <da.gomez@...sung.com>
To:     Yosry Ahmed <yosryahmed@...gle.com>
CC:     "minchan@...nel.org" <minchan@...nel.org>,
        "senozhatsky@...omium.org" <senozhatsky@...omium.org>,
        "axboe@...nel.dk" <axboe@...nel.dk>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "willy@...radead.org" <willy@...radead.org>,
        "hughd@...gle.com" <hughd@...gle.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mcgrof@...nel.org" <mcgrof@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "gost.dev@...sung.com" <gost.dev@...sung.com>,
        Pankaj Raghav <p.raghav@...sung.com>
Subject: Re: [PATCH 6/6] shmem: add large folios support to the write path

On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote:
> On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@...sung.com> wrote:
> >
> > Add large folio support for shmem write path matching the same high
> > order preference mechanism used for iomap buffered IO path as used in
> > __filemap_get_folio().
> >
> > Use the __folio_get_max_order to get a hint for the order of the folio
> > based on file size which takes care of the mapping requirements.
> >
> > Swap does not support high order folios for now, so make it order 0 in
> > case swap is enabled.
>
> I didn't take a close look at the series, but I am not sure I
> understand the rationale here. Reclaim will split high order shmem
> folios anyway, right?

For context, this is part of the large block size (LBS) enablement
effort [1][2][3]. The assumption here is that the kernel will reclaim
memory in the same (large) block sizes that were written to the
device.

I'll add more context in v2.

[1] https://kernelnewbies.org/KernelProjects/large-block-size
[2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
[3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/
>
> It seems like we only enable high order folios if the "noswap" mount
> option is used, which is fairly recent. I doubt it is widely used.

For now, I skipped the swap path because it lacks support for high
order folios. I'm looking into it as part of the LBS effort (please
check the spreadsheet at [2] for that).
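
To make the swap restriction concrete, here is a minimal userspace sketch
of the order clamping done at the top of shmem_alloc_and_acct_folio() in
the quoted hunk. The names sketch_clamp_order() and sketch_nr_pages() are
hypothetical and exist only for this illustration; they are not kernel
functions, and the real code reads the flag from sbinfo->noswap.

```c
#include <stdbool.h>

/*
 * Userspace sketch of the order clamping in the quoted hunk.
 * sketch_clamp_order() is a hypothetical name, not a kernel function.
 */
static unsigned int sketch_clamp_order(bool noswap, unsigned int order)
{
	/* Swap has no high order folio support yet: use order 0. */
	if (!noswap)
		return 0;

	/* With noswap, the hunk also maps order 1 down to order 0. */
	return (order == 1) ? 0 : order;
}

/* Number of base pages covered by a folio of the given order. */
static unsigned long sketch_nr_pages(unsigned int order)
{
	return 1UL << order;
}
```

So with swap enabled every allocation stays at a single page, and only a
"noswap" mount keeps the order hinted by the mapping (other than order 1).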
>
> >
> > Signed-off-by: Daniel Gomez <da.gomez@...sung.com>
> > ---
> >  mm/shmem.c | 16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index adff74751065..26ca555b1669 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
> >  }
> >
> >  static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
> > -               pgoff_t index, bool huge, unsigned int *order)
> > +               pgoff_t index, bool huge, unsigned int *order,
> > +               struct shmem_sb_info *sbinfo)
> >  {
> >         struct shmem_inode_info *info = SHMEM_I(inode);
> >         struct folio *folio;
> >         int nr;
> >         int err;
> >
> > +       if (!sbinfo->noswap)
> > +               *order = 0;
> > +       else
> > +               *order = (*order == 1) ? 0 : *order;
> > +
> >         if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> >                 huge = false;
> >         nr = huge ? HPAGE_PMD_NR : 1U << *order;
> > @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >                 return 0;
> >         }
> >
> > +       order = mapping_size_order(inode->i_mapping, index, len);
> > +
> >         if (!shmem_is_huge(inode, index, false,
> >                            vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
> >                 goto alloc_nohuge;
> > @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >         huge_gfp = vma_thp_gfp_mask(vma);
> >         huge_gfp = limit_gfp_mask(huge_gfp, gfp);
> >         folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
> > -                                          &order);
> > +                                          &order, sbinfo);
> >         if (IS_ERR(folio)) {
> >  alloc_nohuge:
> >                 folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
> > -                                                  &order);
> > +                                                  &order, sbinfo);
> >         }
> >         if (IS_ERR(folio)) {
> >                 int retry = 5;
> > @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >         if (folio_test_large(folio)) {
> >                 folio_unlock(folio);
> >                 folio_put(folio);
> > +               if (order > 0)
> > +                       order--;
> >                 goto alloc_nohuge;
> >         }
> >  unlock:
> > --
> > 2.39.2
> >
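
The last hunk lowers the order by one before jumping back to alloc_nohuge
when a large folio attempt fails. A rough userspace sketch of that
fallback idea follows. It is loosely modeled on the hunk and is not the
kernel control flow: sketch_try_insert() and its max_ok parameter are
hypothetical stand-ins for "this attempt succeeded", and the sketch
ignores the separate order 1 to order 0 mapping done in
shmem_alloc_and_acct_folio().

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for one allocation/insertion attempt:
 * it succeeds only when order is at or below max_ok.
 */
static bool sketch_try_insert(unsigned int order, unsigned int max_ok)
{
	return order <= max_ok;
}

/*
 * Retry with a smaller order until an attempt succeeds or order 0 is
 * reached. Returns the order of the last attempt. Stopping at order 0
 * without a further check is an assumption of this sketch only.
 */
static unsigned int sketch_fallback_order(unsigned int order,
					  unsigned int max_ok)
{
	while (order > 0 && !sketch_try_insert(order, max_ok))
		order--;	/* mirrors "if (order > 0) order--;" */

	return order;
}
```

For example, a request that starts at order 4 while only order 2 and
below can succeed ends up at order 2 after two retries.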