Message-ID: <d0b24095-d7fc-459d-85ed-9c0797e9fca3@lucifer.local>
Date: Wed, 9 Jul 2025 14:13:37 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, hughd@...gle.com,
david@...hat.com, ziy@...dia.com, Liam.Howlett@...cle.com,
npache@...hat.com, ryan.roberts@....com, dev.jain@....com,
baohua@...nel.org, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com,
mhocko@...e.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: fault in complete folios instead of individual
pages for tmpfs

On Tue, Jul 08, 2025 at 03:53:56PM +0800, Baolin Wang wrote:
>
>
> On 2025/7/7 21:33, Lorenzo Stoakes wrote:
> > On Sun, Jul 06, 2025 at 10:02:35AM +0800, Baolin Wang wrote:
> > >
> > >
> > > On 2025/7/5 06:18, Andrew Morton wrote:
> > > > On Fri, 4 Jul 2025 11:19:26 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
> > > >
> > > > > After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
> > > > > tmpfs can also support large folio allocation (not just PMD-sized large
> > > > > folios).
> > > > >
> > > > > However, when accessing tmpfs via mmap(), although tmpfs supports large folios,
> > > > > we still establish mappings at the base page granularity, which is unreasonable.
> > > > >
> > > > > We can map multiple consecutive pages of a tmpfs folio at once according to
> > > > > the size of the large folio. On one hand, this can reduce the overhead of page
> > > > > faults; on the other hand, it can leverage hardware architecture optimizations
> > > > > to reduce TLB misses, such as contiguous PTEs on the ARM architecture.
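
As a rough illustration of what mapping the whole folio at once means here
(a simplified sketch only, not the actual finish_fault() code - the real
path also handles VMA bounds, uffd, write/dirty state and error cases):

	struct vm_area_struct *vma = vmf->vma;
	struct folio *folio = page_folio(vmf->page);
	unsigned int nr = folio_nr_pages(folio);
	/* assume the faulting address rounds down to the folio boundary */
	unsigned long addr = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
	pte_t pte = mk_pte(folio_page(folio, 0), vma->vm_page_prot);

	/* one rmap update and one batched PTE write cover all nr pages,
	 * instead of nr separate faults each mapping one base page */
	folio_add_file_rmap_ptes(folio, folio_page(folio, 0), nr, vma);
	set_ptes(vma->vm_mm, addr, vmf->pte, pte, nr);
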
> > > > >
> > > > > Moreover, tmpfs mounts use the 'huge=' option to control large folio
> > > > > allocation explicitly, so while a process's RSS statistics might increase,
> > > > > I don't think this will cause any obvious problems for users.
> > > > >
> > > > > Performance test:
> > > > > I created a 1G tmpfs file, populated with 64K large folios, and write-accessed it
> > > > > sequentially via mmap(). I observed a significant performance improvement:
> > > >
> > > > That doesn't sound like a crazy thing to do.
> > > >
> > > > > Before the patch:
> > > > > real 0m0.158s
> > > > > user 0m0.008s
> > > > > sys 0m0.150s
> > > > >
> > > > > After the patch:
> > > > > real 0m0.021s
> > > > > user 0m0.004s
> > > > > sys 0m0.017s
> > > >
> > > > And look at that.
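
For reference, the test described above can be reproduced with something
roughly like the program below (a sketch under assumptions, not the actual
test program: the /mnt/tmpfs mount point, file name and 4K touch stride are
guesses, getting 64K folios depends on the tmpfs huge= configuration, and
any separate pre-population pass is not shown). Running it under 'time'
before and after the patch shows the difference.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const size_t size = 1UL << 30;	/* 1G file on a tmpfs mount */
	int fd = open("/mnt/tmpfs/testfile", O_RDWR | O_CREAT, 0644);
	char *p;
	size_t off;

	if (fd < 0 || ftruncate(fd, size) < 0)
		return 1;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* sequential write access: touch every base page once */
	for (off = 0; off < size; off += 4096)
		p[off] = 1;

	munmap(p, size);
	close(fd);
	return 0;
}
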
> > > >
> > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > index 0f9b32a20e5b..9944380e947d 100644
> > > > > --- a/mm/memory.c
> > > > > +++ b/mm/memory.c
> > > > > @@ -5383,10 +5383,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> > > > >  	/*
> > > > >  	 * Using per-page fault to maintain the uffd semantics, and same
> > > > > -	 * approach also applies to non-anonymous-shmem faults to avoid
> > > > > +	 * approach also applies to non shmem/tmpfs faults to avoid
> > > > >  	 * inflating the RSS of the process.
> > > > >  	 */
> > > > > -	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
> > > > > +	if (!vma_is_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
> > > > >  	    unlikely(needs_fallback)) {
> > > > >  		nr_pages = 1;
> > > > >  	} else if (nr_pages > 1) {
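
For context, the two helpers differ roughly as below (paraphrased from
mm/shmem.c, so treat the exact definitions as approximate): the old check
only matched anonymous shared memory, while vma_is_shmem() also covers
mmap()ed tmpfs files, which is what lets tmpfs mappings take the batched
path above.

bool vma_is_anon_shmem(struct vm_area_struct *vma)
{
	/* anonymous shared memory, e.g. MAP_ANONYMOUS | MAP_SHARED */
	return vma->vm_ops == &shmem_anon_vm_ops;
}

bool vma_is_shmem(struct vm_area_struct *vma)
{
	/* the anon shmem case above, plus mappings of tmpfs files */
	return vma_is_anon_shmem(vma) || vma->vm_ops == &shmem_vm_ops;
}
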
> > > >
> > > > and that's it?
> > > >
> > > > I'm itching to get this into -stable, really. What LTS user wouldn't
> > > > want this?
> > >
> > > This is an improvement rather than a bugfix, so I don't think it needs to go
> > > into LTS.
> > >
> > > > Could it be viewed as correcting an oversight in acd7ccb284b8?
> > >
> > > Yes, I should have added this optimization as part of the series for commit
> > > acd7ccb284b8, but obviously I missed it :(.
> >
> > Buuut if this was an oversight in that patch that causes an unnecessary
> > perf degradation, surely this should have a Fixes tag + cc stable, no?
>
> IMO, commit acd7ccb284b8 doesn't cause a perf degradation; instead it
> introduces a new feature, while the current patch is a further reasonable
> optimization. As I mentioned, this is an improvement, not a bugfix or a
> patch to address a performance regression.

Well :) you say yourself it was an oversight, and it very clearly has a perf
_impact_, which, compared back against acd7ccb284b8, is a degradation - but I
get your point.

However, since you say 'oversight', it seems to me that you really meant to
include it but hadn't noticed. And since it seems to be an unequivocal good,
let's maybe flip this around - why NOT backport it to stable?

Cheers, Lorenzo