lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKbZUD3Gk8Qb4zznpCszXHzfAO82=rkTOb0_z6yVU0CXWAMoSA@mail.gmail.com>
Date: Fri, 7 Mar 2025 13:35:00 +0000
From: Pedro Falcato <pedro.falcato@...il.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Yang Shi <yang@...amperecomputing.com>, Liam.Howlett@...cle.com, vbabka@...e.cz, 
	jannh@...gle.com, oliver.sang@...el.com, akpm@...ux-foundation.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: vma: skip anonymous vma when inserting vma to file
 rmap tree

On Fri, Mar 7, 2025 at 1:12 PM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
>
> On Thu, Mar 06, 2025 at 01:49:48PM -0800, Yang Shi wrote:
> > LKP reported 800% performance improvement for small-allocs benchmark
> > from vm-scalability [1] with patch ("/dev/zero: make private mapping
> > full anonymous mapping") [2], but the patch was nack'ed since it changes
> > the output of smaps somewhat.
>
> Yeah sorry about that, but unfortunately something we really do have to
> think about (among other things, the VMA edge cases are always the source
> of weirdness...)
>
> >
> > The profiling shows one of the major sources of the performance
> > improvement is the less contention to i_mmap_rwsem.
>
> Great work tracking that down! Sorry I lost track of the other thread.
>
> >
> > The small-allocs benchmark creates a lot of 40K size memory maps by
> > mmap'ing private /dev/zero then triggers page fault on the mappings.
> > When creating private mapping for /dev/zero, the anonymous VMA is
> > created, but it has valid vm_file.  Kernel basically assumes anonymous
> > VMAs should have NULL vm_file, for example, mmap inserts VMA to the file
> > rmap tree if vm_file is not NULL.  So the private /dev/zero mapping
> > will be inserted to the file rmap tree, this resulted in the contention
> > to i_mmap_rwsem.  But it is actually anonymous VMA, so it is pointless
> > to insert it to file rmap tree.
>
> Ughhhh god haha.
>
> >
> > Skip anonymous VMA for this case.  Over 400% performance improvement was
> > reported [3].
>
> That's insane. Amazing work.
>

Ok, so the real question (to Yang) is: who are these /dev/zero users
that require an insane degree of scalability, and why didn't they
switch to regular MAP_ANONYMOUS? Are they in the room with us?

> >
> > It is not on par with the 800% improvement from the original patch.  It is
> > because page fault handler needs to access some members of struct file
> > if vm_file is not NULL, for example, f_mode and f_mapping.  They are in
> > the same cacheline with file refcount.  When mmap'ing a file the file
> > refcount is inc'ed and dec'ed, this caused bad cache false sharing
> > problem.  The further debug showed checking whether the VMA is anonymous
> > or not can alleviate the problem.  But I'm not sure whether it is the
> > best way to handle it, maybe we should consider shuffle the layout of
> > struct file.
>
> Interesting, I guess you'll take a look at this also?

... And this is probably a non-issue in 99% of !/dev/zero mmaps unless
it's something like libc.so.6 at an insane rate of execs/second.

This seems like a patch in search of a problem and I really don't see
why we should wart up the mmap code otherwise. Not that I have a huge
problem with this patch, which is somewhat simple and obvious.
It'd be great if there was a real workload driving this rather than
useless synthetic benchmarks.

-- 
Pedro

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ