lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 7 May 2024 11:48:44 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Greg KH <gregkh@...uxfoundation.org>, Andrii Nakryiko <andrii@...nel.org>,
        linux-fsdevel@...r.kernel.org, brauner@...nel.org,
        viro@...iv.linux.org.uk, akpm@...ux-foundation.org,
        linux-kernel@...r.kernel.org, bpf@...r.kernel.org, linux-mm@...ck.org,
        Suren Baghdasaryan <surenb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH 5/5] selftests/bpf: a simple benchmark tool for
 /proc/<pid>/maps APIs

. Adding Suren & Willy to the Cc

* Andrii Nakryiko <andrii.nakryiko@...il.com> [240504 18:14]:
> On Sat, May 4, 2024 at 8:32 AM Greg KH <gregkh@...uxfoundation.org> wrote:
> >
> > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > I also did an strace run of both cases. In text-based one the tool did
> > > 68 read() syscalls, fetching up to 4KB of data in one go.
> >
> > Why not fetch more at once?
> >
> 
> I didn't expect to be interrogated so much on the performance of the
> text parsing front, sorry. :) You can probably tune this, but where is
> the reasonable limit? 64KB? 256KB? 1MB? See below for some more
> production numbers.

The reason the file reads are limited to 4KB is because this file is
used for monitoring processes.  We have a significant number of
organisations polling this file so frequently that the mmap lock
contention becomes an issue. (reading a file is free, right?)  People
also tend to try to figure out why a process is slow by reading this
file - which amplifies the lock contention.

What happens today is that the lock is yielded after 4KB to allow time
for mmap writes to happen.  This also means your data may be
inconsistent from one 4KB block to the next (the write may be around
this boundary).

This new interface also takes the lock in do_procmap_query() and does
the 4kb blocks as well.  Extending this size means more time spent
blocking mmap writes, but a more consistent view of the world (less
"tearing" of the addresses).

We are working to reduce these issues by switching the /proc/<pid>/maps
file to use rcu lookup.  I would recommend we do not proceed with this
interface using the old method and instead, implement it using rcu from
the start - if it fits your use case (or we can make it fit your use
case).

At least, for most page faults, we can work around the lock contention
(since v6.6), but not all and not on all archs.

..

> 
> > > In comparison,
> > > ioctl-based implementation had to do only 6 ioctl() calls to fetch all
> > > relevant VMAs.
> > >
> > > It is projected that savings from processing big production applications
> > > would only widen the gap in favor of binary-based querying ioctl API, as
> > > bigger applications will tend to have even more non-executable VMA
> > > mappings relative to executable ones.
> >
> > Define "bigger applications" please.  Is this some "large database
> > company workload" type of thing, or something else?
> 
> I don't have a definition. But I had in mind, as one example, an
> ads-serving service we use internally (it's a pretty large application
> by pretty much any metric you can come up with). I just randomly
> picked one of the production hosts, found one instance of that
> service, and looked at its /proc/<pid>/maps file. Hopefully it will
> satisfy your need for specifics.
> 
> # cat /proc/1126243/maps | wc -c
> 1570178
> # cat /proc/1126243/maps | wc -l
> 28875
> # cat /proc/1126243/maps | grep ' ..x. ' | wc -l
> 7347

We have distributions increasing the map_count to an insane number to
allow games to work [1].  It is, unfortunately, only a matter of time until
this is regularly an issue as it is being normalised and allowed by an
increased number of distributions (fedora, arch, ubuntu).  So, despite
my email address, I am not talking about large database companies here.

Also, note that applications that use guard VMAs double the number for
the guards.  Fun stuff.

We are really doing a lot in the VMA area to reduce the mmap locking
contention and it seems you have a use case for a new interface that can
leverage these changes.

We have at least two talks around this area at LSF if you are attending.

Thanks,
Liam

[1] https://lore.kernel.org/linux-mm/8f6e2d69-b4df-45f3-aed4-5190966e2dea@valvesoftware.com/


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ