Message-ID: <CAJuCfpEV9ZkF9LeRXUvBP0MVrgg9BQRxN3KiC0QDsY++KzUrOg@mail.gmail.com>
Date: Tue, 12 Oct 2021 13:59:40 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Michal Hocko <mhocko@...e.com>, Kees Cook <keescook@...omium.org>,
Pavel Machek <pavel@....cz>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
David Hildenbrand <david@...hat.com>,
John Hubbard <jhubbard@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Colin Cross <ccross@...gle.com>,
Sumit Semwal <sumit.semwal@...aro.org>,
Dave Hansen <dave.hansen@...el.com>,
Matthew Wilcox <willy@...radead.org>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Jonathan Corbet <corbet@....net>,
Al Viro <viro@...iv.linux.org.uk>,
Randy Dunlap <rdunlap@...radead.org>,
Kalesh Singh <kaleshsingh@...gle.com>,
Peter Xu <peterx@...hat.com>, rppt@...nel.org,
Peter Zijlstra <peterz@...radead.org>,
Catalin Marinas <catalin.marinas@....com>,
vincenzo.frascino@....com,
Chinwen Chang (張錦文)
<chinwen.chang@...iatek.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Jann Horn <jannh@...gle.com>, apopple@...dia.com,
Yu Zhao <yuzhao@...gle.com>, Will Deacon <will@...nel.org>,
fenghua.yu@...el.com, thunder.leizhen@...wei.com,
Hugh Dickins <hughd@...gle.com>, feng.tang@...el.com,
Jason Gunthorpe <jgg@...pe.ca>, Roman Gushchin <guro@...com>,
Thomas Gleixner <tglx@...utronix.de>, krisman@...labora.com,
Chris Hyser <chris.hyser@...cle.com>,
Peter Collingbourne <pcc@...gle.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Jens Axboe <axboe@...nel.dk>, legion@...nel.org,
Rolf Eike Beer <eb@...ix.com>,
Cyrill Gorcunov <gorcunov@...il.com>,
Muchun Song <songmuchun@...edance.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Thomas Cedeno <thomascedeno@...gle.com>, sashal@...nel.org,
cxfcosmos@...il.com, LKML <linux-kernel@...r.kernel.org>,
linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-mm <linux-mm@...ck.org>,
kernel-team <kernel-team@...roid.com>,
Tim Murray <timmurray@...gle.com>
Subject: Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting
On Tue, Oct 12, 2021 at 1:41 PM Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Tue, Oct 12, 2021 at 11:52:42AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Oct 12, 2021 at 11:26 AM Johannes Weiner <hannes@...xchg.org> wrote:
> > >
> > > On Mon, Oct 11, 2021 at 10:36:24PM -0700, Suren Baghdasaryan wrote:
> > > > On Mon, Oct 11, 2021 at 8:00 PM Johannes Weiner <hannes@...xchg.org> wrote:
> > > > >
> > > > > On Mon, Oct 11, 2021 at 06:20:25PM -0700, Suren Baghdasaryan wrote:
> > > > > > On Mon, Oct 11, 2021 at 6:18 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > > > > > >
> > > > > > > On Mon, Oct 11, 2021 at 1:36 AM Michal Hocko <mhocko@...e.com> wrote:
> > > > > > > >
> > > > > > > > On Fri 08-10-21 13:58:01, Kees Cook wrote:
> > > > > > > > > > - Strings for "anon" specifically have no required format (this is good);
> > > > > > > > > > it's informational, like task_struct::comm, and can be (roughly)
> > > > > > > > > > anything. There's no naming convention for memfds, AF_UNIX, etc. Why
> > > > > > > > > is one needed here? That seems like a completely unreasonable
> > > > > > > > > requirement.
> > > > > > > >
> > > > > > > > I might be misreading the justification for the feature. Patch 2 is
> > > > > > > > > talking about tools that need to understand memory usage to make
> > > > > > > > > further actions. Also Suren was suggesting "numbering convention" as an
> > > > > > > > argument against.
> > > > > > > >
> > > > > > > > > So can we get a clear example of how this is actually being used? If this
> > > > > > > > > is just to be used for debugging by humans, then I can see an argument for
> > > > > > > > > a human-readable form. If this is, however, meant to be used by tools to
> > > > > > > > > take some actions, then the argument for strings is much weaker.
> > > > > > >
> > > > > > > > The simplest use case is when we notice that a process consumes more
> > > > > > > memory than usual and we do "cat /proc/$(pidof my_process)/maps" to
> > > > > > > check which area is contributing to this growth. The names we assign
> > > > > > > to anonymous areas are descriptive enough for a developer to get an
> > > > > > > idea where the increased consumption is coming from and how to proceed
> > > > > > > with their investigation.
> > > > > > > There are of course cases when tools are involved, but the end-user is
> > > > > > > always a human and the final report should contain easily
> > > > > > > understandable data.
> > > > > > >
> > > > > > > IIUC, the main argument here is whether the userspace can provide
> > > > > > > tools to perform the translations between ids and names, with the
> > > > > > > kernel accepting and reporting ids instead of strings. Technically
> > > > > > > it's possible, but to be practical that conversion should be fast
> > > > > > > because we will need to make name->id conversion potentially for each
> > > > > > > mmap. On the consumer side the performance is not as critical, but the
> > > > > > > fact that instead of dumping /proc/$pid/maps we will have to parse the
> > > > > > > file, do id->name conversion and replace all [anon:id] with
> > > > > > > [anon:name] would be an issue when we do that in bulk, for example
> > > > > > > when collecting system-wide data for a bugreport.
> > > > >
> > > > > Is that something you need to do client-side? Or could the bug tool
> > > > > upload the userspace-maintained name:ids database alongside the
> > > > > /proc/pid/maps dump for external processing?
> > > >
> > > > You can generate a bugreport and analyze it locally or submit it as an
> > > > attachment to a bug for further analysis.
> > > > Sure, we can attach the id->name conversion table to the bugreport but
> > > > either way, some tool would have to post-process it to resolve the
> > > > ids. If we are not analyzing the results immediately then that step
> > > > can be postponed and I think that's what you mean? If so, then yes,
> > > > that is correct.
> > >
> > > Right, somebody needs to do it at some point, but I suppose it's less
> > > of a problem if a developer machine does it than a mobile device.
> >
> > True, and that's why I mentioned that it's not as critical as the
> > efficiency at mmap() time. In any case, if we could avoid translations
> > at all that would be ideal.
> >
> > >
> > > One advantage of an ID over a string - besides not having to maintain
> > > a deduplicating arbitrary string storage in the kernel - is that we
> > > may be able to auto-assign unique IDs to VMAs in the kernel, in a way
> > > that we could not with strings. You'd still have to do IPC calls to
> > > write new name mappings into your db, but you wouldn't have to do the
> > > prctl() to assign stuff in the kernel at all.
> >
> > You still have to retrieve that tag from the kernel to record it in
> > your db, so this would still require some syscall, no?
>
> Don't you have to do this with the string setting interface as well?
> How do you know the vma address to pass into the prctl()? Is this
> somehow coordinated with the mmap()?
Sure. The sequence is:
ptr = mmap(NULL, size, ...);
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr, size, name);
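Spelled out a bit more fully, the sequence could look like the sketch below. This is only an illustration: map_named_anon() is a hypothetical helper name, the protection and mapping flags are arbitrary choices, and the fallback #defines assume the constant values proposed in this series in case the installed headers do not carry them. Naming is treated as best-effort, so a kernel without the feature still returns a usable mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Assumed values from this patch series, for headers that predate it. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Hypothetical helper: create a private anonymous mapping and attach a
 * name to it. Returns MAP_FAILED only if mmap() itself fails.
 */
void *map_named_anon(size_t size, const char *name)
{
	void *ptr;

	assert(size > 0);

	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ptr == MAP_FAILED)
		return MAP_FAILED;

	/* Best-effort: report a naming failure but keep the mapping. */
	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		  (unsigned long)ptr, size, (unsigned long)name) != 0)
		perror("prctl(PR_SET_VMA_ANON_NAME)");

	return ptr;
}
```

So the address passed to prctl() is simply whatever mmap() returned; there is no other coordination between the two calls.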
>
> > > (We'd have to think of a solution of how IDs work with vma merging and
> > > splitting, but I think to a certain degree that's policy and we should
> > > be able to find something workable - a MAP_ID flag, using anon_vma as
> > > identity, assigning IDs at mmap time and do merges only for protection
> > > changes etc. etc.)
> >
> > Overall, I think keeping the kernel out of this and letting it treat
> > this tag as a cookie which only userspace cares about is simpler.
> > Unless you see other uses where kernel's involvement is needed.
>
> It depends on what you consider keeping the kernel out of it. A small
> extension to assign unique IDs to mappings automatically in an
> intuitive way (with a compat option to disable) is a much smaller ABI
> commitment than a prctl()-controlled string storage.
I'm not saying it's hard or complex. I just don't see the advantage of
generating these IDs in the kernel vs passing them from userspace.
Maybe I'm missing some use case?
> When I say policy on how to assign the ID, I didn't mean that it
> should be a free for all. Rather that we should pick one reasonable
> way to do it, comparable to picking the parameters for how long the
> stored strings could be, which characters to allow etc.
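For reference, the consumer-side step mentioned earlier in this thread (parse a maps dump and replace each [anon:id] with [anon:name]) might be sketched roughly as below. Everything here is an assumption for illustration: the ID-based [anon:<id>] output does not exist, and struct anon_name and resolve_anon_id() are hypothetical names for a userspace-maintained table and its lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry in a userspace-maintained id->name table. */
struct anon_name {
	unsigned long id;
	const char *name;
};

static const char *lookup_name(const struct anon_name *tbl, size_t n,
			       unsigned long id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return tbl[i].name;
	return NULL;
}

/*
 * Copy one maps line into out, rewriting "[anon:<id>]" to "[anon:<name>]"
 * when the id is found in the table. Lines without a known id are copied
 * unchanged. Returns 0 on success, -1 if out is too small.
 */
int resolve_anon_id(const char *line, const struct anon_name *tbl, size_t n,
		    char *out, size_t outsz)
{
	static const char tag[] = "[anon:";
	const char *p;
	int len;

	assert(line != NULL && out != NULL && outsz > 0);

	p = strstr(line, tag);
	if (p) {
		const char *digits = p + sizeof(tag) - 1;
		char *end;
		unsigned long id = strtoul(digits, &end, 10);
		const char *name = lookup_name(tbl, n, id);

		/* Require at least one digit, a closing ']' and a known id. */
		if (end != digits && *end == ']' && name) {
			len = snprintf(out, outsz, "%.*s[anon:%s]%s",
				       (int)(p - line), line, name, end + 1);
			return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
		}
	}

	len = snprintf(out, outsz, "%s", line);
	return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```

A bugreport tool would run something like this over every line of every /proc/$pid/maps it collects, which is the bulk post-processing cost discussed above.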