[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpHaF1e0V=wAoNO36nRL2A5EaNnuQrvZ2K3wh6PL6FrwZQ@mail.gmail.com>
Date: Mon, 11 Oct 2021 18:20:25 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Kees Cook <keescook@...omium.org>, Pavel Machek <pavel@....cz>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
David Hildenbrand <david@...hat.com>,
John Hubbard <jhubbard@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Colin Cross <ccross@...gle.com>,
Sumit Semwal <sumit.semwal@...aro.org>,
Dave Hansen <dave.hansen@...el.com>,
Matthew Wilcox <willy@...radead.org>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
Al Viro <viro@...iv.linux.org.uk>,
Randy Dunlap <rdunlap@...radead.org>,
Kalesh Singh <kaleshsingh@...gle.com>,
Peter Xu <peterx@...hat.com>, rppt@...nel.org,
Peter Zijlstra <peterz@...radead.org>,
Catalin Marinas <catalin.marinas@....com>,
vincenzo.frascino@....com,
Chinwen Chang (張錦文)
<chinwen.chang@...iatek.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Jann Horn <jannh@...gle.com>, apopple@...dia.com,
Yu Zhao <yuzhao@...gle.com>, Will Deacon <will@...nel.org>,
fenghua.yu@...el.com, thunder.leizhen@...wei.com,
Hugh Dickins <hughd@...gle.com>, feng.tang@...el.com,
Jason Gunthorpe <jgg@...pe.ca>, Roman Gushchin <guro@...com>,
Thomas Gleixner <tglx@...utronix.de>, krisman@...labora.com,
Chris Hyser <chris.hyser@...cle.com>,
Peter Collingbourne <pcc@...gle.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Jens Axboe <axboe@...nel.dk>, legion@...nel.org,
Rolf Eike Beer <eb@...ix.com>,
Cyrill Gorcunov <gorcunov@...il.com>,
Muchun Song <songmuchun@...edance.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Thomas Cedeno <thomascedeno@...gle.com>, sashal@...nel.org,
cxfcosmos@...il.com, LKML <linux-kernel@...r.kernel.org>,
linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-mm <linux-mm@...ck.org>,
kernel-team <kernel-team@...roid.com>,
Tim Murray <timmurray@...gle.com>
Subject: Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting
On Mon, Oct 11, 2021 at 6:18 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Mon, Oct 11, 2021 at 1:36 AM Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Fri 08-10-21 13:58:01, Kees Cook wrote:
> > > - Strings for "anon" specifically have no required format (this is good)
> > > it's informational like the task_struct::comm and can (roughly)
> > > anything. There's no naming convention for memfds, AF_UNIX, etc. Why
> > > is one needed here? That seems like a completely unreasonable
> > > requirement.
> >
> > I might be misreading the justification for the feature. Patch 2 is
> > talking about tools that need to understand memeory usage to make
> > further actions. Also Suren was suggesting "numbering convetion" as an
> > argument against.
> >
> > So can we get a clear example how is this being used actually? If this
> > is just to be used to debug by humans than I can see an argument for
> > human readable form. If this is, however, meant to be used by tools to
> > make some actions then the argument for strings is much weaker.
>
> The simplest usecase is when we notice that a process consumes more
> memory than usual and we do "cat /proc/$(pidof my_process)/maps" to
> check which area is contributing to this growth. The names we assign
> to anonymous areas are descriptive enough for a developer to get an
> idea where the increased consumption is coming from and how to proceed
> with their investigation.
> There are of course cases when tools are involved, but the end-user is
> always a human and the final report should contain easily
> understandable data.
>
> IIUC, the main argument here is whether the userspace can provide
> tools to perform the translations between ids and names, with the
> kernel accepting and reporting ids instead of strings. Technically
> it's possible, but to be practical that conversion should be fast
> because we will need to make name->id conversion potentially for each
> mmap. On the consumer side the performance is not as critical, but the
> fact that instead of dumping /proc/$pid/maps we will have to parse the
> file, do id->name conversion and replace all [anon:id] with
> [anon:name] would be an issue when we do that in bulk, for example
> when collecting system-wide data for a bugreport.
>
> I went ahead and implemented the proposed userspace solution involving
> tmpfs as a repository for name->id mapping (more precisely
> filename->inode mapping). Profiling shows that open()+fstat()+close()
> takes:
> - roughly 15 times longer than mmap() with 1000 unique names each
> being reused 50 times.
> - roughly 3 times longer than mmap() with 100 unique names each being
> reused 500 times. This is due to lstat() optimization suggested by
> Rasmus which avoids open() and close().
> For comparison, proposed prctl() takes roughly the same amount of time
> as mmap() and does not depend on the number of unique names.
>
> I'm still evaluating the proposal to use memfds but I'm not sure if
> the issue that David Hildenbrand mentioned about additional memory
> consumed in pagecache (which has to be addressed) is the only one we
> will encounter with this approach. If anyone knows of any potential
> issues with using memfds as named anonymous memory, I would really
> appreciate your feedback before I go too far in that direction.
> Thanks,
> Suren.
Just noticed that timmurray@ was dropped from the last reply. Adding him back.
>
> > --
> > Michal Hocko
> > SUSE Labs
Powered by blists - more mailing lists