[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpGbHShimigbm7ckT76hK9YUc_gy0jb9e5ddq7yjqDKOig@mail.gmail.com>
Date: Tue, 3 Sep 2024 19:04:51 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>, corbet@....net, arnd@...db.de,
mcgrof@...nel.org, rppt@...nel.org, paulmck@...nel.org, thuth@...hat.com,
tglx@...utronix.de, bp@...en8.de, xiongwei.song@...driver.com,
ardb@...nel.org, david@...hat.com, vbabka@...e.cz, mhocko@...e.com,
hannes@...xchg.org, roman.gushchin@...ux.dev, dave@...olabs.net,
willy@...radead.org, liam.howlett@...cle.com, pasha.tatashin@...een.com,
souravpanda@...gle.com, keescook@...omium.org, dennis@...nel.org,
jhubbard@...dia.com, yuzhao@...gle.com, vvvvvv@...gle.com,
rostedt@...dmis.org, iamjoonsoo.kim@....com, rientjes@...gle.com,
minchan@...gle.com, kaleshsingh@...gle.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org, linux-mm@...ck.org,
linux-modules@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable
On Tue, Sep 3, 2024 at 6:17 PM Kent Overstreet
<kent.overstreet@...ux.dev> wrote:
>
> On Tue, Sep 03, 2024 at 06:07:28PM GMT, Suren Baghdasaryan wrote:
> > On Sun, Sep 1, 2024 at 10:09 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
> > >
> > > On Sun, 1 Sep 2024 21:41:27 -0700 Suren Baghdasaryan <surenb@...gle.com> wrote:
> > >
> > > > Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the
> > > > page allocation tag references. When the size is configured to be
> > > > less than a direct pointer, the tags are searched using an index
> > > > stored as the tag reference.
> > > >
> > > > ...
> > > >
> > > > +config PGALLOC_TAG_REF_BITS
> > > > + int "Number of bits for page allocation tag reference (10-64)"
> > > > + range 10 64
> > > > + default "64"
> > > > + depends on MEM_ALLOC_PROFILING
> > > > + help
> > > > + Number of bits used to encode a page allocation tag reference.
> > > > +
> > > > + Smaller number results in less memory overhead but limits the number of
> > > > + allocations which can be tagged (including allocations from modules).
> > > > +
> > >
> > > In other words, "we have no idea what's best for you, you're on your
> > > own".
> > >
> > > I pity our poor users.
> > >
> > > Can we at least tell them what they should look at to determine whether
> > > whatever random number they chose was helpful or harmful?
> >
> > At the end of my reply in
> > https://lore.kernel.org/all/CAJuCfpGNYgx0GW4suHRzmxVH28RGRnFBvFC6WO+F8BD4HDqxXA@mail.gmail.com/#t
> > I suggested using all unused page flags. That would simplify things
> > for the user at the expense of potentially using more memory than we
> > need.
>
> Why would that use more memory, and how much?
Say our kernel uses 5000 page allocations and there are additional 100
allocations from all the modules we are loading at runtime. They all
can be addressed using 13 bits (8192 addressable tags), so the
contiguous memory we will be preallocating to store these tags is 8192
* sizeof(alloc_tag). sizeof(alloc_tag) is 40 bytes as of today but
might increase in the future if we add more fields there for other
uses (like gfp_flags for example). So, currently this would use 320KB.
If we always use 16 bits we would be preallocating 2.5MB. So, that
would be 2.2MB of wasted memory. Using more than 16 bits (65536
addressable tags) will be impractical anytime soon (current number
IIRC is a bit over 4000).
>
> > In practice 13 bits should be more than enough to cover all
> > kernel page allocations with enough headroom for page allocations
> > coming from loadable modules. I guess using 13 as the default would
> > cover most cases. In the unlikely case a specific system needs more
> > tags, the user can increase this value. It can also be set to 64 to
> > force direct references instead of indexing for better performance.
> > Would that approach be acceptable?
>
> Any knob that has to be kept track of and adjusted is a real hassle -
> e.g. lockdep has a bunch of knobs that have to be periodically tweaked,
> that's used by _developers_, and they're often wrong.
Yes, I understand, but this config would allow us not to waste these
couple of MBs, provide a way for the user to request direct addressing
of the tags and it also helps us to deal with the case I described in
the last paragraph of my posting at
https://lore.kernel.org/all/CAJuCfpGNYgx0GW4suHRzmxVH28RGRnFBvFC6WO+F8BD4HDqxXA@mail.gmail.com/#t
Powered by blists - more mailing lists