Message-ID: <c55739ec-3f0c-4f37-ad86-fe337d71d5a2@nvidia.com>
Date: Wed, 4 Sep 2024 11:58:07 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, kent.overstreet@...ux.dev,
 corbet@....net, arnd@...db.de, mcgrof@...nel.org, rppt@...nel.org,
 paulmck@...nel.org, thuth@...hat.com, tglx@...utronix.de, bp@...en8.de,
 xiongwei.song@...driver.com, ardb@...nel.org, david@...hat.com,
 vbabka@...e.cz, mhocko@...e.com, hannes@...xchg.org,
 roman.gushchin@...ux.dev, dave@...olabs.net, willy@...radead.org,
 liam.howlett@...cle.com, pasha.tatashin@...een.com, souravpanda@...gle.com,
 keescook@...omium.org, dennis@...nel.org, yuzhao@...gle.com,
 vvvvvv@...gle.com, rostedt@...dmis.org, iamjoonsoo.kim@....com,
 rientjes@...gle.com, minchan@...gle.com, kaleshsingh@...gle.com,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-arch@...r.kernel.org, linux-mm@...ck.org,
 linux-modules@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag
 refs in page flags

On 9/4/24 9:08 AM, Suren Baghdasaryan wrote:
> On Tue, Sep 3, 2024 at 7:06 PM 'John Hubbard' via kernel-team
> <kernel-team@...roid.com> wrote:
>> On 9/3/24 6:25 PM, John Hubbard wrote:
>>> On 9/3/24 11:19 AM, Suren Baghdasaryan wrote:
>>>> On Sun, Sep 1, 2024 at 10:16 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>>>>> On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@...gle.com> wrote:
...
>> The configuration should disable itself, in this case. But if that is
>> too big of a change for now, I suppose we could fall back to an error
>> message to the effect of, "please disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
>> because the kernel build system is still too primitive to do that for you". :)
> 
> I don't think we can detect this at build time. We need to know how
> many page allocations there are, which we find out only after we build
> the kernel image (from the section size that holds allocation tags).
> Therefore it would have to be a post-build check. So I think the best
> we can do is to generate the error like the one you suggested after we
> build the image.
> Dependency on CONFIG_PAGE_EXTENSION is yet another complexity because
> if we auto-disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS, we would have to
> also auto-enable CONFIG_PAGE_EXTENSION if it's not already enabled.
> 
> I'll dig around some more to see if there is a better way.
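The post-build check described above could look roughly like the sketch below. It rests on several assumptions. The tag count is taken to be the size of the section holding allocation tags divided by a fixed per-tag size. Index 0 is reserved to mean "no tag". The number of free page flag bits is assumed to be known. The helpers `bits_for_tags()` and `check_tag_fit()` are hypothetical and do not exist in the kernel or its build scripts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: number of bits needed to store a tag index in
 * the range [0, n_tags], where index 0 is reserved to mean "no tag".
 */
static unsigned int bits_for_tags(size_t n_tags)
{
	unsigned int bits = 0;
	size_t max_index = n_tags;

	while (max_index) {
		bits++;
		max_index >>= 1;
	}
	return bits;
}

/*
 * Hypothetical post-build check: derive the tag count from the size of
 * the section holding allocation tags and compare the bits it needs
 * with the free page flag bits. Returns 0 if the tags fit, 1 otherwise.
 */
static int check_tag_fit(size_t section_bytes, size_t tag_size,
			 unsigned int free_bits)
{
	size_t n_tags;
	unsigned int needed;

	assert(tag_size > 0);
	n_tags = section_bytes / tag_size;
	needed = bits_for_tags(n_tags);

	if (needed > free_bits) {
		fprintf(stderr,
			"error: %zu allocation tags need %u page flag bits but only %u are free; "
			"please disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS\n",
			n_tags, needed, free_bits);
		return 1;
	}
	return 0;
}
```

A non-zero return from `check_tag_fit()` would be the point at which the build stops and prints the error message suggested earlier in the thread.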
>>
>>>> - If there are enough unused bits but we have to push last_cpupid out
>>>> of page flags, we issue a warning and continue. The user can disable
>>>> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS if last_cpupid has to stay in page
>>>> flags.
>>
>> Let's try to decide now what that tradeoff should be. Just pick one based
>> on what some of us perceive to be the expected usefulness and frequency of
>> use of last_cpupid versus these tag refs.
>>
>> If someone really needs to change the tradeoff for that one bit, then that
>> someone is also likely able to hack up a change for it.
> 
> Yeah, from all the feedback, I realize that by pursuing the maximum
> flexibility I made configuring this mechanism close to impossible. I
> think the first step towards simplifying this would be to identify
> usable configurations. From that POV, I can see 3 useful modes:
> 
> 1. Page flags are not used. In this mode we will use direct pointer
> references and page extensions, like we do today. This mode is used
> when we don't have enough page flags. This can be a safe default which
> keeps things as they are today and should always work.

Definitely my favorite so far.

> 2. Page flags are used but not forced. This means we will try to use
> all free page flags bits (up to a reasonable limit of 16) without
> pushing out last_cpupid.

This is a logical next step, agreed.

> 3. Page flags are forced. This means we will try to use all free page
> flags bits after pushing last_cpupid out of page flags. This mode
> could be used if the user cares about memory profiling more than the
> performance overhead caused by last_cpupid.
> 
> I'm not 100% sure (3) is needed, so I think we can skip it until
> someone asks for it. It should be easy to add that in the future.

Right.
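The three modes can be summarized as a function that reports how many page flag bits would hold tag references. This is only a sketch. The enum, the name `tag_ref_bits()`, and the choice to apply the 16-bit cap to mode (3) as well are assumptions for illustration, not code from the patch series.

```c
#include <assert.h>
#include <limits.h>

/* "Reasonable limit of 16" bits for tag references, per the thread. */
#define TAG_REF_MAX_BITS 16u

/* Hypothetical names for the three modes discussed above. */
enum tag_ref_mode {
	TAG_REF_PAGE_EXT,		/* (1) pointers kept in page extensions */
	TAG_REF_PAGEFLAGS,		/* (2) free page flag bits only */
	TAG_REF_PAGEFLAGS_FORCED,	/* (3) also take last_cpupid's bits */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Returns how many page flag bits would hold tag references.
 * 0 means tags are referenced through page extensions instead.
 */
static unsigned int tag_ref_bits(enum tag_ref_mode mode,
				 unsigned int free_bits,
				 unsigned int cpupid_bits)
{
	/* Both values come out of the same unsigned long of page flags. */
	assert(free_bits + cpupid_bits <= sizeof(unsigned long) * CHAR_BIT);

	switch (mode) {
	case TAG_REF_PAGE_EXT:
		return 0;
	case TAG_REF_PAGEFLAGS:
		return min_uint(free_bits, TAG_REF_MAX_BITS);
	case TAG_REF_PAGEFLAGS_FORCED:
		/* Assumption: the same 16-bit cap applies in this mode. */
		return min_uint(free_bits + cpupid_bits, TAG_REF_MAX_BITS);
	}
	return 0;
}
```

Mode (2) never touches `cpupid_bits`. That is exactly the property that lets it coexist with last_cpupid, and it is also why it may run out of room sooner than mode (3).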

> If we detect at build time that we don't have enough page flag bits to
> cover kernel allocations for modes (2) or (3), we issue an error
> prompting the user to reconfigure to mode (1).
> 
> Ideally, I would like to have (2) as the default mode and automatically
> fall back to (1) when it's impossible, but as I mentioned before, I
> don't yet see a way to do that automatically.
> 
> For loadable modules, I think my earlier suggestion should work fine.
> If a module causes us to run out of space for tags, we disable memory
> profiling at runtime and log a warning for the user stating that we
> disabled memory profiling and if the user needs it they should
> configure mode (1). I *think* I can even disable profiling only for
> that module and not globally but I need to try that first.
> 
> I can start with modes (1) and (2) support, which requires only
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS defaulted to N. Any user can try
> enabling this config and, if that builds fine, keep it for
> better performance and memory usage. Does that sound acceptable?
> Thanks,
> Suren.
> 
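The loadable-module fallback proposed above (disable memory profiling and log a warning when a module's tags no longer fit) might be sketched as follows. `struct tag_space` and `register_module_tags()` are hypothetical stand-ins, not the kernel's actual data structures or module-loading hooks. The sketch also uses a single global switch instead of the per-module disabling that was mentioned as a possibility.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bookkeeping; not the kernel's real data structures. */
struct tag_space {
	size_t capacity;	/* max tags addressable with page flag bits */
	size_t used;		/* tags already registered (kernel + modules) */
	bool profiling_enabled;
};

/*
 * Try to register a module's tags. If they do not fit, disable memory
 * profiling globally and warn the user. Returns true on success.
 */
static bool register_module_tags(struct tag_space *ts, const char *mod,
				 size_t n_tags)
{
	assert(ts->used <= ts->capacity);

	if (!ts->profiling_enabled)
		return false;

	if (n_tags > ts->capacity - ts->used) {
		ts->profiling_enabled = false;
		fprintf(stderr,
			"memory profiling disabled: module %s needs %zu tags, only %zu left; "
			"disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to use page extensions\n",
			mod, n_tags, ts->capacity - ts->used);
		return false;
	}

	ts->used += n_tags;
	return true;
}
```

Switching off profiling only for the offending module would replace the global `profiling_enabled` flag with per-module state.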

How badly do we need (2)? Because this is really expensive:

    a) It adds complexity to a complex, delicate core part of mm.

    b) It adds constraints, which prevent possible future features.

It's not yet clear that (2) is valuable enough (compared to (1))
to compensate, at least from what I've read. Unless I missed
something big.


thanks,
-- 
John Hubbard

