Message-ID: <919f547e-beb7-34b7-7835-9e1625600323@suse.cz>
Date:   Fri, 26 Nov 2021 15:50:15 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     NeilBrown <neilb@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Uladzislau Rezki <urezki@...il.com>,
        Michal Hocko <mhocko@...nel.org>,
        Dave Chinner <david@...morbit.com>,
        Christoph Hellwig <hch@....de>, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
        Ilya Dryomov <idryomov@...il.com>,
        Jeff Layton <jlayton@...nel.org>,
        Michal Hocko <mhocko@...e.com>
Subject: Re: [PATCH v2 2/4] mm/vmalloc: add support for __GFP_NOFAIL

On 11/24/21 06:23, NeilBrown wrote:
>> 
>> I forget why radix_tree_preload used a cpu-local store rather than a
>> per-task one.
>> 
>> Plus "what order pages would you like" and "on which node" and "in
>> which zone", etc...
> 
> "what order" - only order-0 I hope.  I'd hazard a guess that 90% of
> current NOFAIL allocations only need one page (providing slub is used -
> slab seems to insist on high-order pages sometimes).

Yeah, AFAIK SLUB can prefer higher orders than SLAB, but it also allows
fallback to the smallest order that is enough (thus order 0, unless the
objects are larger than a page).
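
To illustrate "smallest order that's enough", here is a minimal userspace
sketch. It is not kernel code: model_min_order() and MODEL_PAGE_SIZE are
hypothetical names, and the 4096-byte page size is an assumption.

```c
#include <stddef.h>

/* Assumed page size for this sketch; real page sizes vary by architecture. */
#define MODEL_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical helper: return the smallest order such that
 * (MODEL_PAGE_SIZE << order) is at least 'size' bytes.
 * Any object that fits in one page gets order 0.
 */
static unsigned int model_min_order(size_t size)
{
	unsigned int order = 0;
	size_t span = MODEL_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

In this model, only objects larger than one page ever need more than an
order-0 allocation.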

> "which node" - whichever.  Unless __GFP_HARDWALL is set, alloc_page()
> will fall-back to "whichever" anyway, and NOFAIL with HARDWALL is
> probably a poor choice.
> "which zone" - NORMAL.  I cannot find any NOFAIL allocations that want
> DMA.  fs/ntfs asks for __GFP_HIGHMEM with NOFAIL, but that doesn't
> *require* highmem.
> 
> Of course, before designing this interface too precisely we should check
> if anyone can use it.  From a quick look through some of the 100-ish
> users of __GFP_NOFAIL I'd guess that mempools would help - the
> preallocation should happen at init-time, not request-time.  Maybe if we
> made mempools even more light weight .... though that risks allocating a
> lot of memory that will never get used.
> 
> This brings me back to the idea that
>     alloc_page(wait and reclaim allowed)
> should only fail on OOM_KILL.  That way kernel threads are safe, and
> user-threads are free to return ENOMEM knowing it won't get to

Hm, I thought that's already pretty much the case with today's "too small to
fail" rule. IIRC there's exactly that gotcha: an OOM kill can make such an
allocation fail. But I believe that approach is rather fragile. If you
encounter such an allocation that doesn't check the resulting page != NULL,
you can only guess which of these is true:

- the author simply forgot to check at all
- the author relied on "too small to fail" without realizing the gotcha
- at the time of writing, the code was verified to run only in kernel
  thread context, never in user context, and
  - it is still true
  - it stopped being true at some later point
  - it might be hard to even decide which is the case

IIRC at some point we tried to abolish the "too small to fail" rule because
of this, but Linus rejected that. But the opposite, making it a hard
guarantee in all cases, also didn't happen, so...
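
To make the gotcha concrete, here is a minimal userspace model. It is not
kernel code: struct task_model, model_alloc_small() and model_do_work() are
hypothetical names, and the rule "a small allocation fails only for an
OOM-killed user task" is a simplifying assumption taken from the discussion
above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the calling context; not a kernel structure. */
struct task_model {
	bool is_kernel_thread;	/* kernel threads are not OOM victims here */
	bool oom_killed;	/* the task was picked as an OOM victim */
};

/*
 * Model of a small allocation where waiting and reclaim are allowed.
 * Assumption for this sketch: it fails only when the calling user task
 * has been OOM-killed; otherwise it defers to malloc().
 */
static void *model_alloc_small(const struct task_model *t, size_t size)
{
	if (!t->is_kernel_thread && t->oom_killed)
		return NULL;
	return malloc(size);
}

/*
 * A caller that checks the result. Dropping the NULL check would be
 * harmless for kernel threads in this model, but would dereference NULL
 * once the same code runs for an OOM-killed user task.
 */
static int model_do_work(const struct task_model *t)
{
	char *buf = model_alloc_small(t, 64);

	if (!buf)
		return -ENOMEM;
	buf[0] = '\0';
	free(buf);
	return 0;
}
```

A reader who sees the same caller without the NULL check cannot tell from
the code alone which of the cases listed above applies.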

> user-space.  If user-thread code really needs NOFAIL, it punts to a
> workqueue and waits - aborting the wait if it is killed, while the work
> item still runs eventually.
> 
> NeilBrown
> 
