linux-kernel - Re: [PATCH] bcachefs: Switch to memalloc_flags

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZtVzP2wfQoJrBXjF@tiehlicka>
Date: Mon, 2 Sep 2024 10:11:43 +0200
From: Michal Hocko <mhocko@...e.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: Dave Chinner <david@...morbit.com>,
	Kent Overstreet <kent.overstreet@...ux.dev>,
	Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Dave Chinner <dchinner@...hat.com>
Subject: Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc
 allocations

On Mon 02-09-24 11:02:50, Yafang Shao wrote:
> On Sun, Sep 1, 2024 at 11:35 AM Dave Chinner <david@...morbit.com> wrote:
[...]
> > AIUI, the memory allocation looping has back-offs already built in
> > to it when memory reserves are exhausted and/or reclaim is
> > congested.
> >
> > e.g:
> >
> > get_page_from_freelist()
> >   (zone below watermark)
> >   node_reclaim()
> >     __node_reclaim()
> >       shrink_node()
> >         reclaim_throttle()
> 
> It applies to all kinds of allocations.
> 
> >
> > And the call to recalim_throttle() will do the equivalent of
> > memalloc_retry_wait() (a 2ms sleep).
> 
> I'm wondering if we should take special action for __GFP_NOFAIL, as
> currently, it only results in an endless loop with no intervention.

If the memory allocator/reclaim is trashing on couple of remaining pages
that are easy to drop and reallocated again then the same endless loop
is de-facto the behavior for _all_ non-costly allocations. All of them
will loop. This is not really great but so far we haven't really
developed a reliable thrashing detection that would suit all potential
workloads. There are some that simply benefit from work not being lost
even if the cost is a severe performance penalty. A general conclusion
has been that workloads which would rather see OOM killer triggering
early should implement that policy in the userspace. We have PSI,
refault counters and other tools that could be used to detect
pathological patterns and trigger workload specific action.

I really do not see why GFP_NOFAIL should be any special in this
specific case. 
-- 
Michal Hocko
SUSE Labs