linux-kernel - Re: [PATCH] bcachefs: Switch to memalloc_flags

Message-ID: <CALOAHbAbzJL31jeGfXnbXmbXMpPv-Ak3o3t0tusjs-N-NHisiQ@mail.gmail.com>
Date: Mon, 2 Sep 2024 17:01:12 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Dave Chinner <david@...morbit.com>, Kent Overstreet <kent.overstreet@...ux.dev>, 
	Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Dave Chinner <dchinner@...hat.com>
Subject: Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations

On Mon, Sep 2, 2024 at 4:11 PM Michal Hocko <mhocko@...e.com> wrote:
>
> On Mon 02-09-24 11:02:50, Yafang Shao wrote:
> > On Sun, Sep 1, 2024 at 11:35 AM Dave Chinner <david@...morbit.com> wrote:
> [...]
> > > AIUI, the memory allocation looping has back-offs already built in
> > > to it when memory reserves are exhausted and/or reclaim is
> > > congested.
> > >
> > > e.g:
> > >
> > > get_page_from_freelist()
> > >   (zone below watermark)
> > >   node_reclaim()
> > >     __node_reclaim()
> > >       shrink_node()
> > >         reclaim_throttle()
> >
> > It applies to all kinds of allocations.
> >
> > >
> > > And the call to reclaim_throttle() will do the equivalent of
> > > memalloc_retry_wait() (a 2ms sleep).
> >
> > I'm wondering if we should take special action for __GFP_NOFAIL, as
> > currently, it only results in an endless loop with no intervention.
>
> If the memory allocator/reclaim is thrashing on a couple of remaining pages
> that are easy to drop and reallocate again then the same endless loop
> is de-facto the behavior for _all_ non-costly allocations. All of them
> will loop. This is not really great but so far we haven't really
> developed a reliable thrashing detection that would suit all potential
> workloads. There are some that simply benefit from work not being lost
> even if the cost is a severe performance penalty. A general conclusion
> has been that workloads which would rather see OOM killer triggering
> early should implement that policy in the userspace. We have PSI,
> refault counters and other tools that could be used to detect
> pathological patterns and trigger workload specific action.

Indeed, we're currently working on developing that policy.
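
As a rough illustration of the kind of userspace check such a policy could
start from (this is not code from this thread, only a minimal sketch), the
snippet below parses the PSI text format exposed in /proc/pressure/memory
and compares the "full" avg10 value against a threshold. The function names
and the 10.0 threshold are hypothetical, and the sample values are made up
for the example.

```python
# Sketch only: parse PSI memory pressure text and apply a simple threshold.
# parse_psi(), should_act() and the default threshold are hypothetical names
# and values chosen for illustration, not part of any existing tool.

def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def should_act(psi, threshold=10.0):
    """Return True when the 10-second "full" stall average reaches threshold."""
    return psi.get("full", {}).get("avg10", 0.0) >= threshold


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the system-wide PSI memory file (needs PSI enabled)."""
    with open(path) as f:
        return parse_psi(f.read())


# Made-up sample in the PSI file format, so the sketch runs without /proc.
SAMPLE = (
    "some avg10=12.50 avg60=3.10 avg300=0.80 total=123456\n"
    "full avg10=11.00 avg60=2.00 avg300=0.50 total=65432\n"
)
```

A real policy would call should_act(read_memory_pressure()) periodically and
would likely combine it with refault counters and workload-specific rules
before taking any action.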

>
> I really do not see why GFP_NOFAIL should be treated any differently in
> this specific case.

I believe there's no way to stop it from looping, even if you
implement a sophisticated user space OOM killer. ;)

--
Regards
Yafang