Message-ID: <Z8hpWSsDuMX1salt@tiehlicka>
Date: Wed, 5 Mar 2025 16:10:17 +0100
From: Michal Hocko <mhocko@...e.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>,
	Filipe Manana <fdmanana@...e.com>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, percpu: do not consider sleepable allocations atomic

Sorry, I have missed the follow-ups here.

On Fri 21-02-25 10:48:28, Vlastimil Babka wrote:
> On 2/21/25 03:36, Dennis Zhou wrote:
> > I've thought about this in the back of my head for the past few weeks. I
> > think I have two questions about this change.
> > 
> > 1. Back to what TJ said earlier about probing. I feel like GFP_KERNEL
> >    allocations should be okay because that is more or less control-plane
> >    time? I'm not sure dropping PR_SET_IO_FLUSHER is all that big of a
> >    workaround?
> 
> This solves the iscsid case but not other cases, where GFP_KERNEL
> allocations are fundamentally impossible.

Agreed
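
For context, a minimal sketch of the distinction (simplified, not the
exact mm/percpu.c code; current_gfp_context() and gfpflags_allow_blocking()
are the real helpers, the surrounding logic is illustrative):

	/*
	 * A PR_SET_IO_FLUSHER task runs with PF_MEMALLOC_NOIO, so
	 * current_gfp_context() strips __GFP_IO from GFP_KERNEL.  An
	 * "effective gfp == GFP_KERNEL" style check then classifies a
	 * perfectly sleepable allocation as atomic, while checking the
	 * reclaim bit does not.
	 */
	gfp_t effective = current_gfp_context(GFP_KERNEL);

	bool old_is_atomic = (effective & GFP_KERNEL) != GFP_KERNEL; /* true */
	bool may_block     = gfpflags_allow_blocking(effective);     /* also true */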

> 
> > 2. This change breaks the feedback loop as we discussed above.
> >    Historically we've targeted 2-4 free pages' worth of percpu memory.
> >    This is done by kicking off the percpu balance work. That does
> >    GFP_KERNEL allocations, and if that requires reclaim, it goes and
> >    does it. However, now we're saying kswapd is going to work in
> >    parallel while we try to get pages in the worker thread.
> > 
> >    Given you're more versed in the reclaim side, I presume it must be
> >    pretty bad if we're failing to get order-0 pages even with
> >    NOFS/NOIO set?
> 
> IMHO yes, so I don't think we need to preemptively fear that situation
> that much. OTOH in the current state, depleting pcpu's atomic reserves
> and failing pcpu_alloc due to not being allowed to take the mutex can
> happen easily, even when there's plenty of free memory.

Agreed
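
The shape of that problem, simplified (illustrative pseudo-code, not the
actual mm/percpu.c flow):

	/*
	 * An atomic pcpu_alloc() may only consume already-populated
	 * chunk pages; it is not allowed to take pcpu_alloc_mutex to
	 * populate more.  Once the reserve is depleted it fails no
	 * matter how much free memory the system has.
	 */
	if (!is_atomic)
		mutex_lock(&pcpu_alloc_mutex);	/* sleepable path may populate */
	else if (!populated_pages_available)	/* illustrative condition */
		goto fail;			/* despite plenty of free memory */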

> >    My feeling is that we should add back some knowledge of the
> >    dependency, so that if the worker fails to get pages, it doesn't
> >    reschedule immediately. Maybe it's as simple as adding a sleep in
> >    the worker or playing with delayed work...
> 
> I think if we wanted things to be more robust (and perhaps there's no need
> to, see above), the best way would be to make the worker preallocate with
> GFP_KERNEL outside of pcpu_alloc_mutex.

Yes, this would work, as it would break the lock-chain dependency.
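Roughly (a sketch only; pcpu_alloc_mutex is real, while
pcpu_commit_page() stands in for the actual chunk population code):

	/* Allocate with GFP_KERNEL while the mutex is NOT held, so the
	 * mutex never waits on reclaim; take it only for the short,
	 * non-sleeping bookkeeping step. */
	static void pcpu_balance_populate(void)
	{
		struct page *page = alloc_page(GFP_KERNEL); /* may reclaim */

		if (!page)
			return;

		mutex_lock(&pcpu_alloc_mutex);
		pcpu_commit_page(page);	/* hypothetical: hand off to chunk code */
		mutex_unlock(&pcpu_alloc_mutex);
	}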

> I assume it's probably not easy to
> implement, as page table allocations are involved in the process and we
> don't have a way to supply preallocated memory for those.

Why would this be a concern if the allocation is done outside of the
lock?
-- 
Michal Hocko
SUSE Labs
