[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z6zS4Dtyway78Gif@tiehlicka>
Date: Wed, 12 Feb 2025 17:57:04 +0100
From: Michal Hocko <mhocko@...e.com>
To: Tejun Heo <tj@...nel.org>
Cc: Dennis Zhou <dennis@...nel.org>, Filipe Manana <fdmanana@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, percpu: do not consider sleepable allocations atomic
On Tue 11-02-25 10:55:20, Tejun Heo wrote:
> Hello, Michal.
>
> On Thu, Feb 06, 2025 at 01:26:33PM +0100, Michal Hocko wrote:
> ...
> > It has turned out that iscsid has worked around this by dropping
> > PR_SET_IO_FLUSHER (https://github.com/open-iscsi/open-iscsi/pull/382)
> > when scanning host. But we can do better in this case on the kernel side
>
> FWIW, requiring GFP_KERNEL context for probing doesn't sound too crazy to
> me.
>
> > @@ -2204,7 +2204,12 @@ static void pcpu_balance_workfn(struct work_struct *work)
> > * to grow other chunks. This then gives pcpu_reclaim_populated() time
> > * to move fully free chunks to the active list to be freed if
> > * appropriate.
> > + *
> > + * Enforce GFP_NOIO allocations because we have pcpu_alloc users
> > + * constrained to GFP_NOIO/NOFS contexts and they could form lock
> > + * dependency through pcpu_alloc_mutex
> > */
> > + unsigned int flags = memalloc_noio_save();
>
> Just for context, the reason why the allocation mask support was limited to
> GFP_KERNEL or not rather than supporting full range of GFP flags is because
> percpu memory area expansion can involve page table allocations in the
> vmalloc area which always uses GFP_KERNEL. memalloc_noio_save() masks IO
> part out of that, right? It might be worthwhile to explain why we aren't
> passing down GPF flags throughout and instead depending on masking.
I have gone with masking because that seemed easier to review and more
robust solution. vmalloc does support NOFS/NOIO contexts these days (it
will just uses scoped masking in those cases). Propagating the gfp
throughout the worker code path is likely possible, but I haven't really
explored that in detail to be sure. Would that be preferable even if the
fix would be more involved?
> Also, doesn't the above always prevent percpu allocations from doing fs/io
> reclaims?
Yes it does. Probably worth mentioning in the changelog. These
allocations should be rare so having a constrained reclaim didn't really
seem problematic to me. There should be kswapd running in the background
with the full reclaim power.
> ie. Shouldn't the masking only be used if the passed in gfp
> doesn't allow fs/io?
This is a good question. I have to admit that my understanding might be
incorrect but wouldn't it be possible that we could get the lock
dependency chain if GFP_KERNEL and scoped NOFS alloc_pcp calls are
competing?
fs/io lock
pcpu_alloc_noprof(NOFS/NOIO)
pcpu_alloc_noprof(GFP_KERNEL)
pcpu_schedule_balance_work
pcpu_alloc_mutex
pcpu_alloc_mutex
allocation_deadlock throgh fs/io lock
This is currently not possible because constrained allocations only do
trylock.
Makes sense?
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists