Message-ID: <20250822105712.CRp0on1Y@linutronix.de>
Date: Fri, 22 Aug 2025 12:57:12 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Borislav Petkov <bp@...en8.de>, Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>, x86-ml <x86@...nel.org>,
lkml <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] locking/urgent for v6.17-rc1
On 2025-08-21 14:19:31 [-0400], Linus Torvalds wrote:
> On Sat, 9 Aug 2025 at 14:02, Borislav Petkov <bp@...en8.de> wrote:
> >
> > please pull a locking/urgent fix for v6.17-rc1.
>
> Ok, so this clearly wasn't a fix.
>
> > Thomas Gleixner (1):
> > futex: Move futex cleanup to __mmdrop()
>
> So this causes problems, because __mmdrop is not done in thread
> context, and the kvfree() calls then cause issues:
>
> https://lore.kernel.org/all/20250821102721.6deae493@kernel.org/
> https://lore.kernel.org/all/20250818131902.5039-1-hdanton@sina.com/
>
> Hillf Danton sent out a patch, but honestly, that patch looks like pure
> bandaid, and will make the exit path horribly much slower by moving
> things into workqueues. It might not be visible in profiles exactly
> *because* it's then hidden in workqueues, but it's not great.
vfree() has an in_interrupt() check. Extending it to an IRQs-disabled
check would catch this case, but not a section where a spinlock_t is
held, since we can't check for disabled preemption (without
CONFIG_PREEMPT_COUNT there is no preempt count to look at). So that
would be just another band aid.
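For reference, a simplified sketch of the current check and of the
extension I mean (not the exact mm/vmalloc.c code, and the
irqs_disabled() variant is hypothetical):

	void vfree(const void *addr)
	{
		/* existing check: covers hard and soft interrupt context */
		if (unlikely(in_interrupt())) {
			vfree_atomic(addr);	/* defer the free to a workqueue */
			return;
		}
		/* ... regular free path ... */
	}

	/* hypothetical extension: also catches IRQs-off regions, but
	 * still not a spinlock_t section, since there is no preempt
	 * count to test without CONFIG_PREEMPT_COUNT: */
	if (unlikely(in_interrupt() || irqs_disabled()))
		vfree_atomic(addr);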
> I think it's a mistake to allow vmalloc'ing those hashes in the first
> place, and I suggest the local hash be size-limited to the point where
> it's just a kmalloc() and thus works in all contexts.
The default auto scaling with 512 CPUs (without lockdep) maxes out at
64 + (64 * 512 * 4) bytes = 128KiB + 64.
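Spelled out, assuming 64 bytes each for struct futex_private_hash and
for one struct futex_hash_bucket:

	   64 bytes			(struct futex_private_hash)
	 + 4 buckets/CPU * 512 CPUs * 64 bytes/bucket
	 = 64 + 131072 bytes
	 = 128KiB + 64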
For kmalloc() the slab caches are used up to 8KiB, a limit the hash
already crosses with 32 CPUs (64 + 64 * 32 * 4 = 8256 bytes). Above
that, kmalloc() falls back to the page allocator, up to its maximum
allocation size of 4MiB.
That Jakub hits the warning only after some time might indicate that
the allocation could initially be satisfied with physically contiguous
memory, but over time memory became too fragmented, so kvmalloc() fell
back to vmalloc()ed memory.
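That is the usual kvmalloc() behaviour (sketch, not the exact futex
allocation site):

	/* kvmalloc() first tries a physically contiguous kmalloc() and
	 * falls back to vmalloc() when that fails, e.g. due to
	 * fragmentation. Only the vmalloc() case makes the later
	 * kvfree() problematic outside of thread context. */
	hash = kvzalloc(size, GFP_KERNEL);
	if (!hash)
		return -ENOMEM;
	...
	kvfree(hash);	/* ends up in vfree() if vmalloc()ed */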
> Or maybe the mistake was the mm-private hashing in the first place.
> Maybe that hash shouldn't be allocated at mm_alloc() ->
> futex_mm_init() at all. Only initialized by the futex code when
> needed, and then dropped in exit_mmap().
mm_alloc() is fine. It does only an alloc_percpu(), and its
counterpart can be used from atomic context. The hash pointer itself
should then end up in kfree(NULL) for processes that never allocated a
private hash.
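Roughly, under your suggestion (sketch; I'm assuming the pointer in
mm_struct is called futex_phash):

	/* mm_alloc()/futex_mm_init(): set up only the percpu counter,
	 * leave the hash pointer NULL */
	mm->futex_phash = NULL;

	/* the hash is then allocated lazily by the futex code, from
	 * task context, the first time the process needs it */

	/* teardown: a process that never allocated a hash goes through
	 * kvfree(NULL), which is a no-op and safe in any context */
	kvfree(mm->futex_phash);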
Let me stare at the initial report leading to the fix. Maybe we can
avoid the leak and the atomic context altogether.
> So the problems seem deeper than just "free'd in the wrong context".
>
> Linus
Sebastian