Message-ID: <p7gi44yt26bpjbjkvuhd54tqp3vn7z6wq346gmvazg5t3kir4p@gdf64eax44rm>
Date: Wed, 14 Jan 2026 07:21:33 -0800
From: Breno Leitao <leitao@...ian.org>
To: Chris Mason <clm@...a.com>
Cc: Alexander Potapenko <glider@...gle.com>,
Marco Elver <elver@...gle.com>, Dmitry Vyukov <dvyukov@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>, kasan-dev@...glegroups.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com, stable@...r.kernel.org
Subject: Re: [PATCH v2] mm/kfence: add reboot notifier to disable KFENCE on
shutdown
Hello Chris,
On Tue, Jan 13, 2026 at 06:02:27AM -0800, Chris Mason wrote:
> On Thu, 27 Nov 2025 06:51:54 -0800 Breno Leitao <leitao@...ian.org> wrote:
> > @@ -820,6 +821,25 @@ static struct notifier_block kfence_check_canary_notifier = {
> > static struct delayed_work kfence_timer;
> >
> > #ifdef CONFIG_KFENCE_STATIC_KEYS
> > +static int kfence_reboot_callback(struct notifier_block *nb,
> > + unsigned long action, void *data)
> > +{
> > + /*
> > + * Disable kfence to avoid static keys IPI synchronization during
> > + * late shutdown/kexec
> > + */
> > + WRITE_ONCE(kfence_enabled, false);
> > + /* Cancel any pending timer work */
> > + cancel_delayed_work_sync(&kfence_timer);
> ^^^^^^^^^^^^^^^
>
> Can cancel_delayed_work_sync() deadlock here?
>
> If toggle_allocation_gate() is currently executing and blocked inside
> wait_event_idle() (waiting for kfence_allocation_gate > 0), then
> cancel_delayed_work_sync() will block forever waiting for the work to
> complete.
>
> The wait_event_idle() condition depends only on allocations occurring
> to increment kfence_allocation_gate - setting kfence_enabled to false
> does not wake up this wait. During shutdown when allocations may have
> stopped, the work item could remain blocked indefinitely, causing the
> reboot notifier to hang.
>
> The call chain is:
> kfence_reboot_callback()
> -> cancel_delayed_work_sync(&kfence_timer)
> -> __flush_work()
> -> wait_for_completion(&barr.done)
> // waits forever because...
>
> toggle_allocation_gate() [currently running]
> -> wait_event_idle(allocation_wait, kfence_allocation_gate > 0)
> // never wakes up if no allocations happen
This is spot on. I think this is a real case if the following happens:
1) toggle_allocation_gate() has already passed the kfence_enabled check
   and is waiting for kfence_allocation_gate to become > 0.
   a) kfence_allocation_gate is incremented at allocation time.
2) There are no more kernel allocations, so kfence_allocation_gate is
   not incremented.
3) cancel_delayed_work_sync() waits for toggle_allocation_gate() to
   finish, which in turn waits for kfence_allocation_gate > 0. Since
   there are no more allocations, this never happens.
> Would it be safer to use cancel_delayed_work() (non-sync) here?
In this case, the toggle_allocation_gate() work will stay idle, waiting
forever for kfence_allocation_gate > 0 unless something wakes it up. It
will no longer block the reboot notifier chain, though.
Is this a problem?
Maybe a more robust solution would include:
1) s/cancel_delayed_work_sync()/cancel_delayed_work()/
   - This would unblock the notifier.
and/or some of the following:
2) Return from wait_event_idle() once kfence_enabled is cleared.
   - Releases the waiter once KFENCE is disabled.
   - Cons: kfence_allocation_gate will remain negative.
3) Wake up every waiter on the allocation_wait queue.
   - Needed together with 2: wait_event_idle() only re-evaluates its
     condition when the queue is woken, so clearing kfence_enabled
     alone does not wake the waiter.
   - Cons: kfence_allocation_gate will remain negative.
4) Bump kfence_allocation_gate above 1 from the notifier.
   - Avoids KFENCE allocations completely after it is disabled.
   - Cons: it is unclear whether I can set kfence_allocation_gate = 1
     from the notifier.
Thanks for the report,
--breno