Message-ID: <20260126033012.934143-1-realwujing@gmail.com>
Date: Sun, 25 Jan 2026 22:30:12 -0500
From: Qiliang Yuan <realwujing@...il.com>
To: dianders@...omium.org
Cc: akpm@...ux-foundation.org,
	lihuafei1@...wei.com,
	linux-kernel@...r.kernel.org,
	mingo@...nel.org,
	mm-commits@...r.kernel.org,
	realwujing@...il.com,
	song@...nel.org,
	stable@...r.kernel.org,
	sunshx@...natelecom.cn,
	thorsten.blum@...ux.dev,
	wangjinchao600@...il.com,
	yangyicong@...ilicon.com,
	yuanql9@...natelecom.cn,
	zhangjn11@...natelecom.cn
Subject: Re: [PATCH v3] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race

Hi Doug,

Thanks for your further questions and for digging into the 4.19 vs ToT
differences.

On Sat, 24 Jan 2026 15:36:01 Doug Anderson <dianders@...omium.org> wrote:
> The part that doesn't make a lot of sense to me, though, is that v4.19
> also doesn't have commit 930d8f8dbab9 ("watchdog/perf: adapt the
> watchdog_perf interface for async model"), which is where we are
> saying the problem was introduced.
> 
> ...so in v4.19 I think:
> * hardlockup_detector_perf_init() is only called from watchdog_nmi_probe()
> * watchdog_nmi_probe() is only called from lockup_detector_init()
> * lockup_detector_init() is only called from kernel_init_freeable()
> right before smp_init()
> 
> Thus I'm super confused about how you could have seen the problem on
> v4.19. Maybe your v4.19 kernel has some backported patches that makes
> this possible?

You caught it! Our 4.19 kernel does carry local patches. Here is how the
two trees differ:

1. Mainline (ToT):
   - `lockup_detector_init()` is always called before `smp_init()`
     (pre-SMP phase).
   - Risk source: The asynchronous retry path (`lockup_detector_delay_init`)
     introduced by 930d8f8dbab9, which runs in a workqueue (post-SMP)
     context and triggers the UAF.

2. openEuler (4.19/5.10):
   - Local `euler inclusion` patches moved `lockup_detector_init()` after
     `do_basic_setup()` (post-SMP phase).
   - Risk source: The initial probe occurs directly in a post-SMP
     environment, exposing the race condition.

For the openEuler (4.19/5.10) kernels, the call stack looks like this:
  kernel_init()
  -> kernel_init_freeable()
    -> lockup_detector_init()       <-- Called after smp_init()
      -> watchdog_nmi_probe()
        -> hardlockup_detector_perf_init()
          -> hardlockup_detector_event_create()

In mainline (ToT), the initial probe (safe) call stack is:
  kernel_init()
  -> kernel_init_freeable()
    -> lockup_detector_init()       <-- Called before smp_init()
      -> watchdog_hardlockup_probe()
        -> hardlockup_detector_event_create()

However, the asynchronous retry mechanism (commit 930d8f8dbab9) executes the
probe logic in a post-SMP, preemptible context.

For the mainline (ToT) retry path (at risk), the call stack is:
  kworker thread
  -> process_one_work()
    -> lockup_detector_delay_init()
      -> watchdog_hardlockup_probe()
        -> hardlockup_detector_event_create()

Thus, `930d8f8dbab9` remains the correct "Fixes" target for ToT.

> OK, fair enough. ...but I'm a bit curious why nobody else saw this
> WARN_ON(). I'm also curious if you have tested the hardlockup detector
> on newer kernels, or if all of your work has been done on 4.19. If all
> your work has been done on 4.19, do we need to find someone to test
> your patch on a newer kernel and make sure it works OK? If you've
> tested on a newer kernel, did the hardlockup detector init from the
> kernel's early-init code, or the retry code?

In newer kernels, when the initial probe fails and falls back to the retry
workqueue (or even during early init if preemption is enabled), the
`WARN_ON(!is_percpu_thread())` in `hardlockup_detector_event_create()` does
trigger, because `watchdog_hardlockup_probe()` is called from a task that is
not bound to a single CPU.
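
To make the hazard concrete, here is a minimal user-space sketch of one
possible interleaving. It is a model, not kernel code: `model_event`,
`current_cpu`, the `migrate_to` parameter and all `model_*` helpers are
hypothetical names, and the assumption that the migration lands between the
release and the clearing of the per-CPU pointer is mine for illustration.

```c
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical stand-in for a perf event; "released" replaces a real free. */
struct model_event {
	int cpu;
	int released;
};

static struct model_event events[NR_CPUS];
static struct model_event *watchdog_ev[NR_CPUS];	/* models the per-CPU pointer */
static int current_cpu;				/* CPU the probing task runs on */

/* Create an event for the current CPU and publish it in that CPU's slot. */
static void model_event_create(void)
{
	struct model_event *evt = &events[current_cpu];

	evt->cpu = current_cpu;
	evt->released = 0;
	watchdog_ev[current_cpu] = evt;
}

/*
 * Probe cleanup: release "this CPU's" event, then clear "this CPU's" slot.
 * migrate_to >= 0 models the scheduler moving the unbound task between
 * the two steps; -1 means the task stays where it is.
 */
static void model_probe_cleanup(int migrate_to)
{
	watchdog_ev[current_cpu]->released = 1;	/* release on the old CPU */
	if (migrate_to >= 0)
		current_cpu = migrate_to;	/* task is migrated */
	watchdog_ev[current_cpu] = NULL;	/* clear the slot of the CPU we are on now */
}

/* Returns 1 if any slot still points at a released event (dangling). */
static int model_has_dangling_pointer(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (watchdog_ev[cpu] && watchdog_ev[cpu]->released)
			return 1;
	return 0;
}
```

With `current_cpu = 0`, calling `model_event_create()` and then
`model_probe_cleanup(-1)` leaves no dangling pointer, while
`model_probe_cleanup(1)` leaves CPU 0's slot pointing at a released event.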

I have verified this patch on the openEuler 4.19 kernel. In our stress
testing, where we start dozens of VMs simultaneously to create heavy resource
contention, the UAF reproduced consistently without this fix and no longer
occurs with it applied.

The v4 patch addresses this by refactoring the creation logic to be stateless
and adding `cpu_hotplug_disable()` to ensure the probed CPU stays alive.
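
As a rough illustration of the "stateless" idea only (a user-space model, not
the v4 code), the sketch below has the probe keep the event in a local
variable and never touch the per-CPU slots. All names here are hypothetical,
and the `hotplug_disabled` counter merely stands in for the
`cpu_hotplug_disable()` / `cpu_hotplug_enable()` pair.

```c
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical stand-in for a perf event; "released" replaces a real free. */
struct model_event {
	int cpu;
	int released;
};

static struct model_event events[NR_CPUS];
static struct model_event *watchdog_ev[NR_CPUS];	/* models the per-CPU pointer */
static int current_cpu;				/* CPU the probing task runs on */
static int hotplug_disabled;			/* models hotplug-disable nesting */

/* Stateless creation: takes the CPU explicitly and only returns the event. */
static struct model_event *model_event_create(int cpu)
{
	struct model_event *evt = &events[cpu];

	evt->cpu = cpu;
	evt->released = 0;
	return evt;
}

/*
 * Probe: create an event for the CPU sampled at entry, then release that
 * same event through the local pointer. migrate_to >= 0 models a migration
 * in the middle of the probe; it cannot redirect the release or the slots.
 */
static int model_probe(int migrate_to)
{
	struct model_event *evt;
	int cpu = current_cpu;

	hotplug_disabled++;		/* keep the probed CPU online */
	evt = model_event_create(cpu);
	if (migrate_to >= 0)
		current_cpu = migrate_to;
	evt->released = 1;		/* release exactly what we created */
	hotplug_disabled--;
	return 0;
}
```

Because nothing is published to `watchdog_ev[]` during the probe, a migration
in the middle cannot leave a stale per-CPU pointer behind in this model.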

I'll wait for your further thoughts on v4:
https://lore.kernel.org/all/20260124070814.806828-1-realwujing@gmail.com/

Best regards,
Qiliang
