Message-ID: <CANDhNCrztM1eK-6dab_-4hnX4miJH_pe49r=GVVqtD+Z235kgw@mail.gmail.com>
Date: Mon, 22 Sep 2025 16:46:52 -0700
From: John Stultz <jstultz@...gle.com>
To: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org, 
	"Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org, 
	Linux Samsung SOC <linux-samsung-soc@...r.kernel.org>
Subject: Re: [tip: sched/urgent] sched/deadline: Fix dl_server getting stuck

On Mon, Sep 22, 2025 at 2:57 PM Marek Szyprowski
<m.szyprowski@...sung.com> wrote:
> This patch landed in today's linux-next as commit 077e1e2e0015
> ("sched/deadline: Fix dl_server getting stuck"). In my tests I found
> that it breaks CPU hotplug on some of my systems. On a 64-bit
> Exynos5433-based TM2e board I captured the following lockdep warning
> (which unfortunately doesn't really look related to CPU hotplug):
>

Huh. Nor does it really look related to the dl_server change. Interesting...


> # for i in /sys/devices/system/cpu/cpu[1-9]; do echo 0 >$i/online; done
> Detected VIPT I-cache on CPU7
> CPU7: Booted secondary processor 0x0000000101 [0x410fd031]
> ------------[ cut here ]------------
> WARNING: CPU: 7 PID: 0 at kernel/rcu/tree.c:4329
> rcutree_report_cpu_starting+0x1e8/0x348
> Modules linked in: brcmfmac_wcc cpufreq_powersave cpufreq_conservative
> brcmfmac brcmutil sha256 snd_soc_wm5110 cfg80211 snd_soc_wm_adsp cs_dsp
> snd_soc_tm2_wm5110 snd_soc_arizona arizona_micsupp phy_exynos5_usbdrd
> s5p_mfc typec arizona_ldo1 hci_uart btqca s5p_jpeg max77693_haptic btbcm
> s3fwrn5_i2c exynos_gsc bluetooth s3fwrn5 nci v4l2_mem2mem nfc
> snd_soc_i2s snd_soc_idma snd_soc_hdmi_codec snd_soc_max98504
> snd_soc_s3c_dma videobuf2_dma_contig videobuf2_memops ecdh_generic
> snd_soc_core ir_spi videobuf2_v4l2 ecc snd_compress ntc_thermistor
> panfrost videodev snd_pcm_dmaengine snd_pcm rfkill drm_shmem_helper
> panel_samsung_s6e3ha2 videobuf2_common backlight pwrseq_core gpu_sched
> mc snd_timer snd soundcore ipv6
> CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Not tainted 6.17.0-rc6+ #16012 PREEMPT
> Hardware name: Samsung TM2E board (DT)
> Detected VIPT I-cache on CPU7
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.17.0-rc6+ #16012 Not tainted
> ------------------------------------------------------
> swapper/7/0 is trying to acquire lock:
> ffff000024021cc8 (&irq_desc_lock_class){-.-.}-{2:2}, at:
> __irq_get_desc_lock+0x5c/0x9c
>
> but task is already holding lock:
> ffff800083e479c0 (&port_lock_key){-.-.}-{3:3}, at:
> s3c24xx_serial_console_write+0x80/0x268
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&port_lock_key){-.-.}-{3:3}:
>         _raw_spin_lock_irqsave+0x60/0x88
>         s3c24xx_serial_console_write+0x80/0x268
>         console_flush_all+0x304/0x49c
>         console_unlock+0x70/0x110
>         vprintk_emit+0x254/0x39c
>         vprintk_default+0x38/0x44
>         vprintk+0x28/0x34
>         _printk+0x5c/0x84
>         register_console+0x3ac/0x4f8
>         serial_core_register_port+0x6c4/0x7a4
>         serial_ctrl_register_port+0x10/0x1c
>         uart_add_one_port+0x10/0x1c
>         s3c24xx_serial_probe+0x34c/0x6d8
>         platform_probe+0x5c/0xac
>         really_probe+0xbc/0x298
>         __driver_probe_device+0x78/0x12c
>         driver_probe_device+0xdc/0x164
>         __device_attach_driver+0xb8/0x138
>         bus_for_each_drv+0x80/0xdc
>         __device_attach+0xa8/0x1b0
>         device_initial_probe+0x14/0x20
>         bus_probe_device+0xb0/0xb4
>         deferred_probe_work_func+0x8c/0xc8
>         process_one_work+0x208/0x60c
>         worker_thread+0x244/0x388
>         kthread+0x150/0x228
>         ret_from_fork+0x10/0x20
>
> -> #1 (console_owner){..-.}-{0:0}:
>         console_lock_spinning_enable+0x6c/0x7c
>         console_flush_all+0x2c8/0x49c
>         console_unlock+0x70/0x110
>         vprintk_emit+0x254/0x39c
>         vprintk_default+0x38/0x44
>         vprintk+0x28/0x34
>         _printk+0x5c/0x84
>         exynos_wkup_irq_set_wake+0x80/0xa4
>         irq_set_irq_wake+0x164/0x1e0
>         arizona_irq_set_wake+0x18/0x24
>         irq_set_irq_wake+0x164/0x1e0
>         regmap_irq_sync_unlock+0x328/0x530
>         __irq_put_desc_unlock+0x48/0x4c
>         irq_set_irq_wake+0x84/0x1e0
>         arizona_set_irq_wake+0x5c/0x70
>         wm5110_probe+0x220/0x354 [snd_soc_wm5110]
>         platform_probe+0x5c/0xac
>         really_probe+0xbc/0x298
>         __driver_probe_device+0x78/0x12c
>         driver_probe_device+0xdc/0x164
>         __driver_attach+0x9c/0x1ac
>         bus_for_each_dev+0x74/0xd0
>         driver_attach+0x24/0x30
>         bus_add_driver+0xe4/0x208
>         driver_register+0x60/0x128
>         __platform_driver_register+0x24/0x30
>         cs_exit+0xc/0x20 [cpufreq_conservative]
>         do_one_initcall+0x64/0x308
>         do_init_module+0x58/0x23c
>         load_module+0x1b48/0x1dc4
>         init_module_from_file+0x84/0xc4
>         idempotent_init_module+0x188/0x280
>         __arm64_sys_finit_module+0x68/0xac
>         invoke_syscall+0x48/0x110
>         el0_svc_.common.c
>
> (system is frozen at this point).

So I've seen issues like this when testing scheduler changes,
particularly when I've added debug printks or WARN_ONs that trip while
we're deep in the scheduler core and hold various locks. I reported
something similar here:
https://lore.kernel.org/lkml/CANDhNCo8NRm4meR7vHqvP8vVZ-_GXVPuUKSO1wUQkKdfjvy20w@mail.gmail.com/

Usually I just see the lockdep warning, though; the hang is much rarer.

But I don't see right off how the dl_server change would affect this,
other than just changing the timing of execution such that you manage
to trip over the existing issue.

So far I don't see anything similar when testing hotplug on x86 qemu. Do
you get any other console messages or warnings before this one?

Looking at the backtrace, I wonder if changing the pr_info() in
exynos_wkup_irq_set_wake() to printk_deferred() might avoid this?
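
To illustrate why deferring the message could help, here is a rough
userspace model in C, not kernel code. Every name in it (desc_lock_held,
console_write(), log_deferred(), flush_deferred(), set_wake_direct(),
set_wake_deferred()) is a hypothetical stand-in, and the locks are plain
flags in a single thread. It only shows the ordering difference: printing
directly takes the console path while the descriptor lock is held, whereas
the deferred variant queues the text and writes it after the lock is
dropped. The real printk_deferred() pushes the console flush to a later
context rather than using a caller-side buffer like this.

```c
#include <assert.h>
#include <stdio.h>

enum { DEFER_SLOTS = 8, DEFER_LEN = 128 };

/* Hypothetical model state: flags stand in for real locks. */
static int desc_lock_held;           /* models the irq descriptor lock */
static int nested_console_acquires;  /* console path entered under desc lock */

static char deferred[DEFER_SLOTS][DEFER_LEN];
static int deferred_count;

/* Models a console write, which needs the console/port lock. */
static void console_write(const char *msg)
{
	if (desc_lock_held)
		nested_console_acquires++;  /* the ordering lockdep flags */
	fputs(msg, stdout);
}

/* Queue a message without touching the console; caller holds desc lock. */
static void log_deferred(const char *msg)
{
	assert(desc_lock_held);
	if (deferred_count < DEFER_SLOTS)
		snprintf(deferred[deferred_count++], DEFER_LEN, "%s", msg);
}

/* Write out queued messages; must run with the desc lock released. */
static int flush_deferred(void)
{
	int n = deferred_count;

	assert(!desc_lock_held);
	for (int i = 0; i < n; i++)
		console_write(deferred[i]);
	deferred_count = 0;
	return n;
}

/* Prints while the desc lock is held (the pattern in the backtrace). */
static void set_wake_direct(int on)
{
	desc_lock_held = 1;
	console_write(on ? "wake enabled\n" : "wake disabled\n");
	desc_lock_held = 0;
}

/* Queues under the lock and prints only after releasing it. */
static void set_wake_deferred(int on)
{
	desc_lock_held = 1;
	log_deferred(on ? "wake enabled\n" : "wake disabled\n");
	desc_lock_held = 0;
	flush_deferred();
}
```

In this model, set_wake_direct() bumps nested_console_acquires on every
call, while set_wake_deferred() never does, which is the property the
printk_deferred() suggestion is after.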

thanks
-john