Message-ID: <a6150003-7f87-4f24-9156-bec10dfb70ce@samsung.com>
Date: Thu, 18 Dec 2025 11:09:13 +0100
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Peter Zijlstra <peterz@...radead.org>, mingo@...nel.org,
vincent.guittot@...aro.org
Cc: linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, tj@...nel.org, void@...ifault.com,
arighi@...dia.com, changwoo@...lia.com, sched-ext@...ts.linux.dev, Heiko
Stuebner <heiko@...ech.de>, linux-rockchip@...ts.infradead.org
Subject: Re: [PATCH 4/5] sched: Add assertions to QUEUE_CLASS
On 27.11.2025 16:39, Peter Zijlstra wrote:
> Add some checks to the sched_change pattern to validate assumptions
> around changing classes.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
This patch recently landed in linux-next as commit 47efe2ddccb1
("sched/core: Add assertions to QUEUE_CLASS"). In my tests it triggers
the following warning during a simple 'rtcwake' test on the Hardkernel
Odroid-M1 board (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts):
root@...get:~# time rtcwake -s5 -mon
rtcwake: wakeup using /dev/rtc0 at Thu Dec 18 10:01:28 2025
------------[ cut here ]------------
WARNING: kernel/sched/core.c:10837 at sched_change_end+0x160/0x168, CPU#0: irq/38-rk817/79
Modules linked in: snd_soc_hdmi_codec dw_hdmi_i2s_audio dw_hdmi_cec
snd_soc_simple_card snd_soc_rk817 snd_soc_simple_card_utils
snd_soc_rockchip_i2s_tdm snd_soc_core hantro_vpu rockchip_rga v4l2_vp9
v4l2_h264 snd_compress v4l2_jpeg videobuf2_dma_sg videobuf2_dma_contig
v4l2_mem2mem videobuf2_memops snd_pcm_dmaengine videobuf2_v4l2 snd_pcm
gpio_ir_recv dwmac_rk display_connector stmmac_platform rockchip_saradc
rockchipdrm snd_timer videodev snd stmmac industrialio_triggered_buffer
kfifo_buf rockchip_thermal phy_rockchip_naneng_combphy videobuf2_common
spi_rockchip_sfc soundcore rk817_charger rockchip_dfi rtc_rk808
rk805_pwrkey pcs_xpcs panfrost dw_hdmi_qp analogix_dp dw_dp
drm_shmem_helper dw_mipi_dsi drm_dp_aux_bus gpu_sched dw_hdmi mc
drm_display_helper ahci_dwc ipv6 libsha1
CPU: 0 UID: 0 PID: 79 Comm: irq/38-rk817 Not tainted 6.19.0-rc1+ #16288 PREEMPT
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : sched_change_end+0x160/0x168
lr : sched_change_end+0xb0/0x168
...
Call trace:
sched_change_end+0x160/0x168 (P)
rt_mutex_setprio+0xc8/0x3a8
mark_wakeup_next_waiter+0xc0/0x258
rt_mutex_unlock+0x88/0x148
i2c_adapter_unlock_bus+0x14/0x20
i2c_transfer+0xac/0xf0
regmap_i2c_read+0x5c/0xa0
_regmap_raw_read+0xec/0x16c
_regmap_bus_read+0x44/0x7c
_regmap_read+0x64/0xf4
regmap_read+0x4c/0x78
read_irq_data+0x9c/0x460
regmap_irq_thread+0x64/0x2f0
irq_thread_fn+0x2c/0xa8
irq_thread+0x1a4/0x378
kthread+0x13c/0x214
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
real 0m5.547s
user 0m0.004s
sys 0m0.011s
root@...get:~#
I don't see anything suspicious in this stack trace. Let me know how I
can help debug this issue. This board is the only one in my test farm
that triggers this warning.
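
For reference while reading the quoted patch below, here is a minimal
userspace sketch of the two conditions it asserts. All names prefixed
with model_/MODEL_ are hypothetical stand-ins and not kernel
definitions; the class ordering and the flag value are assumptions made
only so the example compiles. Which of the two checks corresponds to
core.c:10837 in linux-next is not established by this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ordering: a larger value models a class that is "above". */
enum model_class { MODEL_IDLE, MODEL_FAIR, MODEL_RT, MODEL_DL };

/* Placeholder bit, not the kernel's ENQUEUE_CLASS value. */
#define MODEL_ENQUEUE_CLASS 0x1

struct model_ctx {
	enum model_class old_class;	/* class recorded when the change began */
	int flags;
};

/* Models the first check: the class changed but ENQUEUE_CLASS was not passed. */
static bool class_changed_without_flag(const struct model_ctx *ctx,
				       enum model_class now)
{
	return now != ctx->old_class && !(ctx->flags & MODEL_ENQUEUE_CLASS);
}

/*
 * Models the second check: with ENQUEUE_CLASS set, the task moved to a
 * lower class and nobody set need_resched.
 */
static bool demoted_without_resched(const struct model_ctx *ctx,
				    enum model_class now, bool need_resched)
{
	return (ctx->flags & MODEL_ENQUEUE_CLASS) &&
	       ctx->old_class > now && !need_resched;
}

/* Exercises both conditions; every assert below holds. */
static void model_self_test(void)
{
	struct model_ctx no_flag = { .old_class = MODEL_FAIR, .flags = 0 };
	struct model_ctx with_flag = { .old_class = MODEL_RT,
				       .flags = MODEL_ENQUEUE_CLASS };

	assert(class_changed_without_flag(&no_flag, MODEL_RT));
	assert(!class_changed_without_flag(&no_flag, MODEL_FAIR));
	assert(demoted_without_resched(&with_flag, MODEL_FAIR, false));
	assert(!demoted_without_resched(&with_flag, MODEL_FAIR, true));
}
```

Calling model_self_test() from a small main() is enough to exercise
both conditions. The model deliberately ignores locking and the
queued/running state carried by the real context.
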
> ---
> kernel/sched/core.c | 13 +++++++++++++
> kernel/sched/sched.h | 1 +
> 2 files changed, 14 insertions(+)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -10806,6 +10806,7 @@ struct sched_change_ctx *sched_change_be
>
>  	*ctx = (struct sched_change_ctx){
>  		.p = p,
> +		.class = p->sched_class,
>  		.flags = flags,
>  		.queued = task_on_rq_queued(p),
>  		.running = task_current_donor(rq, p),
> @@ -10836,6 +10837,11 @@ void sched_change_end(struct sched_chang
>
>  	lockdep_assert_rq_held(rq);
>
> +	/*
> +	 * Changing class without *QUEUE_CLASS is bad.
> +	 */
> +	WARN_ON_ONCE(p->sched_class != ctx->class && !(ctx->flags & ENQUEUE_CLASS));
> +
>  	if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
>  		p->sched_class->switching_to(rq, p);
>
> @@ -10847,6 +10853,13 @@ void sched_change_end(struct sched_chang
>  	if (ctx->flags & ENQUEUE_CLASS) {
>  		if (p->sched_class->switched_to)
>  			p->sched_class->switched_to(rq, p);
> +
> +		/*
> +		 * If this was a degradation in class someone should have set
> +		 * need_resched by now.
> +		 */
> +		WARN_ON_ONCE(sched_class_above(ctx->class, p->sched_class) &&
> +			     !test_tsk_need_resched(p));
>  	} else {
>  		p->sched_class->prio_changed(rq, p, ctx->prio);
>  	}
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -4027,6 +4027,7 @@ extern void balance_callbacks(struct rq
> struct sched_change_ctx {
>  	u64 prio;
>  	struct task_struct *p;
> +	const struct sched_class *class;
>  	int flags;
>  	bool queued;
>  	bool running;
>
>
>
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland