Message-ID: <51540780-580f-47ff-ae0a-7d335c0702f7@redhat.com>
Date: Fri, 5 Apr 2024 10:54:00 +0200
From: Daniel Bristot de Oliveira <bristot@...hat.com>
To: "Joel Fernandes (Google)" <joel@...lfernandes.org>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>
Cc: Suleiman Souhlal <suleiman@...gle.com>,
Youssef Esmat <youssefesmat@...gle.com>, David Vernet <void@...ifault.com>,
Thomas Gleixner <tglx@...utronix.de>, "Paul E . McKenney"
<paulmck@...nel.org>, joseph.salisbury@...onical.com,
Luca Abeni <luca.abeni@...tannapisa.it>,
Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
Vineeth Pillai <vineeth@...byteword.org>,
Shuah Khan <skhan@...uxfoundation.org>, Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v2 11/15] sched/deadline: Mark DL server as unthrottled
before enqueue
On 3/13/24 02:24, Joel Fernandes (Google) wrote:
> The DL server may not have had its timer started if start_dl_timer()
> returns 0 (say the zero-laxity time has already passed). In such cases,
> mark the DL task which is about to be enqueued as not throttled and
> cancel any previous timers, then do the enqueue.
>
> This fixes the following crash:
>
> [ 9.263331] kernel BUG at kernel/sched/deadline.c:1765!
> [ 9.282382] Call Trace:
> [ 9.282767] <TASK>
> [ 9.283086] ? __die_body+0x62/0xb0
> [ 9.283602] ? die+0x9b/0xc0
> [ 9.284036] ? do_trap+0xa3/0x170
> [ 9.284528] ? enqueue_dl_entity+0x45e/0x460
> [ 9.285158] ? enqueue_dl_entity+0x45e/0x460
> [ 9.285791] ? handle_invalid_op+0x65/0x80
> [ 9.286392] ? enqueue_dl_entity+0x45e/0x460
> [ 9.287021] ? exc_invalid_op+0x2f/0x40
> [ 9.287585] ? asm_exc_invalid_op+0x16/0x20
> [ 9.288200] ? find_later_rq+0x120/0x120
> [ 9.288775] ? fair_server_init+0x40/0x40
> [ 9.289364] ? enqueue_dl_entity+0x45e/0x460
> [ 9.289989] ? find_later_rq+0x120/0x120
> [ 9.290564] dl_task_timer+0x1d7/0x2f0
> [ 9.291120] ? find_later_rq+0x120/0x120
> [ 9.291695] __run_hrtimer+0x73/0x1b0
> [ 9.292238] hrtimer_interrupt+0x216/0x2c0
> [ 9.292841] __sysvec_apic_timer_interrupt+0x53/0x140
> [ 9.293581] sysvec_apic_timer_interrupt+0x2d/0x80
> [ 9.294285] asm_sysvec_apic_timer_interrupt+0x16/0x20
>
> The crash can easily be reproduced by adding a 100ms delay as follows:
>
> +int delay_inject_count;
> +
>  static void
>  enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
>  {
> @@ -1827,6 +1830,12 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
>  		setup_new_dl_entity(dl_se);
>  	}
>
> +	// 100ms delay every 20 enqueues.
> +	if (delay_inject_count++ > 20) {
> +		mdelay(100);
> +		delay_inject_count = 0;
> +	}
> +
>  	/*
>  	 * If we are still throttled, eg. we got replenished but are a
>  	 * zero-laxity task and still got to wait, don't enqueue.
Makes sense. I am adding this to v6 of the defer patch, since it is a fix for it.
-- Daniel
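
For readers following the thread, the behaviour described in the quoted
changelog can be sketched as a small user-space model. This is not the
kernel code and not the patch itself: struct toy_dl_entity,
toy_start_timer() and toy_enqueue() are hypothetical stand-ins, and the
only thing taken from the changelog is the rule "if the timer could not
be started, clear the throttled state, cancel any previous timer, then
enqueue".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler's deadline entity. */
struct toy_dl_entity {
	bool throttled;      /* models the "throttled" state */
	bool timer_armed;    /* models a pending replenishment timer */
	bool queued;         /* models being on the runqueue */
	long zero_laxity_ns; /* absolute time the timer should fire at */
};

/*
 * Toy model of a timer start that can fail: returns 1 if the timer
 * was armed, 0 if the target time has already passed.
 */
static int toy_start_timer(struct toy_dl_entity *se, long now_ns)
{
	if (se->zero_laxity_ns <= now_ns)
		return 0;
	se->timer_armed = true;
	return 1;
}

/* Toy enqueue path following the rule described in the changelog. */
static void toy_enqueue(struct toy_dl_entity *se, long now_ns)
{
	if (se->throttled) {
		/* Timer armed: the timer callback will enqueue later. */
		if (toy_start_timer(se, now_ns))
			return;

		/*
		 * Timer could not be started (too late): mark the entity
		 * as not throttled and drop any stale timer before
		 * falling through to the enqueue.
		 */
		se->throttled = false;
		se->timer_armed = false;
	}

	/* Toy invariant: a throttled entity is never put on the queue. */
	assert(!se->throttled);
	se->queued = true;
}
```

Calling toy_enqueue() with now_ns past zero_laxity_ns exercises the
late path (the entity ends up queued and unthrottled); calling it with
now_ns before zero_laxity_ns leaves the entity throttled with the timer
armed.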