linux-kernel - Re: [PATCH v2 11/15] sched/deadline: Mark DL server as unthrottled before enqueue


Message-ID: <51540780-580f-47ff-ae0a-7d335c0702f7@redhat.com>
Date: Fri, 5 Apr 2024 10:54:00 +0200
From: Daniel Bristot de Oliveira <bristot@...hat.com>
To: "Joel Fernandes (Google)" <joel@...lfernandes.org>,
 linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>
Cc: Suleiman Souhlal <suleiman@...gle.com>,
 Youssef Esmat <youssefesmat@...gle.com>, David Vernet <void@...ifault.com>,
 Thomas Gleixner <tglx@...utronix.de>, "Paul E . McKenney"
 <paulmck@...nel.org>, joseph.salisbury@...onical.com,
 Luca Abeni <luca.abeni@...tannapisa.it>,
 Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
 Vineeth Pillai <vineeth@...byteword.org>,
 Shuah Khan <skhan@...uxfoundation.org>, Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v2 11/15] sched/deadline: Mark DL server as unthrottled
 before enqueue

On 3/13/24 02:24, Joel Fernandes (Google) wrote:
> The DL server may not have had its timer started if start_dl_timer()
> returns 0 (say the zero-laxity time has already passed). In such cases,
> mark the DL task which is about to be enqueued as not throttled and
> cancel any previous timers, then do the enqueue.
> 
> This fixes the following crash:
> 
> [    9.263331] kernel BUG at kernel/sched/deadline.c:1765!
> [    9.282382] Call Trace:
> [    9.282767]  <TASK>
> [    9.283086]  ? __die_body+0x62/0xb0
> [    9.283602]  ? die+0x9b/0xc0
> [    9.284036]  ? do_trap+0xa3/0x170
> [    9.284528]  ? enqueue_dl_entity+0x45e/0x460
> [    9.285158]  ? enqueue_dl_entity+0x45e/0x460
> [    9.285791]  ? handle_invalid_op+0x65/0x80
> [    9.286392]  ? enqueue_dl_entity+0x45e/0x460
> [    9.287021]  ? exc_invalid_op+0x2f/0x40
> [    9.287585]  ? asm_exc_invalid_op+0x16/0x20
> [    9.288200]  ? find_later_rq+0x120/0x120
> [    9.288775]  ? fair_server_init+0x40/0x40
> [    9.289364]  ? enqueue_dl_entity+0x45e/0x460
> [    9.289989]  ? find_later_rq+0x120/0x120
> [    9.290564]  dl_task_timer+0x1d7/0x2f0
> [    9.291120]  ? find_later_rq+0x120/0x120
> [    9.291695]  __run_hrtimer+0x73/0x1b0
> [    9.292238]  hrtimer_interrupt+0x216/0x2c0
> [    9.292841]  __sysvec_apic_timer_interrupt+0x53/0x140
> [    9.293581]  sysvec_apic_timer_interrupt+0x2d/0x80
> [    9.294285]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> 
> The crash can easily be reproduced by adding a 100ms delay as follows:
> 
> +int delay_inject_count;
> +
>  static void
>  enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
>  {
> @@ -1827,6 +1830,12 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
>                 setup_new_dl_entity(dl_se);
>         }
> 
> +       // 100ms delay every 20 enqueues.
> +       if (delay_inject_count++ > 20) {
> +               mdelay(100);
> +               delay_inject_count = 0;
> +       }
> +
>         /*
>          * If we are still throttled, eg. we got replenished but are a
>          * zero-laxity task and still got to wait, don't enqueue.

Makes sense. I am adding this to the defer patch v6, as it is a fix for that patch.
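
For readers following the thread, the ordering described in the commit message (try to arm the timer; if that fails, clear the throttled state, cancel any earlier timer, then enqueue) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the actual patch: struct toy_dl_entity, toy_start_timer() and toy_enqueue() are hypothetical names, and none of the kernel's locking, hrtimer, or bandwidth logic is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a deadline entity; NOT the kernel's struct sched_dl_entity. */
struct toy_dl_entity {
	bool throttled;      /* must wait for a timer before it may run */
	bool timer_armed;    /* a timer is pending for this entity */
	bool queued;         /* currently on the (imaginary) run queue */
	uint64_t wake_time;  /* absolute time at which the timer should fire */
};

/*
 * Hypothetical analogue of a "start timer" helper: returns 1 if the timer
 * was armed, 0 if the target time has already passed.
 */
static int toy_start_timer(struct toy_dl_entity *se, uint64_t now)
{
	if (se->wake_time <= now)
		return 0;
	se->timer_armed = true;
	return 1;
}

/* Returns true if the entity was put on the queue by this call. */
static bool toy_enqueue(struct toy_dl_entity *se, uint64_t now)
{
	assert(!se->queued);

	if (se->throttled) {
		/* Timer armed: the timer callback will enqueue later. */
		if (toy_start_timer(se, now))
			return false;

		/*
		 * Timer could not be armed, so nothing would unthrottle the
		 * entity later. Cancel any earlier timer and clear the
		 * throttled state before enqueueing.
		 */
		se->timer_armed = false;
		se->throttled = false;
	}

	se->queued = true;
	return true;
}
```

In this model, a throttled entity whose wake time is still in the future stays throttled with its timer armed, while one whose wake time has already passed is enqueued immediately with both flags cleared.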

-- Daniel