Open Source and information security mailing list archives
 
Message-ID: <20230516141131.fScCnP3q@linutronix.de>
Date:   Tue, 16 May 2023 16:11:31 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Valentin Schneider <vschneid@...hat.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rt-users@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>,
        Juri Lelli <juri.lelli@...hat.com>
Subject: Re: [ANNOUNCE] v6.3.1-rt13

On 2023-05-10 12:37:42 [+0100], Valentin Schneider wrote:
> The ktimersd threads solved some priority inversion problem we were seeing,
> IIRC it looked something like so:
> - GP kthread is waiting on swait_event_idle_timeout_exclusive(...)
> - p0 (CFS NICE0) did spin_lock(L) then got throttled by CFS bandwidth
> - p1 (CFS NICE0) did local_bh_disable() + did spin_lock(L)
> 
> So p0 owns L, but cannot get bandwidth replenished since local softirqs are
> disabled, and the GP kthread can't be woken up by timeout to initiate
> boosting either.
> 
> Even if ksoftirqd has its priority tuned to ensure timers can be expired,
> the above never wakes ksoftirqd due to:
> 
> static inline bool should_wake_ksoftirqd(void)
> {
>         return !this_cpu_read(softirq_ctrl.cnt);
> }
> 
> on the other hand, ktimersd are woken up unconditionally, so in this
> scenario it gets to run and donate its priority via
> 
>   ksoftirqd_run_begin()
>   `\
>     local_lock(&softirq_ctrl.lock)
> 
> (note that this only solves the CFS bandwidth issue if ktimersd are FIFO or
> above, but they are spawned as FIFO1)
> 
> 
> TL;DR: for RT, I think we should also kill should_wake_ksoftirqd()

If I remember correctly, this check was there to avoid waking ksoftirqd
when softirqs are already being handled. In this case the system stalls
until p0/p1 make some progress. Waking ksoftirqd makes sense if its
scheduling policy is elevated.
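
To make the distinction concrete, here is a minimal userspace sketch of
the two wake decisions. It is a model, not kernel code: the struct and
the *_model() names are hypothetical, the per-CPU counter is reduced to
a plain int, and the "elevated policy" variant is only one possible
reading of the idea above, not an actual patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU softirq_ctrl; cnt is nonzero
 * while a task on this CPU has bottom halves disabled. */
struct softirq_ctrl_model {
	int cnt;
};

/* Models the quoted check: skip the wakeup when some task is inside a
 * BH-disabled section, on the assumption that it handles the softirqs. */
static bool should_wake_ksoftirqd_model(const struct softirq_ctrl_model *ctrl)
{
	return !ctrl->cnt;
}

/* Assumed variant: if ksoftirqd runs with an elevated (RT) policy, wake
 * it even while BH is disabled, so that it can block on the softirq
 * lock and lend its priority to the current owner. */
static bool should_wake_elevated_model(const struct softirq_ctrl_model *ctrl,
				       bool ksoftirqd_is_rt)
{
	return ksoftirqd_is_rt || !ctrl->cnt;
}
```

With cnt == 1 (p1 called local_bh_disable()), the first helper returns
false, so nothing is woken, while the second returns true only when
ksoftirqd_is_rt is set.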

Now we need an overloading strategy, since the current idea is to solve
it by moving everything to ksoftirqd and letting it run at SCHED_OTHER.

Sebastian
