linux-kernel - Re: [RT BUG] Stall caused by eventpoll, rwlocks and CFS bandwidth controller

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2637dc3c-80b5-40d7-b0e1-22ccdeba848d@amd.com>
Date: Mon, 14 Apr 2025 20:48:15 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
CC: Jan Kiszka <jan.kiszka@...mens.com>, Aaron Lu <ziqianlu@...edance.com>,
	Valentin Schneider <vschneid@...hat.com>, <linux-rt-users@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>, Juri
 Lelli <juri.lelli@...hat.com>, Clark Williams <williams@...hat.com>, "Luis
 Claudio R. Goncalves" <lgoncalv@...hat.com>, Andreas Ziegler
	<ziegler.andreas@...mens.com>, Felix Moessbauer
	<felix.moessbauer@...mens.com>, Florian Bezdeka <florian.bezdeka@...mens.com>
Subject: Re: [RT BUG] Stall caused by eventpoll, rwlocks and CFS bandwidth
 controller

Hello Sebastian,

On 4/14/2025 8:35 PM, Sebastian Andrzej Siewior wrote:
> On 2025-04-14 20:20:04 [+0530], K Prateek Nayak wrote:
>> Note: I could not reproduce the splat with !PREEMPT_RT kernel
>> (CONFIG_PREEMPT=y) or with small loops counts that don't exhaust the
>> cfs bandwidth.
> 
> Not sure what this has to do with anything.

Let me clarify a bit more:

- Fair task with cfs_bandwidth limits triggers the prctl(666, 50000000)

- The prctl() takes a read_lock_irq() excpet on PREEMPT_RT this does not
   disable interrupt.

- I take a dummy lock to stall the preemption

- Within the read_lock critical section, I queue a timer that takes the
   read_lock.

- I also wakeup up a high priority RT task that that takes the
   write_lock

As soon as I drop the dummy raw_spin_lock:

- High priority RT task runs, tries to take the write_lock but cannot
   since the preempted fair task has the read end still.

- Next ktimerd runs trying to grab the read_lock() but is put in the
   slowpath since ktimerd has tried to take the write_lock

- The fair task runs out of bandwidth and is preempted but this requires
   the ktimerd to run the replenish function which is queued behind the
   already preempted timer function trying to grab the read_lock()

Isn't this the scenario that Valentin's original summary describes?
If I've got something wrong please do correct me.

> On !RT the read_lock() in the timer can be acquired even with a pending
> writer. The writer keeps spinning until the main thread is gone. There
> should be no RCU boosting but the RCU still is there, too.

On !RT, the read_lock_irq() in fair task will not be preempted in the
first place so progress is guaranteed that way right?

> 
> On RT the read_lock() in the timer block, the write blocks, too. So
> every blocker on the lock is scheduled out until the reader is gone. On
> top of that, the reader gets RCU boosted with FIFO-1 by default to get
> out.

Except there is a circular dependency now:

- fair task needs bandwidth replenishment to progress and drop lock.
- rt task needs fair task to drop the lock and grab the write end.
- ktimerd requires rt task to grab and drop the lock to make progress.

I'm fairly new to the PREEMPT_RT bits so if I've missed something,
please do let me know and sorry for any noise.

> 
> Sebastian

-- 
Thanks and Regards,
Prateek