linux-kernel - Re: [RFC PATCH v2 0/7] Defer throttle when task exits to user

Message-ID: <f320d90db90df2d9583a1af4d83880f052768a64.camel@siemens.com>
Date: Tue, 22 Apr 2025 16:54:28 +0200
From: Florian Bezdeka <florian.bezdeka@...mens.com>
To: K Prateek Nayak <kprateek.nayak@....com>, Aaron Lu
 <ziqianlu@...edance.com>
Cc: Jan Kiszka <jan.kiszka@...mens.com>, Valentin Schneider	
 <vschneid@...hat.com>, Ben Segall <bsegall@...gle.com>, Peter Zijlstra	
 <peterz@...radead.org>, Josh Don <joshdon@...gle.com>, Ingo Molnar	
 <mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, Xi Wang	
 <xii@...gle.com>, linux-kernel@...r.kernel.org, Juri Lelli
 <juri.lelli@...hat.com>,  Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, 
 Chengming Zhou <chengming.zhou@...ux.dev>, Chuyi Zhou
 <zhouchuyi@...edance.com>, "Sebastian Andrzej Siewior,"	
 <bigeasy@...utronix.de>
Subject: Re: [RFC PATCH v2 0/7] Defer throttle when task exits to user

On Tue, 2025-04-22 at 08:24 +0530, K Prateek Nayak wrote:
> Hello Aaron,
> 
> On 4/22/2025 7:40 AM, Aaron Lu wrote:
> > > anon_pipe_write()
> > >    __wake_up_common()
> > >      ep_poll_callback() {
> > >        read_lock_irq(&ep->lock)		/* Read lock acquired here */
> > I was confused by this function's name. I had thought irq is off but
> > then realized under PREEMPT_RT, read_lock_irq() doesn't disable irq...
> 
> Yup! Most of the interrupt handlers are run by the IRQ threads on
> PREEMPT_RT and the ones that do run in the interrupt context have all
> been adapted to use non-blocking locks whose *_irq variants disables
> interrupts on PREEMPT_RT too.
> 
> > 
> > >        __wake_up_common()
> > >          ep_autoremove_wake_function()
> > >            try_to_wake_up()		/* Wakes up "epoll-stall" */
> > >              preempt_schedule()
> > >              ...
> > > 
> > > # "epoll-stall-writer" has run out of bandwidth, needs replenish to run
> > Luckily in this "only throttle when ret2user" model, epoll-stall-writer
> > does not need replenish to run again(and then unblock the others).
> 
> I can confirm that throttle deferral solves this issue. I have run Jan's
> reproducer for a long time without seeing any hangs on your series. I
> hope Florian can confirm the same.
> 

Partially, yes.

First, let me clarify what I am testing: I'm testing with PREEMPT_RT
enabled, as that is the setup that causes problems in the field. For
those setups this is not a performance/jitter optimization; it is a
critical fix, because the system locks up completely.

I ported the series to 6.14. The reasons were stability and the option
of replacing one of the devices in the field with a patched version.
We do not trust anything newer yet.

The test results: 6.14 + backport has been running fine for ~10 days
now on a system where the reproducer (that Jan posted already) crashed
an unpatched 6.14 within a couple of minutes. Success.
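
For readers who do not have the reproducer at hand, here is a minimal
userspace sketch of the pattern behind the call chain quoted above: a
write to an anonymous pipe wakes a thread blocked in epoll_wait(). This
is not Jan's reproducer. The function name and the 5 second timeout are
made up for illustration, no CFS bandwidth limit is configured, and it
is not expected to hang anything by itself.

```python
import os
import select
import threading


def wake_epoll_waiter_via_pipe(payload: bytes = b"x") -> bytes:
    """Write to an anonymous pipe and return what an epoll waiter read.

    Hypothetical helper for illustration only; it exercises the
    pipe-write -> epoll wakeup pattern, not the throttling problem.
    """
    read_fd, write_fd = os.pipe()
    ep = select.epoll()
    ep.register(read_fd, select.EPOLLIN)
    received = []

    def waiter():
        # Blocks in epoll_wait() until the pipe is readable
        # (or the 5 second timeout expires).
        for fd, mask in ep.poll(5.0):
            if fd == read_fd and mask & select.EPOLLIN:
                received.append(os.read(fd, len(payload)))

    t = threading.Thread(target=waiter)
    t.start()
    # The write makes the pipe readable and wakes the epoll waiter.
    os.write(write_fd, payload)
    t.join()

    ep.close()
    os.close(read_fd)
    os.close(write_fd)
    return b"".join(received)
```

To get anywhere near the situation discussed in this thread, the writer
and the waiter would additionally have to run under a CFS bandwidth
limit on a PREEMPT_RT kernel, which this sketch does not set up.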

But: I also started a test with 6.14 vanilla (so unpatched) on a
different system. This one crashes within a couple of minutes. This is
a completely different story, as the series we're discussing here is
not even applied, but for completeness: the device is completely locked
up afterwards, and PID 34 is ktimers on CPU1. This is the last message
we get from the device:

kernel: ------------[ cut here ]------------
kernel: !se->on_rq
kernel: WARNING: CPU: 1 PID: 34 at kernel/sched/fair.c:699 update_entity_lag+0x7d/0x90
kernel: Modules linked in: veth xt_nat nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfr>
kernel:  sd_mod mptspi ata_generic mptscsih mptbase psmouse scsi_transport_spi ata_piix libata scs>
kernel: CPU: 1 UID: 0 PID: 34 Comm: ktimers/1 Not tainted 6.14.0 #1
kernel: Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.242>
kernel: RIP: 0010:update_entity_lag+0x7d/0x90
kernel: Code: 0f 4d d7 48 89 53 78 5b 5d c3 cc cc cc cc 80 3d e7 f4 dd 01 00 75 a9 48 c7 c7 d0 81 >
kernel: RSP: 0018:ffffacf58012fbe8 EFLAGS: 00010082
kernel: RAX: 0000000000000000 RBX: ffff9ee43ca00080 RCX: 0000000000000027
kernel: RDX: ffff9ee6efd21988 RSI: 0000000000000001 RDI: ffff9ee6efd21980
kernel: RBP: ffff9ee421929800 R08: 00000000462951bd R09: ffffffff8e654811
kernel: R10: ffffffff8e654811 R11: ffffffff8e608a2a R12: 000000000000000e
kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000e
kernel: FS:  0000000000000000(0000) GS:ffff9ee6efd00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000000c00082a000 CR3: 0000000113416002 CR4: 00000000007706f0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? __warn+0x91/0x190
kernel:  ? update_entity_lag+0x7d/0x90
kernel:  ? report_bug+0x164/0x190
kernel:  ? handle_bug+0x58/0x90
kernel:  ? exc_invalid_op+0x17/0x70
kernel:  ? asm_exc_invalid_op+0x1a/0x20
kernel:  ? ret_from_fork_asm+0x1a/0x30
kernel:  ? ret_from_fork+0x31/0x50
kernel:  ? ret_from_fork+0x31/0x50
kernel:  ? update_entity_lag+0x7d/0x90
kernel:  ? update_entity_lag+0x7d/0x90
kernel:  dequeue_entity+0x90/0x5a0
kernel:  dequeue_entities+0x121/0x640
kernel:  dequeue_task_fair+0xbf/0x290
kernel:  rt_mutex_setprio+0x37c/0x690
kernel:  rtlock_slowlock_locked+0xca1/0x1860
kernel:  ? lock_acquire+0xcb/0x2e0
kernel:  ? run_ktimerd+0xe/0x80
kernel:  ? __pfx_smpboot_thread_fn+0x10/0x10
kernel:  rt_spin_lock+0x86/0x160
kernel:  __local_bh_disable_ip+0x9d/0x190
kernel:  ksoftirqd_run_begin+0xe/0x30
kernel:  run_ktimerd+0xe/0x80
kernel:  smpboot_thread_fn+0xda/0x1d0