lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62304351-7fc0-48b6-883b-d346886dac8e@amd.com>
Date: Tue, 15 Apr 2025 16:04:11 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Jan Kiszka <jan.kiszka@...mens.com>, Aaron Lu <ziqianlu@...edance.com>,
	Florian Bezdeka <florian.bezdeka@...mens.com>
CC: Valentin Schneider <vschneid@...hat.com>, Ben Segall <bsegall@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>, Josh Don <joshdon@...gle.com>, Ingo
 Molnar <mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, Xi
 Wang <xii@...gle.com>, <linux-kernel@...r.kernel.org>, Juri Lelli
	<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, Steven
 Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Chengming Zhou
	<chengming.zhou@...ux.dev>, Chuyi Zhou <zhouchuyi@...edance.com>, "Sebastian
 Andrzej Siewior," <bigeasy@...utronix.de>
Subject: Re: [RFC PATCH v2 0/7] Defer throttle when task exits to user

On 4/15/2025 2:15 PM, K Prateek Nayak wrote:
> (+ Sebastian)
> 
> Hello Jan,
> 
> On 4/15/2025 11:39 AM, Jan Kiszka wrote:
>>>> Attached the bits with which we succeeded, sometimes. Setup: Debian 12,
>>>> RT kernel, 2-4 cores VM, 1-5 instances of the test, 2 min - 2 h

To improve the reproducibility, I pinned the two tasks to the same CPU
as the bandwidth timer and I could hit this consistently within a few
minutes at the most.

>>>> patience. As we have to succeed with at least 3 race conditions in a
>>>> row, that is still not bad... But maybe someone has an idea how to
>>>> increase probabilities further.
>>>
>>> Looking at run.sh, there are only fair tasks with one of them being run
>>> with cfs bandwidth constraints. Are you saying something goes wrong on
>>> PREEMPT_RT as a result of using bandwidth control on fair tasks?
>>
>> Yes, exactly. Also our in-field workload that triggers (most likely)
>> this issue is not using RT tasks itself. Only kernel threads are RT here.
>>
>>>
>>> What exactly is the symptom you are observing? Does one of the assert()
>>> trip during the run? Do you see a stall logged on dmesg? Can you provide
>>> more information on what to expect in this 2min - 2hr window?
>>
>> I've just lost my traces from yesterday ("you have 0 minutes to find a
>> power adapter"), but I got nice RCU stall warnings in the VM, including
>> backtraces from the involved tasks (minus the read-lock holder IIRC).
>> Maybe Florian can drop one of his dumps.
> 
> So I ran your reproducer on a 2vCPU VM running v6.15-rc1 PREEMPT_RT
> and I saw:
> 
>      rcu: INFO: rcu_preempt self-detected stall on CPU
>      rcu:     0-...!: (15000 ticks this GP) idle=8a74/0/0x1 softirq=0/0 fqs=0
>      rcu:     (t=15001 jiffies g=12713 q=24 ncpus=2)
>      rcu: rcu_preempt kthread timer wakeup didn't happen for 15000 jiffies! g12713 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
>      rcu:     Possible timer handling issue on cpu=0 timer-softirq=17688
>      rcu: rcu_preempt kthread starved for 15001 jiffies! g12713 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
>      rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
>      rcu: RCU grace-period kthread stack dump:
>      task:rcu_preempt     state:I stack:0     pid:17    tgid:17    ppid:2      task_flags:0x208040 flags:0x00004000
>      Call Trace:
>       <TASK>
>       __schedule+0x401/0x15a0
>       ? srso_alias_return_thunk+0x5/0xfbef5
>       ? lock_timer_base+0x77/0xb0
>       ? srso_alias_return_thunk+0x5/0xfbef5
>       ? __pfx_rcu_gp_kthread+0x10/0x10
>       schedule+0x27/0xd0
>       schedule_timeout+0x76/0x100
>       ? __pfx_process_timeout+0x10/0x10
>       rcu_gp_fqs_loop+0x10a/0x4b0
>       rcu_gp_kthread+0xd3/0x160
>       kthread+0xff/0x210
>       ? rt_spin_lock+0x3c/0xc0
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork+0x34/0x50
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork_asm+0x1a/0x30
>       </TASK>
>      CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.15.0-rc1-test-dirty #746 PREEMPT_{RT,(full)}
>      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>      RIP: 0010:pv_native_safe_halt+0xf/0x20
>      Code: 22 df e9 1f 08 e5 fe 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 85 96 15 00 fb f4 <e9> f7 07 e5 fe 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
>      RSP: 0018:ffffffff95803e50 EFLAGS: 00000216
>      RAX: ffff8e2d61534000 RBX: 0000000000000000 RCX: 0000000000000000
>      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000081f8a6c
>      RBP: ffffffff9581d280 R08: 0000000000000000 R09: ffff8e2cf7d32301
>      R10: ffff8e2be11ae5c8 R11: 0000000000000001 R12: 0000000000000000
>      R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000147b0
>      FS:  0000000000000000(0000) GS:ffff8e2d61534000(0000) knlGS:0000000000000000
>      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>      CR2: 000055e77c3a5128 CR3: 000000010ff78003 CR4: 0000000000770ef0
>      PKRU: 55555554
>      Call Trace:
>       <TASK>
>       default_idle+0x9/0x20
>       default_idle_call+0x30/0x100
>       do_idle+0x20f/0x250
>       ? do_idle+0xb/0x250
>       cpu_startup_entry+0x29/0x30
>       rest_init+0xde/0x100
>       start_kernel+0x733/0xb20
>       ? copy_bootdata+0x9/0xb0
>       x86_64_start_reservations+0x18/0x30
>       x86_64_start_kernel+0xba/0x110
>       common_startup_64+0x13e/0x141
>       </TASK>
> 
> Is this in line with what you are seeing?

These are the backtrace for timer and the individual
epoll-stall threads:

[  539.155042] task:ktimers/1       state:D stack:0     pid:31    tgid:31    ppid:2      task_flags:0x4208040 flags:0x00004000
[  539.155047] Call Trace:
[  539.155049]  <TASK>
[  539.155051]  __schedule+0x401/0x15a0
[  539.155055]  ? srso_alias_return_thunk+0x5/0xfbef5
[  539.155059]  ? propagate_entity_cfs_rq+0x115/0x290
[  539.155063]  ? srso_alias_return_thunk+0x5/0xfbef5
[  539.155067]  ? srso_alias_return_thunk+0x5/0xfbef5
[  539.155070]  ? rt_mutex_setprio+0x1c2/0x480
[  539.155075]  schedule_rtlock+0x1e/0x40
[  539.155078]  rtlock_slowlock_locked+0x20e/0xc60
[  539.155088]  ? srso_alias_return_thunk+0x5/0xfbef5
[  539.155093]  rt_read_lock+0x8f/0x190
[  539.155099]  ep_poll_callback+0x37/0x2b0
[  539.155105]  __wake_up_common+0x78/0xa0
[  539.155110]  timerfd_tmrproc+0x43/0x60
[  539.155114]  ? __pfx_timerfd_tmrproc+0x10/0x10
[  539.155116]  __hrtimer_run_queues+0xfd/0x2e0
[  539.155124]  hrtimer_run_softirq+0x9d/0xf0
[  539.155128]  handle_softirqs.constprop.0+0xc1/0x2a0
[  539.155134]  ? __pfx_smpboot_thread_fn+0x10/0x10
[  539.155139]  run_ktimerd+0x3e/0x80
[  539.155142]  smpboot_thread_fn+0xf3/0x220
[  539.155147]  kthread+0xff/0x210
[  539.155151]  ? rt_spin_lock+0x3c/0xc0
[  539.155155]  ? __pfx_kthread+0x10/0x10
[  539.155159]  ret_from_fork+0x34/0x50
[  539.155165]  ? __pfx_kthread+0x10/0x10
[  539.155168]  ret_from_fork_asm+0x1a/0x30
[  539.155176]  </TASK>

[  557.323846] task:epoll-stall     state:D stack:0     pid:885   tgid:885   ppid:1      task_flags:0x400000 flags:0x00004002
[  557.323848] Call Trace:
[  557.323849]  <TASK>
[  557.323851]  __schedule+0x401/0x15a0
[  557.323853]  ? rt_write_lock+0x108/0x260
[  557.323858]  schedule_rtlock+0x1e/0x40
[  557.323860]  rt_write_lock+0xaa/0x260
[  557.323864]  do_epoll_wait+0x21f/0x4a0
[  557.323869]  ? __pfx_ep_autoremove_wake_function+0x10/0x10
[  557.323872]  __x64_sys_epoll_wait+0x63/0x100
[  557.323876]  do_syscall_64+0x6f/0x120
[  557.323879]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323881]  ? ksys_read+0x6b/0xe0
[  557.323883]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323885]  ? syscall_exit_to_user_mode+0x51/0x1a0
[  557.323887]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323889]  ? do_syscall_64+0x7b/0x120
[  557.323890]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323892]  ? ep_send_events+0x26d/0x2b0
[  557.323896]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323898]  ? do_epoll_wait+0x17e/0x4a0
[  557.323900]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323902]  ? __rseq_handle_notify_resume+0xa7/0x500
[  557.323905]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323907]  ? aa_file_perm+0x123/0x4e0
[  557.323911]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323912]  ? get_nohz_timer_target+0x2a/0x180
[  557.323914]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323916]  ? _copy_to_iter+0xa3/0x630
[  557.323920]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323922]  ? timerqueue_add+0x6a/0xc0
[  557.323924]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323926]  ? hrtimer_start_range_ns+0x2e7/0x4a0
[  557.323931]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323932]  ? timerfd_read_iter+0x141/0x2b0
[  557.323934]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323936]  ? security_file_permission+0x123/0x140
[  557.323940]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323942]  ? vfs_read+0x264/0x340
[  557.323946]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323948]  ? ksys_read+0x6b/0xe0
[  557.323950]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323951]  ? syscall_exit_to_user_mode+0x51/0x1a0
[  557.323953]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323955]  ? do_syscall_64+0x7b/0x120
[  557.323957]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323958]  ? syscall_exit_to_user_mode+0x168/0x1a0
[  557.323960]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323962]  ? do_syscall_64+0x7b/0x120
[  557.323963]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.323965]  ? do_syscall_64+0x7b/0x120
[  557.323967]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  557.323968] RIP: 0033:0x7f0371fd3dea
[  557.323970] RSP: 002b:00007ffd1062cd68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e8
[  557.323971] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0371fd3dea
[  557.323972] RDX: 0000000000000001 RSI: 00007ffd1062cda4 RDI: 0000000000000005
[  557.323973] RBP: 00007ffd1062ce00 R08: 0000000000000000 R09: 000055ea1d8f12a0
[  557.323974] R10: 00000000ffffffff R11: 0000000000000246 R12: 00007ffd1062cf18
[  557.323975] R13: 000055ea03253249 R14: 000055ea03255d80 R15: 00007f0372121040
[  557.323979]  </TASK>

[  557.431402] task:epoll-stall-wri state:R  running task     stack:0     pid:887   tgid:887   ppid:1      task_flags:0x400100 flags:0x00000002
[  557.431405] Call Trace:
[  557.431406]  <TASK>
[  557.431408]  __schedule+0x401/0x15a0
[  557.431410]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431412]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431414]  ? psi_group_change+0x212/0x460
[  557.431417]  ? pick_eevdf+0x71/0x180
[  557.431419]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431421]  ? update_curr+0x8d/0x240
[  557.431425]  preempt_schedule+0x41/0x60
[  557.431427]  preempt_schedule_thunk+0x16/0x30
[  557.431431]  try_to_wake_up+0x2f6/0x6e0
[  557.431433]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431437]  ep_autoremove_wake_function+0x12/0x40
[  557.431439]  __wake_up_common+0x78/0xa0
[  557.431443]  __wake_up_sync+0x34/0x50
[  557.431445]  ep_poll_callback+0x13e/0x2b0
[  557.431448]  ? aa_file_perm+0x123/0x4e0
[  557.431451]  __wake_up_common+0x78/0xa0
[  557.431454]  __wake_up_sync_key+0x38/0x50
[  557.431456]  anon_pipe_write+0x43b/0x6d0
[  557.431461]  fifo_pipe_write+0x13/0xe0
[  557.431463]  vfs_write+0x374/0x420
[  557.431468]  ksys_write+0xc9/0xe0
[  557.431471]  do_syscall_64+0x6f/0x120
[  557.431473]  ? current_time+0x30/0x130
[  557.431476]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431479]  ? vfs_write+0x1bd/0x420
[  557.431481]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431483]  ? vfs_write+0x1bd/0x420
[  557.431487]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431489]  ? ksys_write+0xc9/0xe0
[  557.431491]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431492]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431494]  ? syscall_exit_to_user_mode+0x51/0x1a0
[  557.431496]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431498]  ? do_syscall_64+0x7b/0x120
[  557.431501]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431502]  ? ksys_write+0xc9/0xe0
[  557.431504]  ? do_syscall_64+0x7b/0x120
[  557.431506]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431508]  ? syscall_exit_to_user_mode+0x51/0x1a0
[  557.431510]  ? srso_alias_return_thunk+0x5/0xfbef5
[  557.431511]  ? do_syscall_64+0x7b/0x120
[  557.431513]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  557.431515] RIP: 0033:0x7f0ef191c887
[  557.431516] RSP: 002b:00007ffc15f50948 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  557.431517] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ef191c887
[  557.431518] RDX: 0000000000000001 RSI: 0000561a16a0203d RDI: 0000000000000003
[  557.431519] RBP: 00007ffc15f50970 R08: 0000000000000065 R09: 0000561a45f0e2a0
[  557.431520] R10: 0000000000000077 R11: 0000000000000246 R12: 00007ffc15f50a88
[  557.431521] R13: 0000561a16a011a9 R14: 0000561a16a03da8 R15: 00007f0ef1a7b040
[  557.431525]  </TASK>

> 
>>
>>>
>>> Additionally, do you have RT throttling enabled in your setup? Can long
>>> running RT tasks starve fair tasks on your setup?
>>
>> RT throttling is enabled (default settings) but was not kicking in - why
>> should it in that scenario? The only RT thread, ktimers, ran into the
>> held lock and stopped.
>>
>> Jan
>>
> 

-- 
Thanks and Regards,
Prateek


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ