Message-ID: <Zp/k+rJuVV+EcXqL@tpad>
Date: Tue, 23 Jul 2024 14:14:34 -0300
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Leonardo Bras <leobras@...hat.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Lameter <cl@...ux.com>, Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Vlastimil Babka <vbabka@...e.cz>,
Hyeonggon Yoo <42.hyeyoo@...il.com>,
Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH v1 0/4] Introduce QPW for per-cpu operations
On Sat, Jun 22, 2024 at 12:58:08AM -0300, Leonardo Bras wrote:
> The problem:
> Some places in the kernel implement a parallel programming strategy
> consisting of local_locks() for most of the work, with the few rare remote
> operations scheduled on the target CPU. This keeps cache bouncing low, since
> the cachelines tend to stay mostly local, and avoids the cost of locks in
> non-RT kernels, even though the very few remote operations will be expensive
> due to scheduling overhead.
>
> On the other hand, for RT workloads this can represent a problem: getting
> an important workload scheduled out to deal with remote requests is
> sure to introduce unexpected deadline misses.
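
As a rough illustration of that pattern (a hedged sketch with made-up names,
not code from the series): per-CPU state is protected by a local_lock for the
common, CPU-local case, and the rare remote operation is queued as a work item
on the owning CPU instead of taking a cross-CPU lock:

#include <linux/local_lock.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

struct pcpu_cache {
	local_lock_t		lock;
	unsigned long		nr_items;
	struct work_struct	drain_work;
};

static DEFINE_PER_CPU(struct pcpu_cache, pcpu_cache) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

/* Common case: only this CPU's data is touched, so local_lock is enough
 * (preemption control on !RT, a per-CPU sleeping lock on RT). */
static void pcpu_cache_add(void)
{
	local_lock(&pcpu_cache.lock);
	this_cpu_inc(pcpu_cache.nr_items);
	local_unlock(&pcpu_cache.lock);
}

/* Runs on the CPU that owns the data. */
static void pcpu_cache_drain_fn(struct work_struct *work)
{
	local_lock(&pcpu_cache.lock);
	this_cpu_write(pcpu_cache.nr_items, 0);
	local_unlock(&pcpu_cache.lock);
}

/* Rare case: instead of a shared lock, the drain is scheduled on the
 * target CPU, which is what can disturb an RT task running there. */
static void pcpu_cache_drain_cpu(int cpu)
{
	struct pcpu_cache *pc = per_cpu_ptr(&pcpu_cache, cpu);

	INIT_WORK(&pc->drain_work, pcpu_cache_drain_fn);
	queue_work_on(cpu, system_wq, &pc->drain_work);
}
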
Another hang with a busy polling workload (kernel update hangs on
grub2-probe):
[342431.665417] INFO: task grub2-probe:24484 blocked for more than 622 seconds.
[342431.665458] Tainted: G W X ------- --- 5.14.0-438.el9s.x86_64+rt #1
[342431.665488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[342431.665515] task:grub2-probe state:D stack:0 pid:24484 ppid:24455 flags:0x00004002
[342431.665523] Call Trace:
[342431.665525] <TASK>
[342431.665527] __schedule+0x22a/0x580
[342431.665537] schedule+0x30/0x80
[342431.665539] schedule_timeout+0x153/0x190
[342431.665543] ? preempt_schedule_thunk+0x16/0x30
[342431.665548] ? preempt_count_add+0x70/0xa0
[342431.665554] __wait_for_common+0x8b/0x1c0
[342431.665557] ? __pfx_schedule_timeout+0x10/0x10
[342431.665560] __flush_work.isra.0+0x15b/0x220
[342431.665565] ? __pfx_wq_barrier_func+0x10/0x10
[342431.665570] __lru_add_drain_all+0x17d/0x220
[342431.665576] invalidate_bdev+0x28/0x40
[342431.665583] blkdev_common_ioctl+0x714/0xa30
[342431.665588] ? bucket_table_alloc.isra.0+0x1/0x150
[342431.665593] ? cp_new_stat+0xbb/0x180
[342431.665599] blkdev_ioctl+0x112/0x270
[342431.665603] ? security_file_ioctl+0x2f/0x50
[342431.665609] __x64_sys_ioctl+0x87/0xc0
[342431.665614] do_syscall_64+0x5c/0xf0
[342431.665619] ? __ct_user_enter+0x89/0x130
[342431.665623] ? syscall_exit_to_user_mode+0x22/0x40
[342431.665625] ? do_syscall_64+0x6b/0xf0
[342431.665627] ? __ct_user_enter+0x89/0x130
[342431.665629] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[342431.665635] RIP: 0033:0x7f39856c757b
[342431.665666] RSP: 002b:00007ffd9541c488 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[342431.665670] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f39856c757b
[342431.665673] RDX: 0000000000000000 RSI: 0000000000001261 RDI: 0000000000000005
[342431.665674] RBP: 00007ffd9541c540 R08: 0000000000000003 R09: 006164732f766564
[342431.665676] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd9543ca68
[342431.665678] R13: 000055ea758a0708 R14: 000055ea759de338 R15: 00007f398586f000
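
The trace shows where the queue-and-flush step bites: invalidate_bdev()
reaches __lru_add_drain_all(), which waits in __flush_work() for per-CPU
drain work to complete. A simplified sketch of that step (hypothetical names,
not the actual mm/swap.c code) shows why a busy-polling CPU makes the wait
unbounded: the per-CPU kworker there never gets to run the work item, so
flush_work() never returns:

#include <linux/cpu.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

static DEFINE_PER_CPU(struct work_struct, drain_work);

static void drain_local_cpu(struct work_struct *work)
{
	/* drain this CPU's per-CPU state; body omitted */
}

static void drain_all_cpus(void)
{
	int cpu;

	cpus_read_lock();
	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(drain_work, cpu);

		INIT_WORK(work, drain_local_cpu);
		queue_work_on(cpu, system_wq, work);	/* must run on 'cpu' */
	}

	for_each_online_cpu(cpu)
		/* blocks forever if 'cpu' is monopolized by a busy-polling task */
		flush_work(&per_cpu(drain_work, cpu));
	cpus_read_unlock();
}
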