Message-ID: <YoZGSd6yQL3EP8tk@qian>
Date: Thu, 19 May 2022 09:29:45 -0400
From: Qian Cai <quic_qiancai@...cinc.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
CC: Mel Gorman <mgorman@...hsingularity.net>,
Andrew Morton <akpm@...ux-foundation.org>,
Nicolas Saenz Julienne <nsaenzju@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>,
Michal Hocko <mhocko@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>, <kafai@...com>,
<kpsingh@...nel.org>
Subject: Re: [PATCH 0/6] Drain remote per-cpu directly v3
On Wed, May 18, 2022 at 10:15:03AM -0700, Paul E. McKenney wrote:
> So does this python script somehow change the tracing state? (It does
> not look to me like it does, but I could easily be missing something.)
No, I don't think so either. It pretty much just offlines memory sections
one at a time.
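
Roughly, the script does something along these lines (a simplified sketch
of the behavior, not the actual script):

    #!/usr/bin/env python3
    # Sketch: walk the memory sections exposed under sysfs and try to
    # offline them one at a time via the standard memory-hotplug interface.
    import glob

    for path in sorted(glob.glob("/sys/devices/system/memory/memory*/online")):
        try:
            with open(path, "w") as f:
                f.write("offline")
        except OSError:
            # Some sections cannot be offlined (e.g. ones backing kernel
            # memory); skip those and move on.
            pass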
> Either way, is there something else waiting for these RCU flavors?
> (There should not be.) Nevertheless, if so, there should be
> a synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or
> synchronize_rcu_tasks_trace() on some other blocked task's stack
> somewhere.
There are only three blocked tasks when this happens. The kmemleak_scan()
task is just a victim waiting for the locks held by the stuck
offline_pages()->synchronize_rcu() task.
task:kmemleak state:D stack:25824 pid: 1033 ppid: 2 flags:0x00000008
Call trace:
__switch_to
__schedule
schedule
percpu_rwsem_wait
__percpu_down_read
percpu_down_read.constprop.0
get_online_mems
kmemleak_scan
kmemleak_scan_thread
kthread
ret_from_fork
task:cppc_fie state:D stack:23472 pid: 1848 ppid: 2 flags:0x00000008
Call trace:
__switch_to
__schedule
lockdep_recursion
task:tee state:D stack:24816 pid:16733 ppid: 16732 flags:0x0000020c
Call trace:
__switch_to
__schedule
schedule
schedule_timeout
__wait_for_common
wait_for_completion
__wait_rcu_gp
synchronize_rcu
lru_cache_disable
__alloc_contig_migrate_range
isolate_single_pageblock
start_isolate_page_range
offline_pages
memory_subsys_offline
device_offline
online_store
dev_attr_store
sysfs_kf_write
kernfs_fop_write_iter
new_sync_write
vfs_write
ksys_write
__arm64_sys_write
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync
> Or maybe something sleeps waiting for an RCU Tasks * callback to
> be invoked. In that case (and in the above case, for that matter),
> at least one of these pointers would be non-NULL on some CPU:
>
> 1. rcu_tasks__percpu.cblist.head
> 2. rcu_tasks_rude__percpu.cblist.head
> 3. rcu_tasks_trace__percpu.cblist.head
>
> The ->func field of the pointed-to structure contains a pointer to
> the callback function, which will help work out what is going on.
> (Most likely a wakeup being lost or not provided.)
What would be some easy ways to find those out? I can't see anything
interesting in the output of sysrq-t.
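
Would something along these lines be a reasonable way to check? This is a
rough, untested drgn sketch; the symbol and field names are taken from your
description above, so please correct me if the layout is different:

    #!/usr/bin/env drgn
    # Sketch: print any non-NULL rcu_tasks*__percpu.cblist.head pointer
    # and the callback ->func it points to, for each online CPU.
    # ("prog" is provided by drgn when running the script.)
    from drgn.helpers.linux.cpumask import for_each_online_cpu
    from drgn.helpers.linux.percpu import per_cpu

    for name in ("rcu_tasks__percpu", "rcu_tasks_rude__percpu",
                 "rcu_tasks_trace__percpu"):
        try:
            var = prog[name]
        except KeyError:
            continue  # this RCU Tasks flavor is not built into the kernel
        for cpu in for_each_online_cpu(prog):
            head = per_cpu(var, cpu).cblist.head
            if head:
                print(name, "cpu", cpu, head, head.func)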
> Alternatively, if your system has hundreds of thousands of tasks and
> you have attached BPF programs to short-lived socket structures and you
> don't yet have the workaround, then you can see hangs. (I am working on a
> longer-term fix.) In the short term, applying the workaround is the right
> thing to do. (Adding a couple of the BPF guys on CC for their thoughts.)
The system is pretty much idle after a fresh reboot. The only workload is
to run the script.