Message-Id: <7168A4A4-E735-4809-B80A-389990603EB8@lca.pw>
Date: Mon, 13 Jan 2020 01:30:21 -0500
From: Qian Cai <cai@....pw>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
"Steven Rostedt (VMware)" <rostedt@...dmis.org>,
bsegall@...gle.com, mgorman@...e.de, paulmck@...nel.org,
tglx@...utronix.de, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
> On Jan 12, 2020, at 7:33 PM, Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp> wrote:
>
> On 2020/01/13 1:17, Qian Cai wrote:
>> In the CPU-offline process, it calls mmdrop() after idle entry and the
>> subsequent call to cpuhp_report_idle_dead(). Once execution passes the
>> call to rcu_report_dead(), RCU is ignoring the CPU, which results in
>> lockdep complaints when mmdrop() uses RCU from either memcg or
>> debugobjects. Fix it by scheduling mmdrop() on another online CPU.
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 90e4b00ace89..41fb49f3dfce 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6194,7 +6194,8 @@ void idle_task_exit(void)
>> current->active_mm = &init_mm;
>> finish_arch_post_lock_switch();
>> }
>> - mmdrop(mm);
>> + smp_call_function_single(cpumask_first(cpu_online_mask),
>> + (void (*)(void *))mmdrop, mm, 0);
>
> mmdrop() might sleep, but
If that is the case, then commit e78a7614f387 (“idle: Prevent
late-arriving interrupts from disrupting offline”) is incorrect, because it
disables local IRQs before calling mmdrop(), which would trigger the
might_sleep() warning. Can you prove it?
>
> /*
> * smp_call_function_single - Run a function on a specific CPU
> * @func: The function to run. This must be fast and non-blocking.
> * @info: An arbitrary pointer to pass to the function.
> * @wait: If true, wait until function has completed on other CPUs.
> *
> * Returns 0 on success, else a negative status code.
> */
>
> . Maybe mmdrop_async() instead?
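
For illustration only, here is a minimal userspace sketch of the deferral
idea behind the suggestion quoted above: the non-blocking context only drops
the reference and queues the object, and the teardown that may sleep runs
later from a context where sleeping is allowed. This is not the kernel's
implementation. Every name in it (fake_mm, fake_mmdrop_async,
run_deferred_work, fake_mm_teardown) is hypothetical, the plain int counter
stands in for an atomic reference count, and the single-threaded list stands
in for a work queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an mm; not the kernel's mm_struct. */
struct fake_mm {
	int count;             /* models the reference count (non-atomic here) */
	bool freed;            /* set once the teardown has run */
	struct fake_mm *next;  /* link in the deferred list */
};

/* Models a work queue: objects waiting for their teardown. */
static struct fake_mm *deferred_head;

/* Models the teardown that may sleep; must not run in atomic context. */
static void fake_mm_teardown(struct fake_mm *mm)
{
	mm->freed = true;
}

/*
 * Safe to call from a non-blocking context in this model: it only
 * decrements the count and, on the final drop, queues the object.
 */
static void fake_mmdrop_async(struct fake_mm *mm)
{
	if (--mm->count == 0) {
		mm->next = deferred_head;
		deferred_head = mm;
	}
}

/*
 * Runs later, from a context that is allowed to sleep.
 * Returns the number of objects torn down.
 */
static int run_deferred_work(void)
{
	int n = 0;

	while (deferred_head) {
		struct fake_mm *mm = deferred_head;

		deferred_head = mm->next;
		fake_mm_teardown(mm);
		n++;
	}
	return n;
}
```

In this model, the final fake_mmdrop_async() call returns without running
the teardown; the object is only marked freed after run_deferred_work() has
been called from the context that may sleep.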