Message-ID: <462ce651-4c10-9b76-3f51-6915f38b5158@gmx.us>
Date: Sat, 1 Dec 2018 23:16:07 -0500
From: Qian Cai <cai@....us>
To: Marc Zyngier <marc.zyngier@....com>
Cc: Sudeep Holla <sudeep.holla@....com>,
open list <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Jason Cooper <jason@...edaemon.net>
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c
On 11/12/18 3:39 AM, Marc Zyngier wrote:
> On Fri, 09 Nov 2018 18:41:03 +0000,
> Qian Cai <cai@....us> wrote:
>>
>>
>>
>>> On Nov 9, 2018, at 12:41 PM, Marc Zyngier <marc.zyngier@....com> wrote:
>>>
>>> On 09/11/18 17:28, Sudeep Holla wrote:
>>>> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <marc.zyngier@....com> wrote:
>>>>>
>>>> [...]
>>>>
>>>>>
>>>>> See bb42ca474010 and d003d029cea8 for details.
>>>>>
>>>>> Now, activating this workaround leads to lockdep being really angry,
>>>>> most likely because the cpus_read_lock is not taken, which is a change
>>>>> in behaviour...
>>>>>
>>>>> I'm trying to dig into this now.
>>>>>
>>>>
>>>> Yes, we found a similar issue in kernel/sched/core.c sched_init_smp.
>>>> There's a fix with a detailed description in -next
>>>> (commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")).
>>>>
>>>> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
>>>> Assert we hold the hotplug lock for _cpuslocked() operations")
>>>
>>> I indeed came to the same conclusion, but the fix is slightly less than
>>> obvious. I have the following arm64-specific crap, but it is pretty
>>> terrible:
>>>
>>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
>>> index f258636273c9..9e96e9eaca9b 100644
>>> --- a/arch/arm64/kernel/time.c
>>> +++ b/arch/arm64/kernel/time.c
>>> @@ -36,6 +36,7 @@
>>> #include <linux/clocksource.h>
>>> #include <linux/clk-provider.h>
>>> #include <linux/acpi.h>
>>> +#include <linux/cpu.h>
>>>
>>> #include <clocksource/arm_arch_timer.h>
>>>
>>> @@ -69,7 +70,9 @@ void __init time_init(void)
>>> u32 arch_timer_rate;
>>>
>>> of_clk_init(NULL);
>>> + cpus_read_lock();
>>> timer_probe();
>>> + cpus_read_unlock();
>>>
>>> tick_setup_hrtimer_broadcast();
>>>
>>> Qian, can you please let me know if this helps? If it does, we'll have
>>> to think of something a bit better…
>> After applying the above patch, the original warning is gone, but there
>> is now a new warning.
>
> [...]
>
> Which was fully expected, given that I've taken the cpu lock at some
> semi-random location. I'll try to talk to PeterZ this week to try and
> solve this.
>
Marc, did you have a chance to investigate this further? I still see it in
the latest mainline today. This is the only warning left on this Huawei TaiShan
2280 server now, after confirming that those GICv3 warnings were gone.