linux-kernel - Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c


Message-ID: <462ce651-4c10-9b76-3f51-6915f38b5158@gmx.us>
Date:   Sat, 1 Dec 2018 23:16:07 -0500
From:   Qian Cai <cai@....us>
To:     Marc Zyngier <marc.zyngier@....com>
Cc:     Sudeep Holla <sudeep.holla@....com>,
        open list <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Jason Cooper <jason@...edaemon.net>
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c



On 11/12/18 3:39 AM, Marc Zyngier wrote:
> On Fri, 09 Nov 2018 18:41:03 +0000,
> Qian Cai <cai@....us> wrote:
>>
>>
>>
>>> On Nov 9, 2018, at 12:41 PM, Marc Zyngier <marc.zyngier@....com> wrote:
>>>
>>> On 09/11/18 17:28, Sudeep Holla wrote:
>>>> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <marc.zyngier@....com> wrote:
>>>>>
>>>> [...]
>>>>
>>>>>
>>>>> See bb42ca474010 and d003d029cea8 for details.
>>>>>
>>>>> Now, activating this workaround leads to lockdep being really angry,
>>>>> most likely because the cpus_read_lock is not taken, which is a change
>>>>> in behaviour...
>>>>>
>>>>> I'm trying to dig into this now.
>>>>>
>>>>
>>>> Yes, we found a similar issue in kernel/sched/core.c sched_init_smp().
>>>> There's a fix with a detailed description in -next
>>>> (commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")).
>>>>
>>>> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
>>>> Assert we hold the hotplug lock for _cpuslocked() operations")
>>>
>>> I indeed came to the same conclusion, but the fix is slightly less than
>>> obvious. I have the following arm64-specific crap, but it is pretty
>>> terrible:
>>>
>>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
>>> index f258636273c9..9e96e9eaca9b 100644
>>> --- a/arch/arm64/kernel/time.c
>>> +++ b/arch/arm64/kernel/time.c
>>> @@ -36,6 +36,7 @@
>>> #include <linux/clocksource.h>
>>> #include <linux/clk-provider.h>
>>> #include <linux/acpi.h>
>>> +#include <linux/cpu.h>
>>>
>>> #include <clocksource/arm_arch_timer.h>
>>>
>>> @@ -69,7 +70,9 @@ void __init time_init(void)
>>> 	u32 arch_timer_rate;
>>>
>>> 	of_clk_init(NULL);
>>> +	cpus_read_lock();
>>> 	timer_probe();
>>> +	cpus_read_unlock();
>>>
>>> 	tick_setup_hrtimer_broadcast();
>>>
>>> Qian, can you please let me know if this helps? If it does, we'll have
>>> to think of something a bit better…
>> After applying the above patch, the original warning is gone, but there
>> is now a new warning.
> 
> [...]
> 
> Which was fully expected, given that I've taken the cpu lock at some
> semi-random location. I'll try to talk to PeterZ this week to try and
> solve this.
> 

Marc, did you have a chance to investigate this further? I still see it in
the latest mainline today. It is the only warning left on this Huawei TaiShan
2280 server now that I have confirmed the GICv3 warnings are gone.