linux-kernel - Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

Message-Id: <960D7493-0D43-40A3-9441-CF7F0D76A534@gmx.us>
Date:   Fri, 9 Nov 2018 13:41:03 -0500
From:   Qian Cai <cai@....us>
To:     Marc Zyngier <marc.zyngier@....com>
Cc:     Sudeep Holla <sudeep.holla@....com>,
        open list <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Jason Cooper <jason@...edaemon.net>
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c



> On Nov 9, 2018, at 12:41 PM, Marc Zyngier <marc.zyngier@....com> wrote:
> 
> On 09/11/18 17:28, Sudeep Holla wrote:
>> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <marc.zyngier@....com> wrote:
>>> 
>> [...]
>> 
>>> 
>>> See bb42ca474010 and d003d029cea8 for details.
>>> 
>>> Now, activating this workaround leads to lockdep being really angry,
>>> most likely because the cpus_read_lock is not taken, which is a change
>>> in behaviour...
>>> 
>>> I'm trying to dig into this now.
>>> 
>> 
>> Yes we found similar issue in kernel/sched/core.c sched_init_smp
>> There's a fix with detailed description in -next
>> (Commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")
>> 
>> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
>> Assert we hold the hotplug lock for _cpuslocked() operations")
> 
> I indeed came to the same conclusion, but the fix is slightly less than
> obvious. I have the following arm64-specific crap, but it is pretty
> terrible:
> 
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> index f258636273c9..9e96e9eaca9b 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -36,6 +36,7 @@
> #include <linux/clocksource.h>
> #include <linux/clk-provider.h>
> #include <linux/acpi.h>
> +#include <linux/cpu.h>
> 
> #include <clocksource/arm_arch_timer.h>
> 
> @@ -69,7 +70,9 @@ void __init time_init(void)
> 	u32 arch_timer_rate;
> 
> 	of_clk_init(NULL);
> +	cpus_read_lock();
> 	timer_probe();
> +	cpus_read_unlock();
> 
> 	tick_setup_hrtimer_broadcast();
> 
> Qian, can you please let me know if this helps? If it does, we'll have
> to think of something a bit better…
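Background on why lockdep complains here: commit cb538267ea1e makes jump-label `_cpuslocked()` operations assert that the caller already holds the CPU hotplug lock. An early-boot caller that never took that lock therefore now warns. The sketch below models that contract in userspace. It assumes a plain counter in place of lockdep's held-lock tracking. Every `model_*` name and `hotplug_read_depth` is hypothetical and is not a kernel API. `model_assert_cpus_held()` only imitates the role of the kernel's `lockdep_assert_cpus_held()`, and `model_time_init()` mirrors the shape of the `time_init()` patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model: a counter stands in for lockdep's record of whether
 * the current (boot) context holds cpu_hotplug_lock for reading.
 * No real locking happens here.
 */
static int hotplug_read_depth;

static void model_cpus_read_lock(void)
{
	hotplug_read_depth++;
}

static void model_cpus_read_unlock(void)
{
	assert(hotplug_read_depth > 0);	/* catch unbalanced unlocks */
	hotplug_read_depth--;
}

/* Plays the role of lockdep_assert_cpus_held(): warn and report, do not abort. */
static bool model_assert_cpus_held(const char *caller)
{
	if (hotplug_read_depth > 0)
		return true;
	fprintf(stderr, "WARNING: %s() called without cpu_hotplug_lock held\n",
		caller);
	return false;
}

/* Hypothetical stand-in for a jump-label _cpuslocked() operation. */
static bool model_static_key_enable_cpuslocked(void)
{
	return model_assert_cpus_held(__func__);
}

/* Shape of the time_init() change: optionally bracket the call with the lock. */
static bool model_time_init(bool take_lock)
{
	bool ok;

	if (take_lock)
		model_cpus_read_lock();
	ok = model_static_key_enable_cpuslocked();
	if (take_lock)
		model_cpus_read_unlock();
	return ok;
}
```

Calling `model_time_init(false)` prints the warning and returns false. `model_time_init(true)` passes the check. As the reply below shows, taking the lock this way is what then exposes a lock-ordering problem with `acpi_probe_mutex`.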
After applying the above patch, the original warning is gone, but there
is now a new warning.

> [    0.000000] rcu: 	Offload RCU callbacks from CPUs: (none).
> [    0.000000] 
> [    0.000000] ======================================================
> [    0.000000] WARNING: possible circular locking dependency detected
> [    0.000000] 4.20.0-rc1+ #10 Tainted: G                T
> [    0.000000] ------------------------------------------------------
> [    0.000000] swapper/0/0 is trying to acquire lock:
> [    0.000000] (____ptrval____) (acpi_probe_mutex){....}, at: __acpi_probe_device_table+0xac/0x1ec
> [    0.000000] 
> [    0.000000] but task is already holding lock:
> [    0.000000] (____ptrval____) (cpu_hotplug_lock.rw_sem){....}, at: time_init+0x44/0xa0
> [    0.000000] 
> [    0.000000] which lock already depends on the new lock.
> [    0.000000] 
> [    0.000000] 
> [    0.000000] the existing dependency chain (in reverse order) is:
> [    0.000000] 
> [    0.000000] -> #1 (cpu_hotplug_lock.rw_sem){....}:
> [    0.000000]        __lock_acquire+0x3cc/0x858
> [    0.000000]        lock_acquire+0x124/0x330
> [    0.000000]        cpus_read_lock+0x6c/0x100
> [    0.000000]        __cpuhp_setup_state+0x38/0x78
> [    0.000000]        gic_init_bases+0x3ac/0x5d8
> [    0.000000]        gic_acpi_init+0x2cc/0x564
> [    0.000000]        acpi_match_madt+0x9c/0x15c
> [    0.000000]        acpi_table_parse_entries_array+0x3e0/0x5d8
> [    0.000000]        acpi_table_parse_entries+0xbc/0x114
> [    0.000000]        acpi_table_parse_madt+0x4c/0x80
> [    0.000000]        __acpi_probe_device_table+0x134/0x1ec
> [    0.000000]        irqchip_init+0x48/0x74
> [    0.000000]        init_IRQ+0xe4/0x12c
> [    0.000000]        start_kernel+0x4d0/0x7d4
> [    0.000000] 
> [    0.000000] -> #0 (acpi_probe_mutex){....}:
> [    0.000000]        validate_chain.isra.19+0xcd8/0x1158
> [    0.000000]        __lock_acquire+0x3cc/0x858
> [    0.000000]        lock_acquire+0x124/0x330
> [    0.000000]        __mutex_lock+0x110/0xa68
> [    0.000000]        mutex_lock_nested+0x3c/0x50
> [    0.000000]        __acpi_probe_device_table+0xac/0x1ec
> [    0.000000]        timer_probe+0x1bc/0x254
> [    0.000000]        time_init+0x48/0xa0
> [    0.000000]        start_kernel+0x4ec/0x7d4
> [    0.000000] 
> [    0.000000] other info that might help us debug this:
> [    0.000000] 
> [    0.000000]  Possible unsafe locking scenario:
> [    0.000000] 
> [    0.000000]        CPU0                    CPU1
> [    0.000000]        ----                    ----
> [    0.000000]   lock(cpu_hotplug_lock.rw_sem);
> [    0.000000]                                lock(acpi_probe_mutex);
> [    0.000000]                                lock(cpu_hotplug_lock.rw_sem);
> [    0.000000]   lock(acpi_probe_mutex);
> [    0.000000] 
> [    0.000000]  *** DEADLOCK ***
> [    0.000000] 
> [    0.000000] 1 lock held by swapper/0/0:
> [    0.000000]  #0: (____ptrval____) (cpu_hotplug_lock.rw_sem){....}, at: time_init+0x44/0xa0
> [    0.000000] 
> [    0.000000] stack backtrace:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G                T 4.20.0-rc1+ #10
> [    0.000000] Call trace:
> [    0.000000]  dump_backtrace+0x0/0x248
> [    0.000000]  show_stack+0x24/0x30
> [    0.000000]  dump_stack+0xb8/0xf4
> [    0.000000]  print_circular_bug.isra.15+0x240/0x368
> [    0.000000]  check_prev_add.constprop.24+0x444/0xa38
> [    0.000000]  validate_chain.isra.19+0xcd8/0x1158
> [    0.000000]  __lock_acquire+0x3cc/0x858
> [    0.000000]  lock_acquire+0x124/0x330
> [    0.000000]  __mutex_lock+0x110/0xa68
> [    0.000000]  mutex_lock_nested+0x3c/0x50
> [    0.000000]  __acpi_probe_device_table+0xac/0x1ec
> [    0.000000]  timer_probe+0x1bc/0x254
> [    0.000000]  time_init+0x48/0xa0
> [    0.000000]  start_kernel+0x4ec/0x7d4
> [    0.000000] arch_timer: Enabling global workaround for HiSilicon erratum 161010101
> [    0.000000] arch_timer: CPU0: Trapping CNTVCT access
> [    0.000000] arch_timer: cp15 timer(s) running at 50.00MHz (phys).
> [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> [    0.000002] sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
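The report above is lockdep's classic AB-BA case. The `init_IRQ()` path takes `acpi_probe_mutex` in `__acpi_probe_device_table()` and then `cpu_hotplug_lock` via `__cpuhp_setup_state()`. The patched `time_init()` takes `cpu_hotplug_lock` first and then `acpi_probe_mutex` inside `timer_probe()`. The sketch below models the idea behind the check in userspace. It assumes a tiny fixed set of lock classes and a depth-first search over recorded "acquired while holding" edges. The `model_*` names, `reachable()`, and the graph layout are hypothetical simplifications, not lockdep's actual data structures. As in the trace, a single context creates both edges, so the cycle is reported even though no deadlock has actually happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes, named after the two locks in the report. */
enum lock_class { HOTPLUG_LOCK, ACPI_PROBE_MUTEX, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	[HOTPLUG_LOCK]     = "cpu_hotplug_lock.rw_sem",
	[ACPI_PROBE_MUTEX] = "acpi_probe_mutex",
};

/* after[a][b] is set once class b has been acquired while a was held. */
static bool after[NR_CLASSES][NR_CLASSES];

/* Held-lock stack of the single context being modelled. */
static enum lock_class held[NR_CLASSES];
static int nr_held;

/* Depth-first search: is there a recorded path from 'from' to 'to'? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int c = 0; c < NR_CLASSES; c++)
		if (after[from][c] && reachable((enum lock_class)c, to, seen))
			return true;
	return false;
}

/* Record the new edges; return false if any of them closes a cycle. */
static bool model_lock(enum lock_class next)
{
	bool ok = true;

	for (int i = 0; i < nr_held; i++) {
		enum lock_class prev = held[i];
		bool seen[NR_CLASSES] = { false };

		/* An existing path next -> ... -> prev plus prev -> next is a cycle. */
		if (reachable(next, prev, seen)) {
			fprintf(stderr,
				"possible circular locking dependency: %s -> %s\n",
				class_name[prev], class_name[next]);
			ok = false;
		}
		after[prev][next] = true;
	}
	assert(nr_held < NR_CLASSES);
	held[nr_held++] = next;
	return ok;
}

/* Release the most recently acquired lock (strict LIFO in this model). */
static void model_unlock(void)
{
	assert(nr_held > 0);
	nr_held--;
}
```

Replay the two boot paths in the order they ran. First take `acpi_probe_mutex` and then `cpu_hotplug_lock`, and release both. Then take them in the reverse order. The second `model_lock(ACPI_PROBE_MUTEX)` returns false, which corresponds to the splat above. Real lockdep tracks far more than this, including read versus write acquisitions, IRQ contexts, and an arbitrary number of classes. Treat the sketch only as an illustration of why the report fires.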