Message-ID: <7225eba7-6e5e-ec7e-953b-d1fef0b1775b@huawei.com>
Date: Thu, 9 Jul 2020 18:41:29 +0800
From: Zenghui Yu <yuzenghui@...wei.com>
To: Marc Zyngier <maz@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>
CC: <yisen.zhuang@...wei.com>, <salil.mehta@...wei.com>,
<wanghaibin.wang@...wei.com>
Subject: [REPORT] possible circular locking dependency when booting a VM on
arm64 host
Hi All,
I've seen the following lockdep splat when booting a guest on my
Kunpeng 920 with GICv4 enabled. I can also trigger the same splat
on v5.5, so it has apparently been in the kernel for a while. I'm
not sure what the exact problem is and hope someone can take a look!
Thanks,
Zenghui
[ 103.855511] ======================================================
[ 103.861664] WARNING: possible circular locking dependency detected
[ 103.867817] 5.8.0-rc4+ #35 Tainted: G W
[ 103.872932] ------------------------------------------------------
[ 103.879083] CPU 2/KVM/20515 is trying to acquire lock:
[ 103.884200] ffff202fcd5865b0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x60/0xa0
[ 103.893127]
but task is already holding lock:
[ 103.898933] ffff202fcfd07f58 (&rq->lock){-.-.}-{2:2}, at: __schedule+0x114/0x8b8
[ 103.906301]
which lock already depends on the new lock.
[ 103.914441]
the existing dependency chain (in reverse order) is:
[ 103.921888]
-> #3 (&rq->lock){-.-.}-{2:2}:
[ 103.927438] _raw_spin_lock+0x54/0x70
[ 103.931605] task_fork_fair+0x48/0x150
[ 103.935860] sched_fork+0x100/0x268
[ 103.939856] copy_process+0x628/0x1868
[ 103.944106] _do_fork+0x74/0x710
[ 103.947840] kernel_thread+0x78/0xa0
[ 103.951917] rest_init+0x30/0x270
[ 103.955742] arch_call_rest_init+0x14/0x1c
[ 103.960339] start_kernel+0x534/0x568
[ 103.964503]
-> #2 (&p->pi_lock){-.-.}-{2:2}:
[ 103.970224] _raw_spin_lock_irqsave+0x70/0x98
[ 103.975080] try_to_wake_up+0x5c/0x5b0
[ 103.979330] wake_up_process+0x28/0x38
[ 103.983581] create_worker+0x128/0x1b8
[ 103.987834] workqueue_init+0x308/0x3bc
[ 103.992172] kernel_init_freeable+0x180/0x33c
[ 103.997027] kernel_init+0x18/0x118
[ 104.001020] ret_from_fork+0x10/0x18
[ 104.005097]
-> #1 (&pool->lock){-.-.}-{2:2}:
[ 104.010817] _raw_spin_lock+0x54/0x70
[ 104.014983] __queue_work+0x120/0x6e8
[ 104.019146] queue_work_on+0xa0/0xd8
[ 104.023225] irq_set_affinity_locked+0xa8/0x178
[ 104.028253] __irq_set_affinity+0x5c/0x90
[ 104.032762] irq_set_affinity_hint+0x74/0xb0
[ 104.037540] hns3_nic_init_irq+0xe0/0x210 [hns3]
[ 104.042655] hns3_client_init+0x2d8/0x4e0 [hns3]
[ 104.047779] hclge_init_client_instance+0xf0/0x3a8 [hclge]
[ 104.053760] hnae3_init_client_instance.part.3+0x30/0x68 [hnae3]
[ 104.060257] hnae3_register_ae_dev+0x100/0x1f0 [hnae3]
[ 104.065892] hns3_probe+0x60/0xa8 [hns3]
[ 104.070319] local_pci_probe+0x44/0x98
[ 104.074573] work_for_cpu_fn+0x20/0x30
[ 104.078823] process_one_work+0x258/0x618
[ 104.083333] worker_thread+0x1c0/0x438
[ 104.087585] kthread+0x120/0x128
[ 104.091318] ret_from_fork+0x10/0x18
[ 104.095394]
-> #0 (&irq_desc_lock_class){-.-.}-{2:2}:
[ 104.101895] __lock_acquire+0x11bc/0x1530
[ 104.106406] lock_acquire+0x100/0x3f8
[ 104.110570] _raw_spin_lock_irqsave+0x70/0x98
[ 104.115426] __irq_get_desc_lock+0x60/0xa0
[ 104.120021] irq_set_vcpu_affinity+0x48/0xc8
[ 104.124793] its_make_vpe_non_resident+0x6c/0xc0
[ 104.129910] vgic_v4_put+0x64/0x70
[ 104.133815] vgic_v3_put+0x28/0x100
[ 104.137806] kvm_vgic_put+0x3c/0x60
[ 104.141801] kvm_arch_vcpu_put+0x38/0x58
[ 104.146228] kvm_sched_out+0x38/0x58
[ 104.150306] __schedule+0x554/0x8b8
[ 104.154298] schedule+0x50/0xe0
[ 104.157946] kvm_arch_vcpu_ioctl_run+0x644/0x9e8
[ 104.163063] kvm_vcpu_ioctl+0x4b4/0x918
[ 104.167403] ksys_ioctl+0xb4/0xd0
[ 104.171222] __arm64_sys_ioctl+0x28/0xc8
[ 104.175647] el0_svc_common.constprop.2+0x74/0x138
[ 104.180935] do_el0_svc+0x34/0xa0
[ 104.184755] el0_sync_handler+0xec/0x128
[ 104.189179] el0_sync+0x140/0x180
[ 104.192997]
other info that might help us debug this:
[ 104.200962] Chain exists of:
&irq_desc_lock_class --> &p->pi_lock --> &rq->lock
[ 104.211261] Possible unsafe locking scenario:
[ 104.217152]        CPU0                    CPU1
[ 104.221660]        ----                    ----
[ 104.226170]   lock(&rq->lock);
[ 104.229210]                                lock(&p->pi_lock);
[ 104.234930]                                lock(&rq->lock);
[ 104.240474]   lock(&irq_desc_lock_class);
[ 104.244465]
*** DEADLOCK ***
[ 104.250356] 2 locks held by CPU 2/KVM/20515:
[ 104.254606] #0: ffff202fa95680c8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x80/0x918
[ 104.262921] #1: ffff202fcfd07f58 (&rq->lock){-.-.}-{2:2}, at: __schedule+0x114/0x8b8
[ 104.270717]
stack backtrace:
[ 104.275057] CPU: 73 PID: 20515 Comm: CPU 2/KVM Kdump: loaded Tainted: G W 5.8.0-rc4+ #35
[ 104.284404] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019
[ 104.291851] Call trace:
[ 104.294289] dump_backtrace+0x0/0x218
[ 104.297935] show_stack+0x2c/0x38
[ 104.301235] dump_stack+0xf0/0x168
[ 104.304622] print_circular_bug.isra.39+0x22c/0x280
[ 104.309476] check_noncircular+0x158/0x1c8
[ 104.313555] __lock_acquire+0x11bc/0x1530
[ 104.317545] lock_acquire+0x100/0x3f8
[ 104.321190] _raw_spin_lock_irqsave+0x70/0x98
[ 104.325529] __irq_get_desc_lock+0x60/0xa0
[ 104.329606] irq_set_vcpu_affinity+0x48/0xc8
[ 104.333858] its_make_vpe_non_resident+0x6c/0xc0
[ 104.338454] vgic_v4_put+0x64/0x70
[ 104.341840] vgic_v3_put+0x28/0x100
[ 104.345314] kvm_vgic_put+0x3c/0x60
[ 104.348789] kvm_arch_vcpu_put+0x38/0x58
[ 104.352694] kvm_sched_out+0x38/0x58
[ 104.356256] __schedule+0x554/0x8b8
[ 104.359730] schedule+0x50/0xe0
[ 104.362859] kvm_arch_vcpu_ioctl_run+0x644/0x9e8
[ 104.367457] kvm_vcpu_ioctl+0x4b4/0x918
[ 104.371277] ksys_ioctl+0xb4/0xd0
[ 104.374577] __arm64_sys_ioctl+0x28/0xc8
[ 104.378482] el0_svc_common.constprop.2+0x74/0x138
[ 104.383250] do_el0_svc+0x34/0xa0
[ 104.386552] el0_sync_handler+0xec/0x128
[ 104.390458] el0_sync+0x140/0x180
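
For what it's worth, my (possibly wrong) reading of the splat is that
the irq affinity path takes the irq_desc lock and then ends up in the
workqueue/scheduler locks (chains #1 - #3), while the vcpu put path
runs under rq->lock and takes the irq_desc lock via
irq_set_vcpu_affinity() (chain #0), which inverts the order. Below is
a plain user-space pthreads analogue (not kernel code) of the two
orderings, with &rq->lock and &irq_desc_lock_class modelled as
ordinary mutexes and the intermediate &p->pi_lock / &pool->lock steps
collapsed for brevity:

/*
 * Not kernel code -- just a pthreads analogue of the inversion, with
 * "rq_lock" standing in for &rq->lock and "desc_lock" for
 * &irq_desc_lock_class.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t rq_lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t desc_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Mirrors __schedule() -> kvm_sched_out() -> vgic_v4_put() ->
 * its_make_vpe_non_resident() -> irq_set_vcpu_affinity():
 * rq->lock is already held when the irq_desc lock is taken.
 */
static void *vcpu_put_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&rq_lock);
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&desc_lock);
	puts("vcpu_put path: rq_lock -> desc_lock");
	pthread_mutex_unlock(&desc_lock);
	pthread_mutex_unlock(&rq_lock);
	return NULL;
}

/*
 * Mirrors irq_set_affinity_hint() -> __irq_set_affinity() ->
 * irq_set_affinity_locked() -> __queue_work() -> ... ->
 * try_to_wake_up(): the irq_desc lock is held while the
 * workqueue/scheduler locks are eventually taken.
 */
static void *set_affinity_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&desc_lock);
	usleep(1000);
	pthread_mutex_lock(&rq_lock);
	puts("affinity path: desc_lock -> rq_lock");
	pthread_mutex_unlock(&rq_lock);
	pthread_mutex_unlock(&desc_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, vcpu_put_path, NULL);
	pthread_create(&b, NULL, set_affinity_path, NULL);
	/* With the sleeps in place this hangs almost every run. */
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Built with "cc -pthread", this reliably hangs, which is the AB-BA
pattern shown in the "Possible unsafe locking scenario" above.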