Date:   Fri, 31 Jul 2020 15:42:28 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Nitesh Narayan Lal <nitesh@...hat.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        KVM list <kvm@...r.kernel.org>,
        Wanpeng Li <wanpengli@...cent.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Liran Alon <liran.alon@...cle.com>,
        "frederic@...nel.org" <frederic@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Juri Lelli <juri.lelli@...hat.com>
Subject: Re: WARNING: suspicious RCU usage - while installing a VM on a CPU
 listed under nohz_full

On Fri, 31 Jul 2020 at 06:45, Nitesh Narayan Lal <nitesh@...hat.com> wrote:
>
>
> On 7/29/20 8:34 AM, Nitesh Narayan Lal wrote:
> > On 7/28/20 10:38 PM, Wanpeng Li wrote:
> >> Hi Nitesh,
> >> On Wed, 29 Jul 2020 at 09:00, Wanpeng Li <kernellwp@...il.com> wrote:
> >>> On Tue, 28 Jul 2020 at 22:40, Nitesh Narayan Lal <nitesh@...hat.com> wrote:
> >>>> Hi,
> >>>>
> >>>> I have recently come across a suspicious RCU usage trace with the 5.8-rc7
> >>>> kernel (with the debug configs enabled) while installing a VM on a CPU that
> >>>> is listed under nohz_full.
> >>>>
> >>>> Based on some of the initial debugging, my impression is that the issue is
> >>>> triggered because the fastpath that is meant to optimize writes to the x2APIC
> >>>> ICR (which eventually lead to a virtual IPI in fixed delivery mode) is getting
> >>>> invoked from the quiescent state.
> >> Could you try the latest linux-next tree? I guess some relevant patches may
> >> be pending there; I can't reproduce this against linux-next.
> > Sure, I will try this today.
>
> Hi Wanpeng,
>
> I am not able to reproduce the issue with the linux-next tree.
> However, I am still seeing a warning stack trace:
>
> [  139.220080] RIP: 0010:kvm_arch_vcpu_ioctl_run+0xb57/0x1320 [kvm]
> [  139.226837] Code: e8 03 0f b6 04 18 84 c0 74 06 0f 8e 4a 03 00 00 41 c6 85 48 31 00 00 00 e9 24 f8 ff ff 4c 89 ef e8 7e ac 02 00 e9 3d f8 ff ff <0f> 0b e9 f2 f8 ff ff 48f
> [  139.247828] RSP: 0018:ffff8889bc397cb8 EFLAGS: 00010202
> [  139.253700] RAX: 0000000000000001 RBX: dffffc0000000000 RCX: ffffffffc1fc3bef
> [  139.261695] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888f0fa1a8a0
> [  139.269692] RBP: ffff8889bc397d18 R08: ffffed113786a7d0 R09: ffffed113786a7d0
> [  139.277686] R10: ffff8889bc353e7f R11: ffffed113786a7cf R12: ffff8889bc35423c
> [  139.285682] R13: ffff8889bc353e40 R14: ffff8889bc353e6c R15: ffff88897f536000
> [  139.293678] FS:  00007f3d8a71c700(0000) GS:ffff888a3c400000(0000) knlGS:0000000000000000
> [  139.302742] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  139.309186] CR2: 0000000000000000 CR3: 00000009bc34c004 CR4: 00000000003726e0
> [  139.317180] Call Trace:
> [  139.320002]  kvm_vcpu_ioctl+0x3ee/0xb10 [kvm]
> [  139.324907]  ? sched_clock+0x5/0x10
> [  139.328875]  ? kvm_io_bus_get_dev+0x1c0/0x1c0 [kvm]
> [  139.334375]  ? ioctl_file_clone+0x120/0x120
> [  139.339079]  ? selinux_file_ioctl+0x98/0x570
> [  139.343895]  ? selinux_file_mprotect+0x5b0/0x5b0
> [  139.349088]  ? irq_matrix_assign+0x360/0x430
> [  139.353904]  ? rcu_read_lock_sched_held+0xe0/0xe0
> [  139.359201]  ? __fget_files+0x1f0/0x300
> [  139.363532]  __x64_sys_ioctl+0x128/0x18e
> [  139.367948]  do_syscall_64+0x33/0x40
> [  139.371974]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  139.377643] RIP: 0033:0x7f3d98d0a88b
>
> Are you also triggering anything like this in your environment?
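
For reference, a minimal sketch (not the actual KVM fastpath code; example_state,
example_ptr and example_fastpath_read are made-up names) of the condition that
lockdep-RCU checks when it emits the "suspicious RCU usage" splat quoted above:
on a nohz_full CPU, RCU can be in an extended quiescent state ("not watching")
around guest entry, and any RCU read-side access reached from there is flagged.

#include <linux/rcupdate.h>

/* Hypothetical RCU-protected payload, for illustration only. */
struct example_state {
        int data;
};

static struct example_state __rcu *example_ptr;

static int example_fastpath_read(void)
{
        struct example_state *s;
        int val = 0;

        /*
         * Roughly the condition lockdep-RCU enforces: an RCU read-side
         * access while rcu_is_watching() returns false (e.g. deep in the
         * guest-entry path on a nohz_full CPU) produces a "suspicious
         * RCU usage" warning like the one in the original report.
         */
        RCU_LOCKDEP_WARN(!rcu_is_watching(),
                         "RCU used from an extended quiescent state");

        rcu_read_lock();
        s = rcu_dereference(example_ptr);
        if (s)
                val = s->data;
        rcu_read_unlock();

        return val;
}

rcu_read_lock() itself already contains an equivalent
RCU_LOCKDEP_WARN(!rcu_is_watching(), ...) check, which is why any RCU read side
entered from the quiescent state trips the splat.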

I see another issue when removing the kvm modules with rmmod. :)

 ------------[ cut here ]------------
 WARNING: CPU: 5 PID: 2837 at kernel/rcu/tree_plugin.h:1738 call_rcu+0xd3/0x800
 CPU: 5 PID: 2837 Comm: rmmod Not tainted 5.8.0-rc7-next-20200728 #1
 RIP: 0010:call_rcu+0xd3/0x800
 RSP: 0018:ffffae25c302bd90 EFLAGS: 00010002
 RAX: 0000000000000001 RBX: ffffffff944e4f80 RCX: 0000000000000000
 RDX: 0000000000000101 RSI: 000000000000009f RDI: ffffffff93308cef
 RBP: ffffae25c302bdf0 R08: 000000000000074e R09: 0000000000002a2f
 R10: 0000000000000002 R11: ffffffff944dcf80 R12: ffffffff936ef4c0
 R13: ffff9702ce1fd900 R14: ffffffff920e2bd0 R15: 00000000ffff3849
 FS:  00007fc99b242700(0000) GS:ffff9702ce000000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055905332dd58 CR3: 00000003eeb46005 CR4: 00000000001706e0
 Call Trace:
  call_rcu_zapped+0x70/0x80
  lockdep_unregister_key+0xa6/0xf0
  destroy_workqueue+0x1b1/0x210
  kvm_irqfd_exit+0x15/0x20 [kvm]
  kvm_exit+0x78/0x80 [kvm]
  vmx_exit+0x1e/0x50 [kvm_intel]
  __x64_sys_delete_module+0x1e6/0x260
  do_syscall_64+0x63/0x350
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fc99ad7b9e7
