lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 25 Aug 2022 23:42:47 +0000
From:   "Laba, SlawomirX" <slawomirx.laba@...el.com>
To:     "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "songmuchun@...edance.com" <songmuchun@...edance.com>,
        "mike.kravetz@...cle.com" <mike.kravetz@...cle.com>
CC:     "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
        "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>
Subject: BUG: Virtual machine fails to start on 6.0-rc2

We were testing the changes for our VF devices and noticed an issue when starting VMs with passthrough VFs. We then moved back to mainline kernel and reproduced the issue on 6.0-rc2

We noticed that the startup of the KVM hangs.

Steps to reproduce:
Create a VF from the PF interface.
Configure VM XML with the VF PCI.
Start the KVM.

To isolate the issue we moved back to kernel 5.19 and it was working fine.

Working tag v5.19
Tested failing commit 4c612826bec1

[root@...alhost sl]# uname -r
6.0.0-rc2-00159-g4c612826bec1
[root@...alhost sl]# echo 1 > /sys/class/net/ens785f3/device/sriov_numvfs
[root@...alhost sl]# virsh start rhel_9_0_first
^C
[root@...alhost sl]# virsh list --all
 Id   Name             State
-------------------------------
 1    rhel_9_0_first   paused	

Dmesg:
[  +0.042400] iavf: Intel(R) Ethernet Adaptive Virtual Function Network Driver
[  +0.000004] Copyright (c) 2013 - 2018 Intel Corporation.
[  +0.000309] iavf 0000:18:19.0: enabling device (0000 -> 0002)
[  +0.073471] iavf 0000:18:19.0: Invalid MAC address 00:00:00:00:00:00, using random
[  +0.000674] iavf 0000:18:19.0: Multiqueue Enabled: Queue pair count = 16
[  +0.000466] iavf 0000:18:19.0: MAC address: 5a:0c:b5:f7:4f:0b
[  +0.000003] iavf 0000:18:19.0: GRO is enabled
[  +0.005941] iavf 0000:18:19.0 ens785f3v0: renamed from eth0
[  +0.179174] IPv6: ADDRCONF(NETDEV_CHANGE): ens785f3v0: link becomes ready
[  +0.000040] iavf 0000:18:19.0 ens785f3v0: NIC Link is Up Speed is 25 Gbps Full Duplex
[ +26.408503] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[  +0.399621] VFIO - User Level meta-driver version: 0.3
[  +0.151579] iavf 0000:18:19.0: Remove device
[  +0.292158] ice 0000:18:00.3 ens785f3: Setting MAC 52:54:00:9f:ea:de on VF 0. VF driver will be reinitialized
[  +0.083676] ice 0000:18:00.3: Clearing port VLAN on VF 0
[  +0.155905] tun: Universal TUN/TAP device driver, 1.6
[  +0.000976] virbr0: port 1(vnet0) entered blocking state
[  +0.000017] virbr0: port 1(vnet0) entered disabled state
[  +0.000052] device vnet0 entered promiscuous mode
[  +0.000244] virbr0: port 1(vnet0) entered blocking state
[  +0.000003] virbr0: port 1(vnet0) entered listening state
[  +2.019924] virbr0: port 1(vnet0) entered learning state
[  +2.047997] virbr0: port 1(vnet0) entered forwarding state
[  +0.000018] virbr0: topology change detected, propagating
[Aug25 19:12] INFO: task khugepaged:507 blocked for more than 122 seconds.
[  +0.000016]       Tainted: G        W I        6.0.0-rc2-00159-g4c612826bec1 #1
[  +0.000010] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000010] task:khugepaged      state:D stack:    0 pid:  507 ppid:     2 flags:0x00004000
[  +0.000004] Call Trace:
[  +0.000001]  <TASK>
[  +0.000003]  __schedule+0x1bc/0x550
[  +0.000006]  ? osq_unlock+0xf/0x90
[  +0.000005]  schedule+0x5d/0xd0
[  +0.000002]  rwsem_down_write_slowpath+0x2c9/0x5e0
[  +0.000004]  ? find_vma+0x64/0x70
[  +0.000004]  collapse_huge_page+0x1f8/0x8a0
[  +0.000004]  ? _raw_spin_unlock+0x14/0x30
[  +0.000002]  ? preempt_count_add+0x70/0xa0
[  +0.000005]  ? _raw_spin_lock_irqsave+0x21/0x30
[  +0.000001]  ? lock_timer_base+0x61/0x80
[  +0.000005]  khugepaged_scan_pmd+0x33d/0x7b0
[  +0.000003]  khugepaged_scan_mm_slot+0x155/0x440
[  +0.000003]  khugepaged+0x189/0x3e0
[  +0.000002]  ? preempt_count_add+0x70/0xa0
[  +0.000002]  ? _raw_spin_unlock_irqrestore+0x1e/0x40
[  +0.000002]  ? khugepaged_scan_mm_slot+0x440/0x440
[  +0.000001]  kthread+0xf0/0x120
[  +0.000003]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000003]  ret_from_fork+0x1f/0x30
[  +0.000006]  </TASK>

On the working version there is a line that enables the interface for the VF which is missing on non-working one:
[  +0.911730] vfio-pci 0000:18:19.0: enabling device (0000 -> 0002)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ