Message-ID: <CAB9dFdtBqrcmKcV=zxPyV5uNB7WeKOqqC4k5KtY+9vxS9ooKoA@mail.gmail.com>
Date:   Fri, 17 Apr 2020 13:06:40 -0300
From:   Marc Dionne <marc.c.dionne@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: FreeNAS VM disk access errors, bisected to commit 6f1a4891a592

Hi,

Commit 6f1a4891a592 ("x86/apic/msi: Plug non-maskable MSI affinity
race") causes Linux VMs hosted on FreeNAS (bhyve hypervisor) to lose
access to their disk devices shortly after boot.  The disks are ZFS
zvols on the host, presented to each VM.

Background: I recently updated some Fedora 31 VMs running under the
bhyve hypervisor (hosted on a FreeNAS mini), and they moved to a
distro 5.5 kernel (5.5.15).  Shortly after reboot, the disks became
inaccessible, with every operation returning EIO.  After booting back
into a 5.4 kernel, everything was fine.  I built a 5.7-rc1 kernel,
which showed the same symptoms, and was then able to bisect the
problem down to commit 6f1a4891a592.  Note that the symptoms do not
occur on every boot, but often enough (roughly 80%) to make bisection
possible.
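
Because the failure is intermittent, a single clean boot is weak
evidence that a bisection step is "good".  The sketch below estimates
how many consecutive clean boots are needed before marking a kernel
good.  It assumes each boot fails independently at the observed rate
of roughly 80%; the function name and the thresholds are illustrative
and were not part of the actual bisection.

```python
def boots_needed(failure_rate: float, max_false_good: float) -> int:
    """Smallest number of consecutive clean boots such that a bad kernel
    would pass all of them with probability <= max_false_good.

    Assumes boots are independent trials (an assumption, not measured).
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")

    pass_rate = 1.0 - failure_rate   # chance a bad kernel boots cleanly once
    p_all_clean = 1.0                # chance it boots cleanly n times in a row
    n = 0
    while p_all_clean > max_false_good:
        p_all_clean *= pass_rate
        n += 1
    return n


if __name__ == "__main__":
    for threshold in (0.05, 0.01, 0.001):
        print(threshold, boots_needed(0.8, threshold))
```

With an 80% failure rate, a handful of clean boots per step is enough
to keep the chance of a wrong "good" verdict small.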

Applying a manual revert of 6f1a4891a592 on top of mainline from
yesterday gives me a kernel that works fine.

Not sure which details are useful, but here are some bits that might
be relevant:
- FreeNAS host is running FreeNAS-11.3-U2
- EFI/BIOS details:
    efi: EFI v2.40 by BHYVE
    efi:  SMBIOS=0x7fb5b000  ACPI=0x7fb88000  ACPI 2.0=0x7fb88014
    DMI:  BHYVE, BIOS 1.00 03/14/2014
- A sample disk:
    ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    ata4.00: ATA-9: BHYVE SATA DISK, 001, max UDMA/133
    ata4.00: 2147483680 sectors, multi 128: LBA48 NCQ (depth 32)
    ata4.00: configured for UDMA/133
    scsi 3:0:0:0: Direct-Access     ATA      BHYVE SATA DISK  001  PQ: 0 ANSI: 5
    scsi 3:0:0:0: Attached scsi generic sg3 type 0
- The first sign of a problem on a "bad" kernel shows up as:
    ata1.00: exception Emask 0x0 SAct 0x78000001 SErr 0x0 action 0x6 frozen
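
For anyone reading the exception line above: as far as I understand
libata's error output, SAct is the SActive bitmask, with one bit per
NCQ tag that was still outstanding when the port was frozen.  A small
sketch to decode it (the helper name is made up for illustration):

```python
from typing import List


def active_ncq_tags(sact: int) -> List[int]:
    """Return the NCQ tag numbers whose bits are set in a 32-bit SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


if __name__ == "__main__":
    # Value taken from the "bad" kernel's exception line.
    print(active_ncq_tags(0x78000001))  # prints [0, 27, 28, 29, 30]
```

So five queued commands were pending on that port at the time of the
exception.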

Thanks,
Marc
