Message-ID: <20250325223752.f5tjazbpbblgppyz@amd.com>
Date: Tue, 25 Mar 2025 17:37:52 -0500
From: Michael Roth <michael.roth@....com>
To: "Aithal, Srikanth" <sraithal@....com>
CC: <linux-pci@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<bhelgaas@...gle.com>, <sfr@...b.auug.org.au>,
<syzkaller-bugs@...glegroups.com>, <linux-next@...r.kernel.org>, "Roger Pau
Monne" <roger.pau@...rix.com>, Juergen Gross <jgross@...e.com>
Subject: Re: [syzbot] [pci?] linux-next test error: general protection fault
in msix_capability_init
I am also able to reproduce this trace on every boot with a basic KVM guest on
an EPYC Milan system, using next-20250325 for both host and guest.
A bisect of the commits under drivers/pci/msi appears to point to the following
commit as the source of the regression:
commit d9f2164238d814d119e8c979a3579d1199e271bb
Author: Roger Pau Monne <roger.pau@...rix.com>
Date: Wed Feb 19 10:20:57 2025 +0100
PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag
Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
event channels, to prevent PCI code from attempting to toggle the maskbit,
as it's Xen that controls the bit.
However, the pci_msi_ignore_mask being global will affect devices that use
MSI interrupts but are not routing those interrupts over event channels
(not using the Xen pIRQ chip). One example is devices behind a VMD PCI
bridge. In that scenario the VMD bridge configures MSI(-X) using the
normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
bridge configure the MSI entries using indexes into the VMD bridge MSI
table. The VMD bridge then demultiplexes such interrupts and delivers to
the destination device(s). Having pci_msi_ignore_mask set in that scenario
prevents (un)masking of MSI entries for devices behind the VMD bridge.
Move the signaling of no entry masking into the MSI domain flags, as that
allows setting it on a per-domain basis. Set it for the Xen MSI domain
that uses the pIRQ chip, while leaving it unset for the rest of the
cases.
Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
with Xen dropping usage the variable is unneeded.
This fixes using devices behind a VMD bridge on Xen PV hardware domains.
Although devices behind a VMD bridge are not known to Xen, that doesn't mean
Linux cannot use them. By inhibiting the usage of
VMD_FEAT_CAN_BYPASS_MSI_REMAP and removing the pci_msi_ignore_mask
bodge, devices behind a VMD bridge do work fine when used from a Linux Xen
hardware domain. That's the whole point of the series.
Signed-off-by: Roger Pau Monné <roger.pau@...rix.com>
Reviewed-by: Thomas Gleixner <tglx@...utronix.de>
Acked-by: Juergen Gross <jgross@...e.com>
Acked-by: Bjorn Helgaas <bhelgaas@...gle.com>
Message-ID: <20250219092059.90850-4-roger.pau@...rix.com>
Signed-off-by: Juergen Gross <jgross@...e.com>
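For readers following along, here is a rough user-space sketch of the idea the commit message describes: replacing one global "ignore mask" switch with a flag stored per MSI domain, so only devices attached to a flagged domain skip mask-bit toggling. Every name below (DEMO_FLAG_NO_MASK, struct demo_msi_domain, struct demo_device, demo_may_toggle_mask) is hypothetical and is not the kernel's API. Treating a device with no domain as maskable is an assumption of this sketch, not a statement about what the kernel does or about the cause of the trace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-domain flag: "do not touch the mask bits". */
#define DEMO_FLAG_NO_MASK (1u << 0)

/* Hypothetical stand-in for an MSI domain carrying its own flags. */
struct demo_msi_domain {
    const char *name;
    unsigned int flags;
};

/* Hypothetical device; the domain pointer may be NULL in this sketch. */
struct demo_device {
    const char *name;
    const struct demo_msi_domain *domain;
};

/*
 * Decide per device, from its domain's flags, whether the mask bit may
 * be toggled. With a single global variable every device would get the
 * same answer; here the answer depends only on the device's own domain.
 */
static bool demo_may_toggle_mask(const struct demo_device *dev)
{
    /* Assumption of this sketch: no domain means no restriction. */
    if (dev->domain == NULL)
        return true;

    return (dev->domain->flags & DEMO_FLAG_NO_MASK) == 0;
}
```

In this sketch a device whose domain sets DEMO_FLAG_NO_MASK (standing in for the Xen pIRQ case) is never masked by the caller, while a device on an unflagged domain (standing in for devices behind a VMD bridge) keeps normal masking.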
Thanks,
Mike