Message-ID: <20250428142558.263c5db1.alex.williamson@redhat.com>
Date: Mon, 28 Apr 2025 14:25:58 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Chathura Rajapaksha <chathura.abeyrathne.lk@...il.com>,
 kvm@...r.kernel.org, Chathura Rajapaksha <chath@...edu>, Paul Moore
 <paul@...l-moore.com>, Eric Paris <eparis@...hat.com>, Giovanni Cabiddu
 <giovanni.cabiddu@...el.com>, Xin Zeng <xin.zeng@...el.com>, Yahui Cao
 <yahui.cao@...el.com>, Bjorn Helgaas <bhelgaas@...gle.com>, Kevin Tian
 <kevin.tian@...el.com>, Niklas Schnelle <schnelle@...ux.ibm.com>, Yunxiang
 Li <Yunxiang.Li@....com>, Dongdong Zhang
 <zhangdongdong@...incomputing.com>, Avihai Horon <avihaih@...dia.com>,
 linux-kernel@...r.kernel.org, audit@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] vfio/pci: Block and audit accesses to
 unassigned config regions

On Mon, 28 Apr 2025 10:24:55 -0300
Jason Gunthorpe <jgg@...pe.ca> wrote:

> On Sat, Apr 26, 2025 at 09:22:47PM +0000, Chathura Rajapaksha wrote:
> > Some PCIe devices trigger PCI bus errors when accesses are made to
> > unassigned regions within their PCI configuration space. On certain
> > platforms, this can lead to host system hangs or reboots.  
> 
> Do you have an example of this? What do you mean by bus error?
> 
> I would expect the device to return some constant like 0, or to return
> an error TLP. The host bridge should convert the error TLP to
> 0xFFFFFFFF like all other read error conversions.
> 
> Is it a device problem or host bridge problem you are facing?

Or system problem.  Is it the access itself that generates a problem or
is it what the device does as a result of the access?  If the latter,
does this only remove a config space fuzzing attack vector against that
behavior or do we expect the device cannot generate the same behavior
via MMIO or IO register accesses?

We've previously leaned in the direction that we depend on hardware to
contain errors.  We cannot trap every access to the device or else we'd
severely limit the devices available to use and the performance of
those devices to the point that device assignment isn't worthwhile.
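Jason's expectation that a host bridge turns an error completion into an all-ones read can be shown as a simple check on the returned value. The helper below is a hypothetical user-space sketch (`cfg_read_possible_error` is an illustrative name, not a kernel API). Because all-ones can also be legitimate register contents, it can only flag a *possible* error; it cannot confirm one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a config read of `size` bytes
 * (1, 2 or 4) came back as all-ones for that width, which is what a host
 * bridge typically synthesizes for a failed read. All-ones may also be
 * real data, so this only indicates a possible error. */
static bool cfg_read_possible_error(uint32_t val, unsigned int size)
{
    uint32_t mask;

    switch (size) {
    case 1:
        mask = 0xFFu;
        break;
    case 2:
        mask = 0xFFFFu;
        break;
    case 4:
        mask = 0xFFFFFFFFu;
        break;
    default:
        return false; /* unsupported access width */
    }
    return (val & mask) == mask;
}
```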

PCI config space is a slow path, it's already trapped, and it's
theoretically architected that we could restrict and audit much of it,
though some devices do rely on access to unarchitected config space.
But even within the architected space there are device specific
capabilities with undocumented protocols, exposing unknown features of
devices.  Does this incrementally make things better in general, or is
this largely masking a poorly behaved device/system?

> > 1. Support for blocking guest accesses to unassigned
> >    PCI configuration space, and the ability to bypass this access control
> >    for specific devices. The patch introduces three module parameters:
> > 
> >    block_pci_unassigned_write:
> >    Blocks write accesses to unassigned config space regions.
> > 
> >    block_pci_unassigned_read:
> >    Blocks read accesses to unassigned config space regions.
> > 
> >    uaccess_allow_ids:
> >    Specifies the devices for which the above access control is bypassed.
> >    The value is a comma-separated list of device IDs in
> >    <vendor_id>:<device_id> format.
> > 
> >    Example usage:
> >    To block guest write accesses to unassigned config regions for all
> >    passed through devices except for the device with vendor ID 0x1234 and
> >    device ID 0x5678:
> > 
> >    block_pci_unassigned_write=1 uaccess_allow_ids=1234:5678  
> 
> No module parameters please.
> 
> At worst the kernel should maintain a quirks list to control this,
> maybe with a sysfs to update it.

Avoiding module parameters might be difficult if we end up managing this
as a default policy selection, but I certainly agree that if we get into
device-specific behaviors we probably want those quirks automatically
deployed by the kernel.  Thanks,

Alex
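The `<vendor_id>:<device_id>` list format described in the RFC can be made concrete with a small parser and lookup. This is a hypothetical user-space sketch (`struct pci_id`, `parse_allow_ids` and `id_allowed` are illustrative names, not code from the patch). Following Jason's suggestion, an in-kernel quirks table would carry the same (vendor, device) pairs as static data instead of parsing them from a module parameter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Parse one hex field of at most 16 bits; on success store the value and
 * advance *end past the digits. */
static bool parse_hex16(const char *s, const char **end, uint16_t *out)
{
    char *e;
    unsigned long v = strtoul(s, &e, 16);

    if (e == s || v > 0xFFFFul)
        return false;
    *out = (uint16_t)v;
    *end = e;
    return true;
}

/* Parse "vvvv:dddd[,vvvv:dddd...]" into ids[]. Returns the number of
 * entries, or -1 if the list is malformed or has more than max entries. */
static int parse_allow_ids(const char *s, struct pci_id *ids, size_t max)
{
    size_t n = 0;

    if (*s == '\0')
        return 0;
    for (;;) {
        struct pci_id id;

        if (!parse_hex16(s, &s, &id.vendor) || *s != ':')
            return -1;
        s++; /* skip ':' */
        if (!parse_hex16(s, &s, &id.device))
            return -1;
        if (n == max)
            return -1;
        ids[n++] = id;
        if (*s == '\0')
            return (int)n;
        if (*s != ',')
            return -1;
        s++; /* skip ',' */
    }
}

/* True if (vendor, device) appears in the parsed allow list. */
static bool id_allowed(const struct pci_id *ids, size_t n,
                       uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i].vendor == vendor && ids[i].device == device)
            return true;
    return false;
}
```

For example, `parse_allow_ids("1234:5678", ids, 4)` yields one entry, after which `id_allowed` returns true only for vendor 0x1234, device 0x5678.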

