Message-ID:
<PAWPR08MB89095339DEAC58C405A0CF8F9FCB2@PAWPR08MB8909.eurprd08.prod.outlook.com>
Date: Wed, 5 Mar 2025 06:11:22 +0000
From: Wathsala Wathawana Vithanage <wathsala.vithanage@....com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: Jason Gunthorpe <jgg@...pe.ca>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, nd <nd@....com>, Kevin Tian
<kevin.tian@...el.com>, Philipp Stanner <pstanner@...hat.com>, Yunxiang Li
<Yunxiang.Li@....com>, "Dr. David Alan Gilbert" <linux@...blig.org>, Ankit
Agrawal <ankita@...dia.com>, "open list:VFIO DRIVER" <kvm@...r.kernel.org>,
Dhruv Tripathi <Dhruv.Tripathi@....com>, Honnappa Nagarahalli
<Honnappa.Nagarahalli@....com>, Jeremy Linton <Jeremy.Linton@....com>
Subject: RE: [RFC PATCH] vfio/pci: add PCIe TPH to device feature ioctl
> -----Original Message-----
> From: Alex Williamson <alex.williamson@...hat.com>
> Sent: Tuesday, March 4, 2025 7:24 PM
> To: Wathsala Wathawana Vithanage <wathsala.vithanage@....com>
> Cc: Jason Gunthorpe <jgg@...pe.ca>; linux-kernel@...r.kernel.org; nd
> <nd@....com>; Kevin Tian <kevin.tian@...el.com>; Philipp Stanner
> <pstanner@...hat.com>; Yunxiang Li <Yunxiang.Li@....com>; Dr. David Alan
> Gilbert <linux@...blig.org>; Ankit Agrawal <ankita@...dia.com>; open list:VFIO
> DRIVER <kvm@...r.kernel.org>
> Subject: Re: [RFC PATCH] vfio/pci: add PCIe TPH to device feature ioctl
>
> On Tue, 4 Mar 2025 22:38:16 +0000
> Wathsala Wathawana Vithanage <wathsala.vithanage@....com> wrote:
>
> > > > Linux v6.13 introduced the PCIe TLP Processing Hints (TPH) feature for
> > > > direct cache injection. As described in the relevant patch set [1],
> > > > direct cache injection in supported hardware allows optimal platform
> > > > resource utilization for specific requests on the PCIe bus. This feature
> > > > is currently available only for kernel device drivers. However,
> > > > user space applications, especially those whose performance is sensitive
> > > > to the latency of inbound writes as seen by a CPU core, may benefit from
> > > > using this information (E.g., DPDK cache stashing RFC [2] or an HPC
> > > > application running in a VM).
> > > >
> > > > This patch enables configuring TPH from user space via the
> > > > VFIO_DEVICE_FEATURE ioctl. It provides an interface to user space
> > > > drivers and VMMs to enable/disable the TPH feature on PCIe devices and
> > > > set steering tags in MSI-X or steering-tag table entries using
> > > > VFIO_DEVICE_FEATURE_SET flag or read steering tags from the kernel using
> > > > VFIO_DEVICE_FEATURE_GET to operate in device-specific mode.
> > >
> > > What level of protection do we expect to have here? Is it OK for
> > > userspace to make up any old tag value or is there some security
> > > concern with that?
> > >
> > Shouldn't be allowed from within a container.
> > A hypervisor should have its own STs and map them to platform STs for
> > the cores the VM is pinned to and verify any old ST is not written to the
> > device MSI-X, ST table or device specific locations.
>
> And how exactly are we mediating device specific steering tags when we
> don't know where/how they're written to the device. An API that
> returns a valid ST to userspace doesn't provide any guarantees relative
> to what userspace later writes. MSI-X tables are also writable by
> userspace. I could have missed it, but I also didn't note any pinning
> requirement in this proposal. Thanks,
>
By not enabling TPH in device-specific mode, hypervisors can ensure that
setting an ST in a device-specific location (such as a queue context) has no
effect. VMs should also not be allowed to enable TPH themselves; I believe
this could be enforced by trapping (causing VM exits) on MSI-X/ST table
writes. Having said that, regardless of this proposal or the availability of
kernel TPH support, a VFIO driver could enable TPH and set an arbitrary ST in
the MSI-X/ST table or in a device-specific location on supported platforms.
If the driver doesn't have a list of valid STs, it could enumerate the 8- or
16-bit ST space and measure access latencies to determine which values are
valid.
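
For illustration only, here is roughly the kind of check I have in mind when
the VMM traps a guest MSI-X/ST table write. The struct and helper names below
are made up for this sketch; the list of allowed STs would be whatever the
host provisioned for the pCPUs the VM is pinned to (e.g., fetched through the
proposed VFIO_DEVICE_FEATURE_GET path):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: the VMM keeps the steering tags the host provisioned
 * for the pCPUs this VM is pinned to, and refuses to forward a trapped
 * MSI-X/ST table write unless the guest-supplied tag is one of them.
 */
struct vm_tph_state {
	const uint16_t *allowed_sts;	/* host-provisioned STs for pinned pCPUs */
	size_t nr_allowed_sts;
};

static bool st_is_allowed(const struct vm_tph_state *vm, uint16_t st)
{
	for (size_t i = 0; i < vm->nr_allowed_sts; i++)
		if (vm->allowed_sts[i] == st)
			return true;
	return false;
}

/* Called from the VMM's MSI-X/ST table write trap handler (hypothetical). */
static int mediate_st_write(const struct vm_tph_state *vm, uint16_t guest_st,
			    uint16_t *host_st)
{
	if (!st_is_allowed(vm, guest_st))
		return -1;	/* drop the write or emulate it as a no-op */
	*host_st = guest_st;	/* or translate guest ST -> platform ST here */
	return 0;
}
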
Sorry, I failed to mention pinning earlier. Suppose we don't pin VMs to
CPUs. Say VM_A sets an ST on a NIC so that packet data is injected into the
L2D of CPU_N, to which its vCPU_0 is currently bound. After a while, VM_B
gets scheduled onto CPU_N. Regardless of which process or thread is running,
CPU_N will keep receiving VM_A's NIC data into its L2D. Consequently, the
performance of any VM other than VM_A scheduled on CPU_N would degrade due
to capacity misses and invalidations. This is where the pinning requirement
comes from.
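
For completeness, the pinning itself is just ordinary CPU affinity. A minimal
sketch, assuming the vCPU is a regular pthread in the VMM and cpu_n (a
hypothetical parameter) is the pCPU the steering tag was provisioned for:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/*
 * Minimal sketch: pin the calling vCPU thread to one physical CPU so that a
 * steering tag targeting that CPU's cache keeps pointing at the right place.
 * Error handling and CPU selection are omitted.
 */
static int pin_vcpu_thread(int cpu_n)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu_n, &set);
	return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
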
--wathsala