Message-ID: <20250918225739.GS1326709@ziepe.ca>
Date: Thu, 18 Sep 2025 19:57:39 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Alex Mastro <amastro@...com>
Cc: Alex Williamson <alex.williamson@...hat.com>,
	Kevin Tian <kevin.tian@...el.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>, David Reiss <dreiss@...a.com>,
	Joerg Roedel <joro@...tes.org>, Keith Busch <kbusch@...nel.org>,
	Leon Romanovsky <leon@...nel.org>, Li Zhe <lizhe.67@...edance.com>,
	Mahmoud Adam <mngyadam@...zon.de>,
	Philipp Stanner <pstanner@...hat.com>,
	Robin Murphy <robin.murphy@....com>,
	Vivek Kasireddy <vivek.kasireddy@...el.com>,
	Will Deacon <will@...nel.org>, Yunxiang Li <Yunxiang.Li@....com>,
	linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
	kvm@...r.kernel.org
Subject: Re: [TECH TOPIC] vfio, iommufd: Enabling user space drivers to vend
 more granular access to client processes

On Thu, Sep 18, 2025 at 02:44:07PM -0700, Alex Mastro wrote:

> We anticipate a growing need to provide more granular access to device resources
> beyond what the kernel currently affords to user space drivers similar to our
> model.

I'm having a somewhat hard time wrapping my head around the security
model that says you trust your related processes not to use DMA in a
way that is hostile to their peers, but you don't trust them not to
issue hostile ioctls.

> To achieve (a), the USD sends the VFIO device fd to the client over Unix domain
> sockets using SCM_RIGHTS, along with descriptions of which device regions are
> for what. While this allows the client to mmap BARs into its address space,
> it comes at the cost of exposing more access to device BAR regions than is
> necessary or appropriate. 

IIRC VFIO should allow partial BAR mappings, so the client process can
robustly have a subset mapped if you trust it to perform the unix
SCM_RIGHTS/mapping ioctl/close() sequence.
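
A minimal sketch of what that client-side sequence could look like, in
C. It assumes the USD sends, next to the fd, the region offset and size
it got from VFIO_DEVICE_GET_REGION_INFO plus the window the client is
meant to use. The struct bar_window layout and both function names are
hypothetical and not part of any existing API; only mmap(), close() and
sysconf() are real calls here.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description sent by the USD together with the device fd. */
struct bar_window {
    uint64_t region_offset; /* vfio_region_info.offset for the BAR */
    uint64_t region_size;   /* vfio_region_info.size for the BAR */
    uint64_t win_start;     /* start of the client's window, relative to the BAR */
    uint64_t win_len;       /* length of the client's window in bytes */
};

/*
 * Compute the file offset to pass to mmap() for the window.
 * Returns 0 on success, -1 if the window is empty, not page aligned,
 * or not fully contained in the region.
 */
int bar_window_file_offset(const struct bar_window *w, uint64_t page_size,
                           uint64_t *file_off)
{
    if (page_size == 0 || w->win_len == 0)
        return -1;
    if (w->win_start % page_size || w->win_len % page_size)
        return -1;
    if (w->win_start > w->region_size ||
        w->win_len > w->region_size - w->win_start)
        return -1;
    *file_off = w->region_offset + w->win_start;
    return 0;
}

/*
 * Map only the window, then close the device fd so this process can no
 * longer issue ioctls or create further mappings through it.
 * Returns MAP_FAILED if the window is invalid or mmap() fails.
 */
void *map_window_and_close(int device_fd, const struct bar_window *w)
{
    uint64_t off;
    void *p = MAP_FAILED;

    if (bar_window_file_offset(w, (uint64_t)sysconf(_SC_PAGESIZE), &off) == 0)
        p = mmap(NULL, (size_t)w->win_len, PROT_READ | PROT_WRITE,
                 MAP_SHARED, device_fd, (off_t)off);
    close(device_fd);
    return p;
}
```

Nothing in this sketch stops a hostile client from skipping the close(),
which is why it only helps if the client is trusted to follow the
sequence.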

> Instead of vending the VFIO device fd to the client process, the USD could bind
> the necessary BAR regions to a dma-buf fd and share that with the client. If
> VFIO supported dma_buf_ops.mmap, the client could mmap those into its address
> space.

I wouldn't object to this, I think it is not too complicated at all.

And the idea to add some 'use writecombining' to the create dmabuf ioctl is
certainly a novel and simple way to solve that problem.

> We are interested in the following incremental capabilities:
> - We want the USD to be able to create and vend fds which provide restricted
>   mapping access to the device's IOAS to the client, while preserving
>   the ability of the USD to revoke device access to client memory via
>   VFIO_IOMMU_UNMAP_DMA (or IOMMUFD_CMD_IOAS_UNMAP for IOMMUFD). Alternatively,
>   to forcefully invalidate the entire restricted IOMMU fd, including mappings.

I've had similar requests for fwctl.

What I've been thinking is if the vending process could "dup" the FD
and permanently attach a BPF program to the new FD that sits right
after ioctl. The BPF program would inspect each ioctl when it is
issued and enforce whatever policy the vending process wants.

Sort of like seccomp.

iommufd and fwctl have a similar ioctl design, so I would have no
issue with something that could be easily reused for both.
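
To make the idea concrete, here is the kind of decision such a filter
might make, written as plain C rather than BPF. No such attach point
exists today, so everything below is hypothetical: struct ioctl_ctx,
struct ioctl_policy, policy_allows() and the CMD_* values are
placeholders, not real uapi. The example policy lets the client map and
unmap only inside one IOVA window and rejects every other ioctl.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder command numbers; NOT the real iommufd or fwctl values. */
enum { CMD_MAP = 1, CMD_UNMAP = 2, CMD_ALLOC = 3 };

/* Hypothetical view of one ioctl as the filter would see it. */
struct ioctl_ctx {
    unsigned int cmd;
    uint64_t iova;   /* start of the range for map/unmap, otherwise unused */
    uint64_t length; /* length of the range for map/unmap, otherwise unused */
};

/* Hypothetical policy chosen by the vending process for one dup'd fd. */
struct ioctl_policy {
    uint64_t iova_base; /* start of the window the client may touch */
    uint64_t iova_len;  /* size of that window in bytes */
};

/* Return true if the ioctl may proceed, false if it must be rejected. */
bool policy_allows(const struct ioctl_policy *p, const struct ioctl_ctx *c)
{
    uint64_t rel;

    switch (c->cmd) {
    case CMD_MAP:
    case CMD_UNMAP:
        if (c->length == 0 || c->iova < p->iova_base)
            return false;
        rel = c->iova - p->iova_base;
        if (rel > p->iova_len)
            return false;
        /* Written this way so the range check cannot overflow. */
        return c->length <= p->iova_len - rel;
    default:
        /* Deny by default, as a seccomp allowlist would. */
        return false;
    }
}
```

The point of pushing this into a program supplied by the vending
process is that the window, the allowed commands and any future rules
live in that program instead of in kernel policy knobs.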

What would give me a lot of pause is your proposal where we effectively
have the kernel enforce some arbitrary policy, and I know from
experience there will be endless asks for more and more policy
options.

> - It would be nice if mappings created with the restricted IOMMU fd were
>   automatically freed when the underlying kernel object was freed (if the client
>   process were to exit ungracefully without explicitly performing unmap cleanup
>   after itself).

Maybe the BPF could trigger an eventfd or something when the FD closes?
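
On the vending side that could look roughly like the sketch below. The
eventfd(2), poll(2) and read(2) calls are real; the part where the
kernel signals the eventfd when the restricted FD is closed is the
hypothetical bit and does not exist today. wait_for_client_close() is a
made-up helper name.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the (hypothetical) close notification on efd.
 * Returns 1 if the eventfd was signalled, 0 on timeout, -1 on error.
 * On 1 the USD would go on to unmap whatever the client left behind.
 */
int wait_for_client_close(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t count;
    int ret;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret <= 0)
        return ret; /* 0: timed out, -1: poll() failed */

    /* Reading resets the counter so the next wait blocks again. */
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return 1;
}
```

The USD would create the eventfd with eventfd(0, EFD_CLOEXEC) before
vending the restricted FD and add it to whatever event loop it already
runs.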

> Some of those things sound very similar to the direction of vIOMMU, but it is
> difficult to tell if that could meet our needs exactly. The kinds of features
> I think we want should be achievable purely in software without any dedicated
> hardware support.

I don't think viommu is really related to this, viommu is more about
multiple physical devices.

Jason
