linux-kernel - Re: [PATCH] vfio: fix VFIO_IOMMU_UNMAP_DMA when end of range would overflow u64

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <68e18f2c-79ad-45ec-99b9-99ff68ba5438@oracle.com>
Date: Mon, 6 Oct 2025 21:23:56 -0400
From: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>
To: Alex Mastro <amastro@...com>, Jason Gunthorpe <jgg@...pe.ca>
Cc: Alex Williamson <alex.williamson@...hat.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] vfio: fix VFIO_IOMMU_UNMAP_DMA when end of range would
 overflow u64

Hi Alex, Jason,

On 10/6/25 8:39 PM, Alex Mastro wrote:
> On Mon, Oct 06, 2025 at 07:50:39PM -0300, Jason Gunthorpe wrote:
>> Could we block right at the ioctl inputs that end at ULONG_MAX? Maybe
>> that is a good enough fix?
> 
> That would be a simpler, and perhaps less risky fix than surgically fixing every
> affected place.
> 

If going this way, we'd also have to deny the MAP requests. Right now we 
have this problem because you can map a range ending at the top of the 
address space, but not unmap it. The map operation doesn't overflow 
because vfio_dma_do_map() calls vfio_link_dma() with dma->size == 0, and 
it grows the size later in vfio_pin_map_dma()...

https://elixir.bootlin.com/linux/v6.17.1/source/drivers/vfio/vfio_iommu_type1.c#L1676-L1677

> If we were to do that, I think VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE should
> report coherent limits -- i.e. it should "lie" and say the last page (or
> whatever) of u64 addressable space is inaccessible so that the UAPI is coherent
> with itself. It should be legal to map/unmap iova ranges up to the limits it
> claims to support.
> 
Agree.

> I have doubts that anyone actually relies on MAP_DMA-ing such
> end-of-u64-mappings in practice, so perhaps it's OK?
> 

The AMD IOMMU supports a 64-bit IOVA, so when using the AMD vIOMMU with 
DMA remapping enabled + VF passthrough + a Linux guest with 
iommu.forcedac=1 we hit this issue since the driver (mlx5) starts 
requesting mappings for IOVAs right at the top of the address space.

I mentioned this issue on the cover letter for:
https://lore.kernel.org/qemu-devel/20250919213515.917111-1-alejandro.j.jimenez@oracle.com/

and linked to my candidate fix:

https://github.com/aljimenezb/linux/commit/014be8cafe7464d278729583a2dd5d94514e2e2a

The reason why I hadn't send it to the list yet is because I noticed the 
additional locations Jason mentioned earlier in the thread (e.g. 
vfio_find_dma(), vfio_iommu_replay()) and wanted to put together a 
reproducer that also triggered those paths.

I mentioned in the notes for the patch above why I chose a slightly more 
complex method than the '- 1' approach, since there is a chance that 
iova+size could also go beyond the end of the address space and actually 
wrap around.

My goal was not to trust the inputs at all if possible. We could also 
use check_add_overflow() if we want to add explicit error reporting in 
vfio_iommu_type1 when an overflow is detected. i.e. catch bad callers 
that don't sanitize their inputs as opposed to letting them handle the 
error on the return path...

Thank you,
Alejandro