linux-kernel - Re: [PATCH] vfio: remove useless judgement

Message-ID: <7217566f-9c40-ae9d-6fd6-2ef93f13f853@oracle.com>
Date:   Tue, 28 Jun 2022 08:48:11 -0400
From:   Steven Sistare <steven.sistare@...cle.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     lizhe.67@...edance.com, cohuck@...hat.com, jgg@...pe.ca,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        lizefan.x@...edance.com
Subject: Re: [PATCH] vfio: remove useless judgement

For cpr, the old qemu directly execs the new qemu, so the task does not change.

To support fork+exec, the ownership test needs to be deleted or modified.

Pinned page accounting is another issue, as the parent counts pins in its
mm->locked_vm.  If the child unmaps, it cannot simply decrement its own
mm->locked_vm counter.  As you and I have discussed, the count is also 
wrong in the direct exec model, because exec clears mm->locked_vm.  I am 
thinking vfio could count pins in struct user locked_vm to handle both 
models.  The user struct and its count would persist across direct exec,
and be shared by parent and child for fork+exec.  However, that does change
the RLIMIT_MEMLOCK value that applications must set, because the limit must
accommodate vfio plus other sub-systems that count in user->locked_vm, which
includes io_uring, skbuff, xdp, and perf.  Plus, the limit must accommodate all
processes of that user, not just a single process.

Folks like fork+exec because it allows recovery if the new qemu process fails to
initialize: once the above issues are fixed, one can fall back to the original process.

- Steve

On 6/27/2022 6:06 PM, Alex Williamson wrote:
> 
> Hey Steve, how did you get around this for cpr or is this a gap?
> Thanks,
> 
> Alex
> 
> On Mon, 27 Jun 2022 11:51:09 +0800
> lizhe.67@...edance.com wrote:
> 
>> From: Li Zhe <lizhe.67@...edance.com>
>>
>> In vfio_dma_do_unmap(), we currently prevent a process from unmapping a
>> vfio dma region if its mm_struct differs from that of vfio_dma->task.
>> In our virtual machine scenario, which uses kvm and qemu, this
>> judgement stops us from live-upgrading qemu: we use fork() && exec()
>> to load the new binary, but the new process cannot perform the
>> VFIO_IOMMU_UNMAP_DMA action during vm exit because of this judgement.
>>
>> This judgement was added in commit 8f0d5bb95f76 ("vfio iommu type1: Add
>> task structure to vfio_dma") for security reasons. But it seems that,
>> because of resource isolation, no task unrelated to the old and new
>> processes can obtain the same vfio_dma struct here. So this patch
>> deletes it.
>>
>> Signed-off-by: Li Zhe <lizhe.67@...edance.com>
>> Reviewed-by: Jason Gunthorpe <jgg@...pe.ca>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 6 ------
>>  1 file changed, 6 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index c13b9290e357..a8ff00dad834 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1377,12 +1377,6 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>  
>>  		if (!iommu->v2 && iova > dma->iova)
>>  			break;
>> -		/*
>> -		 * Task with same address space who mapped this iova range is
>> -		 * allowed to unmap the iova range.
>> -		 */
>> -		if (dma->task->mm != current->mm)
>> -			break;
>>  
>>  		if (invalidate_vaddr) {
>>  			if (dma->vaddr_invalid) {
>