linux-kernel - Re: [PATCH v3 1/3] s390/pci: Fix s390_mmio_read/write syscall page fault handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <89c74380-6a60-4091-ba57-93c75d9a37d7@redhat.com>
Date: Tue, 11 Jun 2024 17:10:40 +0200
From: David Hildenbrand <david@...hat.com>
To: Niklas Schnelle <schnelle@...ux.ibm.com>,
 Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
 Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
 Alexander Gordeev <agordeev@...ux.ibm.com>,
 Christian Borntraeger <borntraeger@...ux.ibm.com>,
 Sven Schnelle <svens@...ux.ibm.com>,
 Alex Williamson <alex.williamson@...hat.com>,
 Gerd Bayer <gbayer@...ux.ibm.com>, Matthew Rosato <mjrosato@...ux.ibm.com>,
 Jason Gunthorpe <jgg@...pe.ca>, Suren Baghdasaryan <surenb@...gle.com>
Cc: linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
 kvm@...r.kernel.org
Subject: Re: [PATCH v3 1/3] s390/pci: Fix s390_mmio_read/write syscall page
 fault handling

>>
>> which checks mmap_assert_write_locked().
>>
>> Setting VMA flags would be racy with the mmap lock in read mode.
>>
>>
>> remap_pfn_range() documents: "this is only safe if the mm semaphore is
>> held when called." which doesn't spell out if it needs to be held in
>> write mode (which I think it does) :)
> 
> Logically this makes sense to me. At the same time it looks like
> fixup_user_fault() expects the caller to only hold mmap_read_lock() as
> I do here. In there it even retakes mmap_read_lock(). But then wouldn't
> any fault handling by its nature need to hold the write lock?

Well, if you're calling remap_pfn_range() right now the expectation is 
that we hold it in write mode. :)

Staring at some random users, they all call it from mmap(), where you 
hold the mmap lock in write mode.

I wonder why we are not seeing that splat with vfio all of the time?

That mmap lock check was added "recently". In 1c71222e5f23 we started 
using vm_flags_set(). That (including the mmap_assert_write_locked()) 
check was added via bc292ab00f6c almost 1.5 years ago.

Maybe vfio is a bit special and was never really run with lockdep?

> 
>>
>>
>> My best guess is: if you are using remap_pfn_range() from a fault
>> handler (not during mmap time) you are doing something wrong, that's why
>> you get that report.
> 
> @Alex: I guess so far the vfio_pci_mmap_fault() handler is only ever
> triggered by "normal"/"actual" page faults where this isn't a problem?
> Or could it be a problem there too?
> 

I think we should see it there as well, unless I am missing something.

>>
>> vmf_insert_pfn() and friends might be better alternatives, that make
>> sure that the VMA already received the proper VMA flags at mmap time.
>>

There would be ways of silencing that check: for example, making sure at 
mmap time that these flags are already set, and skipping modifications 
if the flags are already set.

But, we'll run into more similar checks in x86 VM_PAT code, where we 
would do vm_flags_set(vma, VM_PAT) from track_pfn_remap. Some of that 
code really doesn't want to be called concurrently (e.g., "vma->vm_pgoff 
= pfn;").

I thought that we silenced some of these warnings in the past using 
__vm_flags_mod(). But it sounds hacky.

CCing Sureen.

-- 
Cheers,

David / dhildenb