linux-kernel - Re: [PATCH v16 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <be70e76a-3875-6e1b-a5aa-b89d2368b904@collabora.com>
Date:   Fri, 2 Jun 2023 22:42:45 +0500
From:   Muhammad Usama Anjum <usama.anjum@...labora.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     Muhammad Usama Anjum <usama.anjum@...labora.com>,
        David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michał Mirosław 
        <emmir@...gle.com>, Andrei Vagin <avagin@...il.com>,
        Danylo Mocherniuk <mdanylo@...gle.com>,
        Paul Gofman <pgofman@...eweavers.com>,
        Cyrill Gorcunov <gorcunov@...il.com>,
        Mike Rapoport <rppt@...nel.org>, Nadav Amit <namit@...are.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Shuah Khan <shuah@...nel.org>,
        Christian Brauner <brauner@...nel.org>,
        Yang Shi <shy828301@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Yun Zhou <yun.zhou@...driver.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Alex Sierra <alex.sierra@....com>,
        Matthew Wilcox <willy@...radead.org>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        "Gustavo A . R . Silva" <gustavoars@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-kselftest@...r.kernel.org,
        Greg KH <gregkh@...uxfoundation.org>, kernel@...labora.com
Subject: Re: [PATCH v16 2/5] fs/proc/task_mmu: Implement IOCTL to get and
 optionally clear info about PTEs

Thank you for reviewing and helping out.

On 6/2/23 9:47 PM, Peter Xu wrote:
[...]
>>>> static inline void make_uffd_wp_huge_pte(struct vm_area_struct *vma,
>>>> 					 unsigned long addr, pte_t *ptep,
>>>> 					 pte_t ptent)
>>>> {
>>>> 	if (is_hugetlb_entry_hwpoisoned(ptent) || is_pte_marker(ptent))
>>>> 		return;
>>>>
>>>> 	if (is_hugetlb_entry_migration(ptent))
>>>> 		set_huge_pte_at(vma->vm_mm, addr, ptep,
>>>> 				pte_swp_mkuffd_wp(ptent));
>>>> 	else if (!huge_pte_none(ptent))
>>>> 		ptep_modify_prot_commit(vma, addr, ptep, ptent,
>>>> 					huge_pte_mkuffd_wp(ptent));
>>>> 	else
>>>> 		set_huge_pte_at(vma->vm_mm, addr, ptep,
>>>> 				make_pte_marker(PTE_MARKER_UFFD_WP));
>>>> }
>>>
>>> the is_pte_marker() check can be extended to double check
>>> pte_marker_uffd_wp() bit, but shouldn't matter a lot since besides the
>>> uffd-wp bit currently we only support swapin error which should sigbus when
>>> accessed, so no point in tracking anyway.
>> Yeah, we are good with what we have as even if more bits are supported in
>> pte markers, this function is only reached when UNPOPULATED + ASYNC WP are
>> enabled. So no other bit would be set on the marker.
> 
> I think we don't know?  swapin error bit can be set there afaiu, if someone
> swapoff -a and found a swap device broken for some swapped out ptes.
Oops, so I remove returning on pte marker detection. Instead the last else
is already setting marker wp-ed.

> 
> But again I think that's fine for now.
> 
>>
>>>
>>>>
>>>> As we always set UNPOPULATED, so markers are always set on none ptes
>>>> initially. Is it possible that a none pte becomes present, then swapped and
>>>> finally none again? So I'll do the following addition for make_uffd_wp_pte():
>>>>
>>>> --- a/fs/proc/task_mmu.c
>>>> +++ b/fs/proc/task_mmu.c
>>>> @@ -1800,6 +1800,9 @@ static inline void make_uffd_wp_pte(struct
>>>> vm_area_struct *vma,
>>>>  	} else if (is_swap_pte(ptent)) {
>>>>  		ptent = pte_swp_mkuffd_wp(ptent);
>>>>  		set_pte_at(vma->vm_mm, addr, pte, ptent);
>>>> +	} else {
>>>> +		set_pte_at(vma->vm_mm, addr, pte,
>>>> +			   make_pte_marker(PTE_MARKER_UFFD_WP));
>>>>  	}
>>>>  }
>>>
>>> Makes sense, you can leverage userfaultfd_wp_use_markers() here, and you
>>> should probably keep the protocol (only set the marker when WP_UNPOPULATED
>>> for anon).
>> This function is only reachable when UNPOPULATED + Async WP are set. So we
>> don't need to use userfaultfd_wp_use_markers().
> 
> I don't remember where you explicitly checked that to make sure it'll
> always be the case, but if you do it'll still be nice if you can add a
> comment right above here explaining.
I was only testing for WP and WP Async in pagemap_scan_test_walk() as WP
async can only be enabled if unpopulated is enabled which in turn enables
the markers. I'll add userfaultfd_wp_use_markers() to pagemap_scan_test_walk().

> 
> Or, maybe you can still use userfaultfd_wp_use_markers() here, then you can
> just WARN_ON_ONCE() on it, which looks even better?
> 
> [...]
> 
>>> Hmm, is it a bug for pagemap?  pagemapread.buffer should be linear to the
>>> address range to be scanned to me.  If it skips some unstable pmd without
>>> filling in anything it seems everything later will be shifted with
>>> PMD_SIZE..  I had a feeling that it should set walk->action==ACTION_AGAIN
>>> before return.
>> I don't think this is a bug if this is how it was implemented in the first
>> place. In this task_mmu.c file, we can find several examples of the same
>> pattern that error isn't returned if pmd_trans_unstable() succeeds.
> 
> I don't see why multiple same usages mean it's correct.. maybe they're all
> buggy?
> 
> I can post a patch for this to collect opinions to see if I missed
> something. I'd suggest you figure out what's the right thing to do for the
> new interface and make it right from the start, no matter how it was
> implemented elsewhere.
Alright. At first sight, it seems I should return -EAGAIN here. But there
maybe a case where there are 3 THPs. I've cleared WP over the first THP and
put data in the user buffer. If I return -EAGAIN, the data in the user
buffer would be not used by the user correctly as negative value has been
returned. So the best way here would be to skip this second VMA and don't
abort the operation. Thus we'll not be giving any information about the
second THP as it was in transition.

I'll think about it again before sending next version.

> 
> Thanks,
> 

-- 
BR,
Muhammad Usama Anjum