linux-kernel - Re: [PATCH 15/31] mm/userfaultfd: allow pte_offset_map

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZHDdYYQfqwiLSTzo@x1n>
Date:   Fri, 26 May 2023 12:25:05 -0400
From:   Peter Xu <peterx@...hat.com>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        David Hildenbrand <david@...hat.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Qi Zheng <zhengqi.arch@...edance.com>,
        Yang Shi <shy828301@...il.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
        Alistair Popple <apopple@...dia.com>,
        Ralph Campbell <rcampbell@...dia.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Steven Price <steven.price@....com>,
        SeongJae Park <sj@...nel.org>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Zack Rusin <zackr@...are.com>, Jason Gunthorpe <jgg@...pe.ca>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Minchan Kim <minchan@...nel.org>,
        Christoph Hellwig <hch@...radead.org>,
        Song Liu <song@...nel.org>,
        Thomas Hellstrom <thomas.hellstrom@...ux.intel.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 15/31] mm/userfaultfd: allow pte_offset_map_lock() to fail

On Thu, May 25, 2023 at 03:06:27PM -0700, Hugh Dickins wrote:
> On Wed, 24 May 2023, Peter Xu wrote:
> > On Sun, May 21, 2023 at 10:07:35PM -0700, Hugh Dickins wrote:
> > > mfill_atomic_install_pte() and mfill_atomic_pte_zeropage() treat
> > > failed pte_offset_map_lock() as -EFAULT, with no attempt to retry.
> > 
> > Could you help explain why it should be -EFAULT, not -EAGAIN or -EEXIST?
> 
> Thanks a lot for looking, Peter.
> 
> No good justification for -EFAULT: I just grabbed the closest, fairly
> neutral, error code that I could see already being in use there: but now
> that you mention -EAGAIN, which I can see being used from mfill_atomic(),
> yes, that would be ideal - and consistent with how it's already being used.
> 
> I'll make that change, thanks for suggesting.  (And it had bugged me how
> my fs/userfaultfd.c was electing to retry, but this one electing to fail.)

Thanks.

> 
> > 
> > IIUC right now if pte existed we have -EEXIST returned as part of the
> > userfault ABI, no matter whether it's pte or thp.
> 
> It might or might not correspond to -EEXIST - it might even end up as
> -EFAULT on a retry after -EAGAIN: I see mfill_atomic() contains both
> -EEXIST and -EFAULT cases for pmd_trans_huge().  Actually, I could
> say that the -EFAULT case there corresponds to the -EFAULT in this
> 15/31 patch, but that would be by coincidence not design: I'm happier
> with your -EAGAIN suggestion.

I had a feeling that that 2nd -EFAULT there could crash some userapp
already if it got returned somewhere, because the userapp shouldn't expect
that.  IMHO it should also return -EAGAIN, or even -EEXIST because even if
user retries, we should highly possibly see that thp again, so the -EEXIST
should possibly follow anyway.

Not a big deal here I think - if an userapp can trigger that -EFAULT I'd
say it's also a user bug because it made two decisions already on resolving
page fault for single VA, and it's racy between them..

> 
> > 
> > IMHO it may boil down to my limited knowledge on how pte_offset_map_lock()
> > is used after this part 2 series, and I assume the core changes will be in
> > your 3rd series (besides this one and the arch one).
> > 
> > Please shed some light if there's quick answers (IIUC this is for speeding
> > up collapsing shmem thps, but still no much clue here), or I can also wait
> > for reading the 3rd part if it'll come soon in any form.
> 
> It wouldn't be particularly easy to deduce from the third series of
> patches, rather submerged in implementation details.  Just keep in mind
> that, like in the "old" pmd_trans_unstable() cases, there may be instants
> at which, when trying to get the lock on a page table, that page table
> might already have gone, or been replaced by something else e.g. a THP,
> and a retry necessary at the outer level (if it's important to persist).

I'm actually still curious how the 3rd series will look like; would love to
read it when it comes.

Thanks,

-- 
Peter Xu