lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <aadc7b03-9121-6800-341b-6f2849088a09@linux.ibm.com>
Date:   Wed, 24 Apr 2019 20:01:20 +0200
From:   Laurent Dufour <ldufour@...ux.ibm.com>
To:     Michel Lespinasse <walken@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Andi Kleen <ak@...ux.intel.com>, dave@...olabs.net,
        Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>,
        aneesh.kumar@...ux.ibm.com,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        mpe@...erman.id.au, Paul Mackerras <paulus@...ba.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Will Deacon <will.deacon@....com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        sergey.senozhatsky.work@...il.com,
        Andrea Arcangeli <aarcange@...hat.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        kemi.wang@...el.com, Daniel Jordan <daniel.m.jordan@...cle.com>,
        David Rientjes <rientjes@...gle.com>,
        Jerome Glisse <jglisse@...hat.com>,
        Ganesh Mahendran <opensource.ganesh@...il.com>,
        Minchan Kim <minchan@...nel.org>,
        Punit Agrawal <punitagrawal@...il.com>,
        vinayak menon <vinayakm.list@...il.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        zhong jiang <zhongjiang@...wei.com>,
        Haiyan Song <haiyanx.song@...el.com>,
        Balbir Singh <bsingharora@...il.com>, sj38.park@...il.com,
        Mike Rapoport <rppt@...ux.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>, haren@...ux.vnet.ibm.com,
        Nick Piggin <npiggin@...il.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        linuxppc-dev@...ts.ozlabs.org, x86@...nel.org
Subject: Re: [PATCH v12 00/31] Speculative page faults

Le 22/04/2019 à 23:29, Michel Lespinasse a écrit :
> Hi Laurent,
> 
> Thanks a lot for copying me on this patchset. It took me a few days to
> go through it - I had not been following the previous iterations of
> this series so I had to catch up. I will be sending comments for
> individual commits, but before tat I would like to discuss the series
> as a whole.

Hi Michel,

Thanks for reviewing this series.

> I think these changes are a big step in the right direction. My main
> reservation about them is that they are additive - adding some complexity
> for speculative page faults - and I wonder if it'd be possible, over the
> long term, to replace the existing complexity we have in mmap_sem retry
> mechanisms instead of adding to it. This is not something that should
> block your progress, but I think it would be good, as we introduce spf,
> to evaluate whether we could eventually get all the way to removing the
> mmap_sem retry mechanism, or if we will actually have to keep both.

Until we get rid of the mmap_sem which seems to be a very long story, I 
can't see how we could get rid of the retry mechanism.

> The proposed spf mechanism only handles anon vmas. Is there a
> fundamental reason why it couldn't handle mapped files too ?
> My understanding is that the mechanism of verifying the vma after
> taking back the ptl at the end of the fault would work there too ?
> The file has to stay referenced during the fault, but holding the vma's
> refcount could be made to cover that ? the vm_file refcount would have
> to be released in __free_vma() instead of remove_vma; I'm not quite sure
> if that has more implications than I realize ?

The only concern is the flow of operation  done in the vm_ops->fault() 
processing. Most of the file system relie on the generic filemap_fault() 
which should be safe to use. But we need a clever way to identify fault 
processing which are compatible with the SPF handler. This could be done 
using a tag/flag in the vm_ops structure or in the vma's flags.

This would be the next step.


> The proposed spf mechanism only works at the pte level after the page
> tables have already been created. The non-spf page fault path takes the
> mm->page_table_lock to protect against concurrent page table allocation
> by multiple page faults; I think unmapping/freeing page tables could
> be done under mm->page_table_lock too so that spf could implement
> allocating new page tables by verifying the vma after taking the
> mm->page_table_lock ?

I've to admit that I didn't dig further here.
Do you have a patch? ;)

> 
> The proposed spf mechanism depends on ARCH_HAS_PTE_SPECIAL.
> I am not sure what is the issue there - is this due to the vma->vm_start
> and vma->vm_pgoff reads in *__vm_normal_page() ?

Yes that's the reason, no way to guarantee the value of these fields in 
the SPF path.

> 
> My last potential concern is about performance. The numbers you have
> look great, but I worry about potential regressions in PF performance
> for threaded processes that don't currently encounter contention
> (i.e. there may be just one thread actually doing all the work while
> the others are blocked). I think one good proxy for measuring that
> would be to measure a single threaded workload - kernbench would be
> fine - without the special-case optimization in patch 22 where
> handle_speculative_fault() immediately aborts in the single-threaded case.

I'll have to give it a try.

> Reviewed-by: Michel Lespinasse <walken@...gle.com>
> This is for the series as a whole; I expect to do another review pass on
> individual commits in the series when we have agreement on the toplevel
> stuff (I noticed a few things like out-of-date commit messages but that's
> really minor stuff).

Thanks a lot for reviewing this long series.

> 
> I want to add a note about mmap_sem. In the past there has been
> discussions about replacing it with an interval lock, but these never
> went anywhere because, mostly, of the fact that such mechanisms were
> too expensive to use in the page fault path. I think adding the spf
> mechanism would invite us to revisit this issue - interval locks may
> be a great way to avoid blocking between unrelated mmap_sem writers
> (for example, do not delay stack creation for new threads while a
> large mmap or munmap may be going on), and probably also to handle
> mmap_sem readers that can't easily use the spf mechanism (for example,
> gup callers which make use of the returned vmas). But again that is a
> separate topic to explore which doesn't have to get resolved before
> spf goes in.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ