lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wiU8ktvak2hCj2TWJ6wMSwVsUSvi5Bjf4i1JGvpGmyUZw@mail.gmail.com>
Date:   Thu, 10 Dec 2020 09:23:53 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:     Will Deacon <will@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Jan Kara <jack@...e.cz>, Minchan Kim <minchan@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vinayak Menon <vinmenon@...eaurora.org>,
        Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting

On Thu, Dec 10, 2020 at 7:08 AM Kirill A. Shutemov
<kirill.shutemov@...ux.intel.com> wrote:
>
> See lightly tested patch below. Is it something you had in mind?

This is closer, in that at least it removes the ostensibly blocking
allocation (that can't happen) from the prefault path.

But the main issue remains:

> > At that point, I think the current very special and odd
> > do_fault_around() pre-allocation could be made into just a _regular_
> > "allocate the pmd if it doesn't exist". And then the pte locking could
> > be moved into filemap_map_pages(), and suddenly the semantics and
> > rules around all that would be a whole lot more obvious.
>
> No. It would stop faultaround code from mapping huge pages. We had to
> defer pte page table mapping until we know we don't have huge pages in
> page cache.

Can we please move that part to the callers too - possibly with a
separate helper function?

Because the real issue remains: as long the map_set_pte() function
takes the pte lock, the caller cannot rely on it.

And the filemap_map_pages() code really would like to rely on it.
Because if the lock is taken there *above* the loop - or even in the
loop iteration at the top, the code can now do things that rely on "I
know I hold the page table lock".

In particular, we can get rid of that very very expensive page locking.

Which is the reason I know about the horrid current issue with
"pre-allocate in one place, lock in another, and know we are atomic in
a third place" issue. Because I had to walk down these paths and
realize that "this loop is run under the page table lock, EXCEPT for
the first iteration, where it's taken by the first time we do that
non-allocating alloc_set_pte()".

See?

             Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ