Message-ID: <CAHk-=wgv=fz=c34MJOUbdSOVb6pGXkEXx9OnTz7weuYYBhd5pQ@mail.gmail.com>
Date:   Sat, 9 Jan 2021 11:46:46 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Jason Gunthorpe <jgg@...pe.ca>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Yu Zhao <yuzhao@...gle.com>, Andy Lutomirski <luto@...nel.org>,
        Peter Xu <peterx@...hat.com>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Minchan Kim <minchan@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Hugh Dickins <hughd@...gle.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
        Kees Cook <keescook@...omium.org>,
        John Hubbard <jhubbard@...dia.com>,
        Leon Romanovsky <leonro@...dia.com>, Jan Kara <jack@...e.cz>,
        Kirill Tkhai <ktkhai@...tuozzo.com>
Subject: Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

On Sat, Jan 9, 2021 at 11:33 AM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Thu, Jan 07, 2021 at 01:05:19PM -0800, Linus Torvalds wrote:
> > Side note, and not really related to UFFD, but the mmap_sem in
> > general: I was at one point actually hoping that we could make the
> > mmap_sem a spinlock, or at least make the rule be that we never do any
> > IO under it. At which point a write lock hopefully really shouldn't be
> > such a huge deal.
>
> There's a (small) group of us working towards that.  It has some
> prerequisites, but where we're hoping to go currently:
>
>  - Replace the vma rbtree with a b-tree protected with a spinlock
>  - Page faults walk the b-tree under RCU, like peterz/laurent's SPF patchset
>  - If we need to do I/O, take a refcount on the VMA
>
> After that, we can gradually move things out from mmap_sem protection
> to just the vma tree spinlock, or whatever makes sense for them.  In a
> very real way the mmap_sem is the MM layer's BKL.

Well, we could do the "no IO" part first, and keep the semaphore part.

Some people actually prefer a semaphore to a spinlock, because it
doesn't end up causing preemption issues.

As long as you don't do IO (or memory allocations) under a semaphore
(ok, in this case it's a rwsem, same difference), it might even be
preferable to keep it as a semaphore rather than as a spinlock.

So it doesn't necessarily have to go all the way - we _could_ just try
something like "when taking the mmap_sem, set a thread flag" and then
have a "warn if doing allocations or IO under that flag".

And since this is about performance, not some hard requirement, it
might not even matter if we catch all cases.  If we fix it so that any
regular load on most normal filesystems never sees the warning, we'd
already be golden.

Of course, I think we've had issues with rw_sems for _other_ reasons.
Waiman actually removed the reader optimistic spinning because it
caused bad interactions with mixed reader-writer loads.

So rwsemaphores may end up not working as well as spinlocks if the
common situation is "just wait a bit, you'll get it".

                   Linus
