Message-ID: <20220905203503.tqtr36fsfg4guk4j@moria.home.lan>
Date:   Mon, 5 Sep 2022 16:35:03 -0400
From:   Kent Overstreet <kent.overstreet@...ux.dev>
To:     Suren Baghdasaryan <surenb@...gle.com>
Cc:     Michal Hocko <mhocko@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michel Lespinasse <michel@...pinasse.org>,
        Jerome Glisse <jglisse@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...e.de>,
        Davidlohr Bueso <dave@...olabs.net>,
        Matthew Wilcox <willy@...radead.org>,
        "Liam R. Howlett" <liam.howlett@...cle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Laurent Dufour <ldufour@...ux.ibm.com>,
        Laurent Dufour <laurent.dufour@...ibm.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Song Liu <songliubraving@...com>, Peter Xu <peterx@...hat.com>,
        David Hildenbrand <david@...hat.com>, dhowells@...hat.com,
        Hugh Dickins <hughd@...gle.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Minchan Kim <minchan@...gle.com>,
        kernel-team <kernel-team@...roid.com>,
        linux-mm <linux-mm@...ck.org>,
        linux-arm-kernel@...ts.infradead.org,
        linuxppc-dev@...ts.ozlabs.org, x86@...nel.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal

On Mon, Sep 05, 2022 at 11:32:48AM -0700, Suren Baghdasaryan wrote:
> On Mon, Sep 5, 2022 at 5:32 AM 'Michal Hocko' via kernel-team
> <kernel-team@...roid.com> wrote:
> >
> > Unless I am missing something, this is not based on the Maple tree
> > rewrite, right? Does the change in the data structure make any
> > difference to the approach? I remember discussions at LSFMM where it was
> > pointed out that some issues with the vma tree are considerably
> > simpler to handle with the maple tree.
> 
> Correct, this does not use the Maple tree yet, but once the Maple tree
> transition happens and it supports RCU-safe lookups, my code in
> find_vma_under_rcu() becomes really simple.
> 
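
Nice. Just to make sure I'm picturing it right, I assume the end state looks
something like the sketch below - MA_STATE()/mas_walk() being the existing
maple tree API, mm->mm_mt the field the maple tree conversion adds, and
vma_try_read_lock() just a placeholder for whatever per-VMA primitive your
series ends up providing:

#include <linux/maple_tree.h>
#include <linux/mm_types.h>
#include <linux/rcupdate.h>

/* Placeholder, not a real API: pins the VMA against concurrent updates. */
bool vma_try_read_lock(struct vm_area_struct *vma);

/* Sketch only, not the actual find_vma_under_rcu() from the series. */
static struct vm_area_struct *find_vma_rcu_sketch(struct mm_struct *mm,
						  unsigned long addr)
{
	MA_STATE(mas, &mm->mm_mt, addr, addr);
	struct vm_area_struct *vma;

	rcu_read_lock();
	vma = mas_walk(&mas);		/* lockless lookup of the VMA covering addr */
	if (vma && !vma_try_read_lock(vma))
		vma = NULL;		/* writer owns it: fall back to mmap_lock */
	rcu_read_unlock();

	/* On success the VMA is pinned by its own lock, not by RCU. */
	return vma;
}
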
> >
> > On Thu 01-09-22 10:34:48, Suren Baghdasaryan wrote:
> > [...]
> > > One notable way the implementation deviates from the proposal is the way
> > > VMAs are marked as locked, because during some mm updates multiple
> > > VMAs need to be locked until the end of the update (e.g. vma_merge,
> > > split_vma, etc.).
> >
> > I think it would be really helpful to spell out those issues in a greater
> > detail. Not everybody is aware of those vma related subtleties.
> 
> Ack. I'll expand the description of the cases when multiple VMAs need
> to be locked in the same update. The main difficulties are:
> 1. Multiple VMAs might need to be locked within one
> mmap_write_lock/mmap_write_unlock session (I'll call it an update
> transaction).
> 2. Figuring out when it's safe to unlock a previously locked VMA is
> tricky because the unlocking might happen in different functions and at
> different call levels.
> 
> So, instead of the usual lock/unlock pattern, the proposed solution
> marks a VMA as locked and provides an efficient way to:
> 1. Identify locked VMAs.
> 2. Unlock all locked VMAs in bulk.
> 
> We also postpone unlocking the locked VMAs until the end of the update
> transaction, when we do mmap_write_unlock. Potentially this keeps a
> VMA locked for longer than is absolutely necessary, but it results in a
> big reduction in code complexity.

Correct me if I'm wrong, but it looks like any time multiple VMAs need to be
locked we need mmap_lock anyway, which is what makes your approach so sweet.

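If I'm reading it right, the mark-and-bulk-unlock part boils down to something
like the sketch below - vm_lock_seq/mm_lock_seq and the helper names are just
my guesses at the shape, not necessarily what your patches actually add:

#include <linux/mm_types.h>

/* Sketch: vm_lock_seq and mm_lock_seq are assumed new fields. */
static inline void vma_mark_locked(struct vm_area_struct *vma)
{
	/* Tag the VMA with the current write-side sequence number;
	 * called with mmap_lock held for write. */
	WRITE_ONCE(vma->vm_lock_seq, vma->vm_mm->mm_lock_seq);
}

static inline bool vma_is_locked(struct vm_area_struct *vma)
{
	/* A VMA whose tag matches the mm-wide sequence belongs to an
	 * update transaction that hasn't finished yet. */
	return READ_ONCE(vma->vm_lock_seq) ==
	       READ_ONCE(vma->vm_mm->mm_lock_seq);
}

static inline void vma_unlock_all(struct mm_struct *mm)
{
	/* Bumping the mm-wide sequence "unlocks" every marked VMA in
	 * O(1); called from mmap_write_unlock(). */
	WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
}
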
If, however, we ever want to lock multiple VMAs without taking mmap_lock, then
deadlock avoidance algorithms aren't that bad - there's the ww_mutex approach,
which is simple and works well when there isn't much expected contention (the
advantage of ww_mutex being that it doesn't have to track all held locks). I've
also written full cycle detection; that approach gets you fewer restarts, at
the cost of needing a list of all currently held locks.
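
For reference, the ww_mutex pattern for a two-object case would look roughly
like the sketch below - struct foo and foo_ww_class are made-up names, but the
ww_* calls are the existing API from include/linux/ww_mutex.h:

#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(foo_ww_class);

struct foo {
	struct ww_mutex lock;	/* ww_mutex_init(&foo->lock, &foo_ww_class) */
	/* ... */
};

/* Take both locks in no particular order, backing off on -EDEADLK. */
static int lock_both(struct foo *a, struct foo *b)
{
	struct ww_mutex *contended = NULL;	/* lock held via the slow path */
	struct ww_acquire_ctx ctx;
	int ret;

	ww_acquire_init(&ctx, &foo_ww_class);
retry:
	if (contended != &a->lock) {
		ret = ww_mutex_lock(&a->lock, &ctx);
		if (ret) {			/* -EDEADLK */
			if (contended)		/* still held from a prior pass */
				ww_mutex_unlock(contended);
			contended = &a->lock;
			goto backoff;
		}
	}
	if (contended != &b->lock) {
		ret = ww_mutex_lock(&b->lock, &ctx);
		if (ret) {
			ww_mutex_unlock(&a->lock);
			contended = &b->lock;
			goto backoff;
		}
	}

	ww_acquire_done(&ctx);		/* no more locks in this context */

	/* ... modify both objects under their locks ... */

	ww_mutex_unlock(&a->lock);
	ww_mutex_unlock(&b->lock);
	ww_acquire_fini(&ctx);
	return 0;

backoff:
	/* We lost the wound/wait tie-break: with everything dropped we can
	 * sleep on the contended lock without deadlock risk, then retry
	 * already holding it. */
	ww_mutex_lock_slow(contended, &ctx);
	goto retry;
}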
