linux-kernel - Re: [PATCH v1 01/11] seqlock: provide lockdep-free raw_seqcount

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87h7b6c44o.ffs@tglx>
Date:   Fri, 17 Dec 2021 22:28:55 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        linux-mm@...ck.org, linux-kselftest@...r.kernel.org,
        linux-doc@...r.kernel.org, David Hildenbrand <david@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH v1 01/11] seqlock: provide lockdep-free raw_seqcount_t
 variant

On Fri, Dec 17 2021 at 12:30, David Hildenbrand wrote:
> Sometimes it is required to have a seqcount implementation that uses
> a structure with a fixed and minimal size -- just a bare unsigned int --
> independent of the kernel configuration. This is especially valuable, when
> the raw_ variants of the seqlock function will be used and the additional
> lockdep part of the seqcount_t structure remains essentially unused.
>
> Let's provide a lockdep-free raw_seqcount_t variant that can be used via
> the raw functions to have a basic seqlock.
>
> The target use case is embedding a raw_seqcount_t in the "struct page",
> where we really want a minimal size and cannot tolerate a sudden grow of
> the seqcount_t structure resulting in a significant "struct page"
> increase or even a layout change.

Cannot tolerate? Could you please provide a reason and not just a
statement?

> Provide raw_read_seqcount_retry(), to make it easy to match to
> raw_read_seqcount_begin() in the code.
>
> Let's add a short documentation as well.
>
> Note: There might be other possible users for raw_seqcount_t where the
>       lockdep part might be completely unused and just wastes memory --
>       essentially any users that only use the raw_ function variants.

Even when the reader side uses raw_seqcount_begin/retry() the writer
side still can use the non-raw variant which validates that the
associated lock is held on write.

Aside of that your proposed extra raw sequence count needs extra care
vs. PREEMPT_RT and this want's to be very clearly documented. Why?

The lock association has two purposes:

    1) Lockdep coverage which unearthed bugs already

    2) PREEMPT_RT livelock prevention

       Assume the following:

       spin_lock(wrlock);
       write_seqcount_begin(seq);

       ---> preemption by a high priority reader

       seqcount_begin(seq); <-- live lock

       The RT substitution does:

       seqcount_begin(seq)
           cnt = READ_ONCE(seq->sequence);

           if (cnt & 1) {
           	lock(s->lock);
                unlock(s->lock);
           }

       which prevents the deadlock because it makes the reader block on
       the associated lock, which allows the preempted writer to make
       progress.

       This applies to raw_seqcount_begin() as well.

I have no objections against the construct itself, but this has to be
properly documented vs. the restriction this imposes.

As you can see above the writer side therefore has to ensure that it
cannot preempted on PREEMPT_RT, which limits the possibilities what you
can do inside a preemption (or interrupt) disabled section on RT enabled
kernels. See Documentation/locking/locktypes.rst for further information.

Thanks,

        tglx