[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <644daada-1d77-68bb-1682-fb67f0b427f0@redhat.com>
Date: Fri, 17 Dec 2021 23:02:52 +0100
From: David Hildenbrand <david@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
John Hubbard <jhubbard@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Yang Shi <shy828301@...il.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
Roman Gushchin <guro@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Peter Xu <peterx@...hat.com>,
Donald Dutile <ddutile@...hat.com>,
Christoph Hellwig <hch@....de>,
Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
linux-mm@...ck.org, linux-kselftest@...r.kernel.org,
linux-doc@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH v1 01/11] seqlock: provide lockdep-free raw_seqcount_t
variant
On 17.12.21 22:28, Thomas Gleixner wrote:
> On Fri, Dec 17 2021 at 12:30, David Hildenbrand wrote:
>> Sometimes it is required to have a seqcount implementation that uses
>> a structure with a fixed and minimal size -- just a bare unsigned int --
>> independent of the kernel configuration. This is especially valuable, when
>> the raw_ variants of the seqlock function will be used and the additional
>> lockdep part of the seqcount_t structure remains essentially unused.
>>
>> Let's provide a lockdep-free raw_seqcount_t variant that can be used via
>> the raw functions to have a basic seqlock.
>>
>> The target use case is embedding a raw_seqcount_t in the "struct page",
>> where we really want a minimal size and cannot tolerate a sudden grow of
>> the seqcount_t structure resulting in a significant "struct page"
>> increase or even a layout change.
>
Hi Thomas,
thanks for your feedback!
> Cannot tolerate? Could you please provide a reason and not just a
> statement?
Absolutely.
"struct page" is supposed to have a minimal size with a fixed layout.
Embedding something inside such a structure can change the fixed layout
in a way that it can just completely breaks any assumptions on location
of values.
Therefore, embedding a complex structure in it is usually avoided -- and
if we have to (spin_lock), we work around sudden size increases.
There are ways around it: allocate the lock and only store the pointer
in the struct page. But that most certainly adds complexity, which is
why I want to avoid it for now.
I'll extend that answer and add it to the patch description.
>
>> Provide raw_read_seqcount_retry(), to make it easy to match to
>> raw_read_seqcount_begin() in the code.
>>
>> Let's add a short documentation as well.
>>
>> Note: There might be other possible users for raw_seqcount_t where the
>> lockdep part might be completely unused and just wastes memory --
>> essentially any users that only use the raw_ function variants.
>
> Even when the reader side uses raw_seqcount_begin/retry() the writer
> side still can use the non-raw variant which validates that the
> associated lock is held on write.
Yes, that's my understanding as well.
>
> Aside of that your proposed extra raw sequence count needs extra care
> vs. PREEMPT_RT and this want's to be very clearly documented. Why?
>
> The lock association has two purposes:
>
> 1) Lockdep coverage which unearthed bugs already
Yes, that's a real shame to lose.
>
> 2) PREEMPT_RT livelock prevention
>
> Assume the following:
>
> spin_lock(wrlock);
> write_seqcount_begin(seq);
>
> ---> preemption by a high priority reader
>
> seqcount_begin(seq); <-- live lock
>
> The RT substitution does:
>
> seqcount_begin(seq)
> cnt = READ_ONCE(seq->sequence);
>
> if (cnt & 1) {
> lock(s->lock);
> unlock(s->lock);
> }
>
> which prevents the deadlock because it makes the reader block on
> the associated lock, which allows the preempted writer to make
> progress.
>
> This applies to raw_seqcount_begin() as well.
>
> I have no objections against the construct itself, but this has to be
> properly documented vs. the restriction this imposes.
Absolutely, any input is highly appreciated.
But to mention it again: whatever you can do with raw_seqcount_t, you
can do with seqcount_t, and there are already users relying completely
on the raw_ function variants (see my other reply).
So the documentation should most probably be extended to cover the raw_
functions and seqcount_t in general.
>
> As you can see above the writer side therefore has to ensure that it
> cannot preempted on PREEMPT_RT, which limits the possibilities what you
> can do inside a preemption (or interrupt) disabled section on RT enabled
> kernels. See Documentation/locking/locktypes.rst for further information.
It's going to be used for THP, which are currently incompatible with
PREEMPT_RT (disabled in the Kconfig). But preemption is also disabled
because we're using bit_spin_lock(), which does a bit_spin_lock().
Certainly worth documenting!
Thanks for your input!
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists