linux-kernel - Re: [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200720213551.5ba9bc6d@oasis.local.home>
Date:   Mon, 20 Jul 2020 21:35:51 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     "Ahmed S. Darwish" <a.darwish@...utronix.de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        "Sebastian A. Siewior" <bigeasy@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org
Subject: Re: [PATCH v4 01/24] Documentation: locking: Describe seqlock
 design and usage

On Mon, 20 Jul 2020 17:55:07 +0200
"Ahmed S. Darwish" <a.darwish@...utronix.de> wrote:
> +Read path, three categories:
> +
> +1. Normal Sequence readers which never block a writer but they must
> +   retry if a writer is in progress by detecting change in the sequence
> +   number.  Writers do not wait for a sequence reader::
> +
> +	do {
> +		seq = read_seqbegin(&foo_seqlock);
> +
> +		/* ... [[read-side critical section]] ... */
> +
> +	} while (read_seqretry(&foo_seqlock, seq));
> +
> +2. Locking readers which will wait if a writer or another locking reader
> +   is in progress. A locking reader in progress will also block a writer
> +   from entering its critical section. This read lock is
> +   exclusive. Unlike rwlock_t, only one locking reader can acquire it::

Nit, but I would mention that this acts similar to a normal spin_lock, and even disables preeption (in the non-RT case).
> +
> +	read_seqlock_excl(&foo_seqlock);
> +
> +	/* ... [[read-side critical section]] ... */
> +
> +	read_sequnlock_excl(&foo_seqlock);
> +
> +3. Conditional lockless reader (as in 1), or locking reader (as in 2),
> +   according to a passed marker. This is used to avoid lockless readers
> +   starvation (too much retry loops) in case of a sharp spike in write
> +   activity. First, a lockless read is tried (even marker passed). If
> +   that trial fails (odd sequence counter is returned, which is used as
> +   the next iteration marker), the lockless read is transformed to a
> +   full locking read and no retry loop is necessary::
> +
> +	/* marker; even initialization */
> +	int seq = 0;
> +	do {
> +		read_seqbegin_or_lock(&foo_seqlock, &seq);
> +
> +		/* ... [[read-side critical section]] ... */
> +
> +	} while (need_seqretry(&foo_seqlock, seq));
> +	done_seqretry(&foo_seqlock, seq);
> +
> +
> +API documentation
> +=================
> +
> +.. kernel-doc:: include/linux/seqlock.h
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index 8b97204f35a7..299d68f10325 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -1,36 +1,15 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #ifndef __LINUX_SEQLOCK_H
>  #define __LINUX_SEQLOCK_H
> +
>  /*
> - * Reader/writer consistent mechanism without starving writers. This type of
> - * lock for data where the reader wants a consistent set of information
> - * and is willing to retry if the information changes. There are two types
> - * of readers:
> - * 1. Sequence readers which never block a writer but they may have to retry
> - *    if a writer is in progress by detecting change in sequence number.
> - *    Writers do not wait for a sequence reader.
> - * 2. Locking readers which will wait if a writer or another locking reader
> - *    is in progress. A locking reader in progress will also block a writer
> - *    from going forward. Unlike the regular rwlock, the read lock here is
> - *    exclusive so that only one locking reader can get it.
> + * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
> + * lockless readers (read-only retry loops), and no writer starvation.
>   *
> - * This is not as cache friendly as brlock. Also, this may not work well
> - * for data that contains pointers, because any writer could
> - * invalidate a pointer that a reader was following.
> + * See Documentation/locking/seqlock.rst
>   *
> - * Expected non-blocking reader usage:
> - * 	do {
> - *	    seq = read_seqbegin(&foo);
> - * 	...
> - *      } while (read_seqretry(&foo, seq));
> - *
> - *
> - * On non-SMP the spin locks disappear but the writer still needs
> - * to increment the sequence variables because an interrupt routine could
> - * change the state of the data.
> - *
> - * Based on x86_64 vsyscall gettimeofday 
> - * by Keith Owens and Andrea Arcangeli
> + * Copyrights:
> + * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
>   */
>  
>  #include <linux/spinlock.h>
> @@ -41,8 +20,8 @@
>  #include <asm/processor.h>
>  
>  /*
> - * The seqlock interface does not prescribe a precise sequence of read
> - * begin/retry/end. For readers, typically there is a call to
> + * The seqlock seqcount_t interface does not prescribe a precise sequence of
> + * read begin/retry/end. For readers, typically there is a call to
>   * read_seqcount_begin() and read_seqcount_retry(), however, there are more
>   * esoteric cases which do not follow this pattern.
>   *
> @@ -50,16 +29,30 @@
>   * via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
>   * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
>   * atomics; if there is a matching read_seqcount_retry() call, no following
> - * memory operations are considered atomic. Usage of seqlocks via seqlock_t
> - * interface is not affected.
> + * memory operations are considered atomic. Usage of the seqlock_t interface
> + * is not affected.
>   */
>  #define KCSAN_SEQLOCK_REGION_MAX 1000
>  
>  /*
> - * Version using sequence counter only.
> - * This can be used when code has its own mutex protecting the
> - * updating starting before the write_seqcountbeqin() and ending
> - * after the write_seqcount_end().
> + * Sequence counters (seqcount_t)
> + *
> + * This is the raw counting mechanism, without any writer protection.
> + *
> + * Write side critical sections must be serialized and non-preemptible.
> + *
> + * If readers can be invoked from hardirq or softirq contexts,
> + * interrupts or bottom halves must also be respectively disabled before
> + * entering the write section.
> + *
> + * This mechanism can't be used if the protected data contains pointers,
> + * as the writer can invalidate a pointer that a reader is following.
> + *
> + * If it's desired to automatically handle the sequence counter writer
> + * serialization and non-preemptibility requirements, use a sequential
> + * lock (seqlock_t) instead.
> + *
> + * See Documentation/locking/seqlock.rst
>   */
>  typedef struct seqcount {
>  	unsigned sequence;
> @@ -398,10 +391,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
>         smp_wmb();      /* increment "sequence" before following stores */
>  }
>  
> -/*
> - * Sequence counter only version assumes that callers are using their
> - * own mutexing.
> - */
>  static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
>  {
>  	raw_write_seqcount_begin(s);
> @@ -434,15 +423,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
>  	kcsan_nestable_atomic_end();
>  }
>  
> +/*
> + * Sequential locks (seqlock_t)
> + *
> + * Sequence counters with an embedded spinlock for writer serialization
> + * and non-preemptibility.
> + *
> + * For more info, see:
> + *    - Comments on top of seqcount_t
> + *    - Documentation/locking/seqlock.rst
> + */
>  typedef struct {
>  	struct seqcount seqcount;
>  	spinlock_t lock;
>  } seqlock_t;
>  
> -/*
> - * These macros triggered gcc-3.x compile-time problems.  We think these are
> - * OK now.  Be cautious.
> - */
>  #define __SEQLOCK_UNLOCKED(lockname)			\
>  	{						\
>  		.seqcount = SEQCNT_ZERO(lockname),	\