linux-kernel - Re: [PATCH next v2 2/2] printk: fix cpu lock ordering

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YL9osWvgvdCo4JAK@alley>
Date:   Tue, 8 Jun 2021 14:55:13 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH next v2 2/2] printk: fix cpu lock ordering

On Mon 2021-06-07 22:02:32, John Ogness wrote:
> The cpu lock implementation uses a full memory barrier to take
> the lock, but no memory barriers when releasing the lock. This
> means that changes performed by a lock owner may not be seen by
> the next lock owner. This may have been "good enough" for use
> by dump_stack() as a serialization mechanism, but it is not
> enough to provide proper protection for a critical section.
> 
> Correct this problem by using acquire/release memory barriers
> for lock/unlock, respectively.
> 
> Note that it is not necessary for a cpu lock to disable
> interrupts. However, in upcoming work this cpu lock will be used
> for emergency tasks (for example, atomic consoles during kernel
> crashes) and any interruptions should be avoided if possible.
> 
> Signed-off-by: John Ogness <john.ogness@...utronix.de>

The change makes perfect sense and the code looks correct.
But I am not sure about the description of the memory barriers.

> ---
>  kernel/printk/printk.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index f94babb38493..8c870581cfb4 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3560,10 +3560,29 @@ void printk_cpu_lock_irqsave(bool *lock_flag, unsigned long *irq_flags)
>  
>  	cpu = smp_processor_id();
>  
> -	old = atomic_cmpxchg(&printk_cpulock_owner, -1, cpu);
> +	/*
> +	 * Guarantee loads and stores from the previous lock owner are
> +	 * visible to this CPU once it is the lock owner. This pairs
> +	 * with cpu_unlock:B.

These things are not easy to describe. It took me quite some time to
understand the above description. And think that it does not say
the full storry.

IMHO, the lock should work the way that:

   + The new owner see all writes done or seen by the previous owner(s).
   + The previous owner(s) never see writes done by the new owner.

It is actually a full barrier. I see this in include/linux/atomic.h

#define __atomic_acquire_fence		smp_mb__after_atomic
#define __atomic_release_fence		smp_mb__before_atomic


Well, I am not sure if my description is correct.
Documentation/memory-barriers.txt describes the acquire operation from
another angle:

 (5) ACQUIRE operations.

     This acts as a one-way permeable barrier.  It guarantees that all memory
     operations after the ACQUIRE operation will appear to happen after the
     ACQUIRE operation with respect to the other components of the system.
     ACQUIRE operations include LOCK operations and both smp_load_acquire()
     and smp_cond_load_acquire() operations.

     Memory operations that occur before an ACQUIRE operation may appear to
     happen after it completes.

     An ACQUIRE operation should almost always be paired with a RELEASE
     operation.

> +	 *
> +	 * Memory barrier involvement:
> +	 *
> +	 * If cpu_lock:A reads from cpu_unlock:B, then cpu_lock:B
> +	 * reads from cpu_unlock:A.

IMHO, this seems to describe even smaller part of the picture.
It is just about reads.

*
> +	 * Relies on:
> +	 *
> +	 * RELEASE from cpu_unlock:A to cpu_unlock:B
> +	 *    matching
> +	 * ACQUIRE from cpu_lock:A to cpu_lock:B
> +	 */
> +	old = atomic_cmpxchg_acquire(&printk_cpulock_owner,
> +				     -1, cpu); /* LMM(cpu_lock:A) */
>  	if (old == -1) {
>  		/* This CPU is now the owner. */
>  
> +		/* This CPU begins loading/storing data: LMM(cpu_lock:B) */
> +
>  		*lock_flag = true;
>  
>  	} else if (old == cpu) {
> @@ -3600,7 +3619,14 @@ EXPORT_SYMBOL(printk_cpu_lock_irqsave);
>  void printk_cpu_unlock_irqrestore(bool lock_flag, unsigned long irq_flags)
>  {
>  	if (lock_flag) {
> -		atomic_set(&printk_cpulock_owner, -1);
> +		/* This CPU is finished loading/storing data: LMM(cpu_unlock:A) */
> +
> +		/*
> +		 * Guarantee loads and stores from this CPU when it was the
> +		 * lock owner are visible to the next lock owner. This pairs
> +		 * with cpu_lock:A.

Yup, this seems to better match the formulation in
Documentation/memory-barriers.txt:

 (6) RELEASE operations.

     This also acts as a one-way permeable barrier.  It guarantees that all
     memory operations before the RELEASE operation will appear to happen
     before the RELEASE operation with respect to the other components of the
     system. RELEASE operations include UNLOCK operations and
     smp_store_release() operations.

     Memory operations that occur after a RELEASE operation may appear to
     happen before it completes.


Except that the acquire description does not mention "this CPU".

> +		 */
> +		atomic_set_release(&printk_cpulock_owner, -1); /* LMM(cpu_unlock:B) */
>  
>  		local_irq_restore(irq_flags);
>  	}

Honestly, I am not sure if we could describe the barriers correctly
and effectively at the same time.

IMHO, the following sentence, in the commit message, says everything:

   Correct this problem by using acquire/release memory barriers
   for lock/unlock, respectively.


In the code, I would use something similar:

	/* Try to take the lock with the appropriate memory barriers. */
	old = atomic_cmpxchg_acquire(&printk_cpulock_owner, -1, cpu);

and

		/* Release the lock with the appropriate memory barriers. */
		atomic_set_release(&printk_cpulock_owner, -1);


After all, the _acquire() and _release() variants were introduced
to implement locking operations correctly and effectively.
AFAIK, there were long discussions about the implementation and
documentation. IMHO, we do not need to duplicate it.

Best Regards,
Petr