linux-kernel - Re: [PATCH printk v3 11/14] printk: ringbuffer: Consider committed as finalized in panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZbvcJwsIEsii3oi2@alley>
Date: Thu, 1 Feb 2024 19:00:07 +0100
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v3 11/14] printk: ringbuffer: Consider committed
 as finalized in panic

On Thu 2023-12-14 22:47:58, John Ogness wrote:
> A descriptor in the committed state means the record does not yet
> exist for the reader. However, for the panic CPU, committed
> records should be handled as finalized records since they contain
> message data in a consistent state and may contain additional
> hints as to the cause of the panic.
> 
> Add an exception for records in the commit state to not be
> considered non-existing when reading from the panic CPU.

IMHO, it is important to describe effects of this change in more
details. And I think that it actually does not work as expected,
see below.

> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -1875,16 +1877,25 @@ static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring,
>  
>  	/*
>  	 * An unexpected @id (desc_miss) or @seq mismatch means the record
> -	 * does not exist. A descriptor in the reserved or committed state
> -	 * means the record does not yet exist for the reader.
> +	 * does not exist. A descriptor in the reserved state means the
> +	 * record does not yet exist for the reader.
>  	 */
>  	if (d_state == desc_miss ||
>  	    d_state == desc_reserved ||
> -	    d_state == desc_committed ||
>  	    s != seq) {
>  		return -EINVAL;
>  	}
>  
> +	/*
> +	 * A descriptor in the committed state means the record does not yet
> +	 * exist for the reader. However, for the panic CPU, committed
> +	 * records are also handled as finalized records since they contain
> +	 * message data in a consistent state and may contain additional
> +	 * hints as to the cause of the panic.
> +	 */
> +	if (d_state == desc_committed && !this_cpu_in_panic())
> +		return -EINVAL;

If I get it correctly, this causes that panic CPU would see a
non-finalized continuous line as finalized. And it would flush
the existing piece to consoles.

The problem is that pr_cont() would append the message into
the same record. But the consoles would already wait
for the next record. They would miss the appended pieces.

It might be fixed by finalizing the record before we read
the non-finalized piece. It is doable but it would add
another lock-less scenario. I am not sure if it would work with
the existing barriers or how complicated it would be to
prove the correctness.

Now, in practice, it would allow to flush pieces of continuous lines
printed on panic CPU immediately. It would not affect messages printed
by other CPUs because of a mix of reasons:

   1. The current code tries hard to move the console owner to panic
      CPU. It allows the panic CPU to flush the messages a safe way
      even when other CPUs are stopped or somehow locked.

      It means that the consoles are ideally flushed when the panic CPU
      adds a message.

   2. Only the last record might be in a committed state. Any older
      record is automatically finalized when it reaches the committed
      state.

   3. The previous patch causes that messages from non-panic CPUs
      are skipped when they are not committed or finalized, see
      https://lore.kernel.org/all/20231214214201.499426-11-john.ogness@linutronix.de

Now, this patch would really help only when the panic CPU dies in
the middle of a continuous message or when the final message does
not have a newline character.

Honestly, I think that it is not worth the effort. It would add another
complexity to the memory barriers. The real effect is not easy
to understand. And the benefit is minimal from my POV.

Alternative solutions:

1. Flush the non-finalized continuous line only before reaching
   the infinite loop in panic(). printk() gets disabled
   at this point anyway.

   It would solve only this one scenario, though.

2. Always finalize messages in vprintk_store() when in_panic().

   It would have the desired effect in all panic() situations.
   And it would not add another complexity to the lock-less
   ringbuffer code.

Best Regards,
Petr