linux-kernel - Re: [RFC PATCH printk v1] printk: ringbuffer: Do not skip non-finalized with prb_next

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8734xnj74k.fsf@jogness.linutronix.de>
Date:   Fri, 03 Nov 2023 14:37:23 +0106
From:   John Ogness <john.ogness@...utronix.de>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org, Mukesh Ojha <quic_mojha@...cinc.com>,
        Chunlei Wang <chunlei.wang@...iatek.com>
Subject: Re: [RFC PATCH printk v1] printk: ringbuffer: Do not skip
 non-finalized with prb_next_seq()

On 2023-11-03, Petr Mladek <pmladek@...e.com> wrote:
>> IMHO we need 2 different functions:
>>
>> 1. A function that reports the last contiguous finalized record for a
>> reader. This is useful for syslog and kmsg_dump to know what is
>> available for them to read. We can use @last_finalized_seq for this,
>> optimizing it correctly this time.
>
> I would use this also for console_unlock() as well, see below.

OK.

>> 2. A function that reports the last reserved sequence number of a
>> writer. This is useful for pr_flush and console_unlock to know when they
>> are finished.
>
> I would personally use the @last_finalized_seq for
> console_unlock() and pr_flush() without a timeout. We could
> always call defer_console_output() when it is lower then
> the last reserved seq.
>
> Well, we actually do not even need to do this because
> the reserved records must be added by some printk().
> And this printk() will either flush the pending messages
> or it will call defer_console_output().

OK.

> The above paragraph describes a scenario which is not obvious.
> We should probably document it somewhere, probably in the description
> of prb_last_finalized_seq() or how it will be called.

OK.

>> This function can begin with @last_finalized_seq, looking
>> for the last finalized record (skipping over any non-finalized).
>
> I though about using desc_ring->head_id or looking for the
> last reserved sequence number.

The problem with @head_id is that the sequence number may not be
assigned yet. Really @last_finalized_seq is the newest sequence number
we have to search from.

>> Generally we have not concerned ourselves with readers. But I agree we
>> should make the optimization coherent with what a reader can actually
>> read. It might save some CPU cycles for polling tasks.
>
> I wanted to agree. But then I found this scenario:
>
> CPU0				CPU1
>
> console_unlock()
>   console_flush_all()
>
> 				printk()
> 				  vprintk_store()
>     return;
> 				    prb_final_commit;
>
> 				  console_trylock();  # failed
>
>   while (prb_read_valid());
>
> Now, the race:
>
>   + console_flush_all() did not flush the message from CPU1 because
>     it was not finalized in time.
>
>   + CPU1 failed to get console_lock() => CPU0 is responsible for
>     flushing
>
>   + prb_read_valid() failed on CPU0 because it did not see
>     the prb_desc finalized (missing barrier).

For semaphores, up() and down_trylock() successfully take and release a
raw spin lock. That provides the necessary barriers so that CPU0 sees
the record that CPU1 finalized.

>> Writing and reading of @last_finalized_seq will provide the necessary
>> boundaries to guarantee this:
>> 
>> ...finalize record...
>> atomic_long_try_cmpxchg_release(&desc_ring->last_finalized_seq, ...);
>> 
>> and
>> 
>> atomic_long_read_acquire(&desc_ring->last_finalized_seq);
>> ...read record...
>
> Yup. something like this.
>
> Well, it is suspicious that there is no _release() counter part.

Take a closer look above. The cmpxchg (on the writer side) does the
release. I have the litmus tests to verify that is correct and
sufficient for what we want: to guarantee that for any read
@last_finalized_seq value, the CPU can also read the associated record.

I am finalizing a new version of the "fix console flushing on panic"
series [0] that will also include the prb_next_seq() fix. If needed, we
can continue this discussion based on the new code.

John

[0] https://lore.kernel.org/lkml/20231013204340.1112036-1-john.ogness@linutronix.de