linux-kernel - Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic

Message-ID: <ZBiFX9rHg/Gjj27Y@alley>
Date:   Mon, 20 Mar 2023 17:10:07 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic

On Fri 2023-03-17 16:02:12, John Ogness wrote:
> Hi Petr,
> 
> On oftc#printk you mentioned that I do not need to go into details
> here. But I would like to confirm your understanding and clarify some
> minor details.
> 
> On 2023-03-13, Petr Mladek <pmladek@...e.com> wrote:
> > 2. There are 4 priorities. They describe the type of the context that is
> >    either owning the console or which would like to get the
> >    ownership.
> 
> Yes, however (and I see now the kerneldoc is not very clear about this),
> the priorities are not really about _printing_ on the console, but
> instead about _owning_ the console. This is an important distinction
> because console drivers will also acquire the console for non-printing
> activities (such as setting up their baud rate, etc.).

Makes sense. I had missed this use case of the lock.

> >    These priorities have the following meaning:
> >
> >        + NONE: when the console is idle
> 
> "unowned" is a better term than "idle".

Makes sense. Or maybe "free" or "released".

> >        + NORMAL: the console is owned by the kthread
> 
> NORMAL really means ownership for normal usage (i.e. an owner that is
> not in an emergency or panic situation).
> 
> >        + EMERGENCY: The console is called directly from printk().
> > 	   It is used when printing some emergency messages, like
> > 	   WARN(), watchdog splat.
> 
> This priority of ownership will only be used when printing emergency
> messages. It does not mean that printk() does direct printing. The
> atomic printing occurs as a flush when releasing the ownership. This
> allows the full backtrace to go into the ringbuffer before flushing (as
> we decided at LPC2022).

I see. I have missed this as well.

> >
> > Common rule: The caller never tries to take over the lock
> >     from another owner of the same priority (?)
> 
> Correct. Although I could see there being an argument to let an
> EMERGENCY priority take over another EMERGENCY. For example, an owning
> EMERGENCY CPU could hang and another CPU triggers the NMI stall message
> (also considered emergency messages), in which case it would be helpful
> to take over ownership from the hung CPU in order to finish flushing.

I agree that it would be useful. Another motivation would be to reduce
the risk of stalling the current lock owner. I mean having a variant
of console_trylock_spinning() also for these consoles at the EMERGENCY
priority.
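
To make the takeover rule concrete, here is a minimal userspace sketch.
It assumes the priorities are ordered NONE < NORMAL < EMERGENCY < PANIC
(the fourth level is not named above, so PANIC is an assumption), and
the enum, the function name, and the allow_same_emergency switch are
all hypothetical, not taken from the patch:

```c
#include <stdbool.h>

/* Illustrative ownership priorities; names are not from the patch. */
enum cons_prio_sketch {
	PRIO_NONE = 0,		/* console is unowned */
	PRIO_NORMAL,		/* normal usage, no emergency or panic */
	PRIO_EMERGENCY,		/* emergency messages, e.g. WARN() */
	PRIO_PANIC,		/* assumed highest level */
};

/*
 * Decide whether a requester may take the console from the current owner.
 * The common rule: only a strictly higher priority may take over.
 * allow_same_emergency models the discussed (not implemented) exception
 * that lets one EMERGENCY context take over from another.
 */
static bool may_take_over(enum cons_prio_sketch owner,
			  enum cons_prio_sketch req,
			  bool allow_same_emergency)
{
	if (req > owner)
		return true;
	if (allow_same_emergency &&
	    req == PRIO_EMERGENCY && owner == PRIO_EMERGENCY)
		return true;
	return false;
}
```

With allow_same_emergency set to false this is the common rule from
the summary; setting it to true models the EMERGENCY-over-EMERGENCY
variant discussed above.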


> > Current owner:
> >
> >   + Must always do non-atomic operations in the "unsafe" context.
> 
> Each driver must decide for itself how it defines unsafe. But generally
> speaking it will be a block of code involving modifying multiple
> registers.
> 
> >   + Must check if they still own the lock or if there is a request
> >     to pass the lock before manipulating the console state or reading
> >     the shared buffers.
> 
> ... or continuing to touch its registers.
> 
> >   + Should pass the lock to a context with a higher priority.
> >     It must be done only in a "safe" state. But it might be in
> >     the middle of the record.
> 
> The function to check also handles the handing over. So a console
> driver, when checking, may suddenly see that it is no longer the owner
> and must either carefully back out or re-acquire ownership to finish
> what it started.

Just to be sure: the owner can finish what it started only if
the other owner did not make conflicting changes in the meantime.

For example, it could not finish writing a line because the
other owner could have reused the buffer or already flushed
the line in the meantime.


> (For example, for the 8250, if an owning context
> disabled interrupts and then lost ownership, it _must_ re-acquire the
> console to re-enable the interrupts.)
> 
> > Passing the owner:
> >
> >    + The current owner sets con->atomic_state[CUR] according
> >      to the info in con->atomic_state[REQ] and bails out.
> >
> >    + The waiter notices that it became the owner by finding its
> >      requested state in con->atomic_state[CUR]
> >
> >    + The most tricky situation is when the current owner
> >      is passing the lock and the waiter is giving up
> >      because of the timeout. The current owner could pass
> >      the lock only when the waiter is still watching.
> 
> Yes, yes, and yes. Since the waiter must remove its request from
> con->atomic_state[CUR] before giving up, it guarantees the current owner
> will see that the waiter is gone because any cmpxchg will fail and the
> current owner will need to re-read con->atomic_state[CUR] (in which case
> it sees there is no waiter).
> 
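
The timeout race can be sketched in userspace with C11 atomics. This is
only a model under stated assumptions: a single 32-bit state word with a
hypothetical layout (low byte = owner id, next byte = waiter id, 0 =
none), and made-up helper names. It is not the layout or the code from
the patch:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state layout: bits 0-7 owner id, bits 8-15 waiter id. */
#define OWNER(s)	((s) & 0xffu)
#define WAITER(s)	(((s) >> 8) & 0xffu)
#define MK(o, w)	((uint32_t)(o) | ((uint32_t)(w) << 8))

static _Atomic uint32_t state;

/*
 * Waiter gives up after a timeout. It must remove its request first.
 * Returns true if it turned out to be the owner already (the handover
 * won the race), false if the request was withdrawn.
 */
static bool waiter_give_up(uint32_t me)
{
	uint32_t old = atomic_load(&state);

	for (;;) {
		if (OWNER(old) == me)
			return true;	/* handover already happened */
		if (WAITER(old) != me)
			return false;	/* request is already gone */
		if (atomic_compare_exchange_weak(&state, &old,
						 MK(OWNER(old), 0)))
			return false;
		/* cmpxchg failed: old now holds the current value, retry */
	}
}

/*
 * Owner hands over to the waiter recorded in the state it last read.
 * The cmpxchg fails if the waiter withdrew in the meantime, so the
 * lock is passed only while the waiter is still watching.
 */
static bool owner_hand_over(uint32_t seen)
{
	uint32_t expected = seen;

	if (WAITER(seen) == 0)
		return false;
	return atomic_compare_exchange_strong(&state, &expected,
					      MK(WAITER(seen), 0));
}
```

Whichever side wins the cmpxchg decides the outcome: either the waiter
withdraws and the owner's stale cmpxchg fails, or the owner passes the
lock and the waiter finds itself as the owner when it tries to give up.
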
> > Other:
> >
> >    + Atomic consoles ignore con->seq. Instead they store the lower
> >      32-bit part of the sequence number in the atomic_state variable
> >      at least on 64-bit systems. They use get_next_seq() to guess
> >      the higher 32-bit part of the sequence number.
> 
> Yes, because con->seq is protected by the console_lock, which nbcons do
> not use.

Yup.
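
For reference, a common way to guess the upper half from a stored
32-bit value is sketched below. It assumes a full 64-bit reference
value is available (for example the newest sequence number in the
ringbuffer) and that the real value lies within 2^31 of it.
seq_expand_sketch() is a hypothetical name and this is not necessarily
what get_next_seq() does:

```c
#include <stdint.h>

/*
 * Expand a truncated 32-bit sequence number to 64 bits by picking the
 * 64-bit value closest to ref_seq whose low 32 bits equal low32.
 * Only correct while the true value is within 2^31 of ref_seq.
 */
static uint64_t seq_expand_sketch(uint64_t ref_seq, uint32_t low32)
{
	/* Signed distance from the reference, computed modulo 2^32. */
	int32_t delta = (int32_t)(low32 - (uint32_t)ref_seq);

	return ref_seq + (int64_t)delta;
}
```

The signed distance handles both directions, so a stored value from
just before a 32-bit wrap is still mapped below the reference.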

> > Questions:
> >
> > How exactly do we handle the early boot before kthreads are ready,
> > please? It looks like we just wait for the kthread.
> 
> Every vprintk_emit() will call into cons_atomic_flush(), which will
> atomically flush the consoles if their threads do not exist. Looking at
> the code, I see it deserves a comment about this (inside the
> for_each_console_srcu loop in cons_atomic_flush()).

I see. I had missed this as well. I haven't checked the later
patches in detail yet.

> > Does the above summary describe the behavior, please?
> > Or does the code handle some situation another way?
> 
> Generally speaking, you have a pretty good picture. I think the only
> thing that was missing was the concept that non-printing code (in
> console drivers) will also acquire the console at times.

Thanks a lot for the info.


> >> --- a/kernel/printk/printk_nobkl.c
> >> +++ b/kernel/printk/printk_nobkl.c
> >> +/**
> >> + * cons_check_panic - Check whether a remote CPU is in panic
> >> + *
> >> + * Returns: True if a remote CPU is in panic, false otherwise.
> >> + */
> >> +static inline bool cons_check_panic(void)
> >> +{
> >> +	unsigned int pcpu = atomic_read(&panic_cpu);
> >> +
> >> +	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
> >> +}
> >
> > This does the same as abandon_console_lock_in_panic(). I would
> > give it some more meaningful name and use it everywhere.
> >
> > What about other_cpu_in_panic() or panic_on_other_cpu()?
> 
> I prefer the first because it sounds more like a query than a
> command.

Yup.
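
A minimal userspace model of the renamed helper, just to show the
intended semantics. The _sketch names, the explicit this_cpu parameter
(standing in for smp_processor_id()), and the use of C11 atomics
instead of the kernel's atomic_t are assumptions made for the
illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the kernel's PANIC_CPU_INVALID marker. */
#define PANIC_CPU_INVALID_SKETCH (-1)

/* Stand-in for the kernel's panic_cpu; no CPU is in panic initially. */
static atomic_int panic_cpu_sketch = PANIC_CPU_INVALID_SKETCH;

/*
 * Returns true if some CPU other than this_cpu is in panic. Phrased as
 * a query, so callers read as "if (other_cpu_in_panic()) back off".
 */
static bool other_cpu_in_panic_sketch(int this_cpu)
{
	int pcpu = atomic_load(&panic_cpu_sketch);

	return pcpu != PANIC_CPU_INVALID_SKETCH && pcpu != this_cpu;
}
```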

Best Regards,
Petr