linux-kernel - Re: [PATCH RFC 2/2] printk: Use console_flush_one

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALqELGz4ug+YyBvdmPp5aHR3x4qUEp4u4cCpWLL5143VCrf3-w@mail.gmail.com>
Date: Fri, 19 Sep 2025 14:38:14 +0100
From: Andrew Murray <amurray@...goodpenguin.co.uk>
To: Petr Mladek <pmladek@...e.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, John Ogness <john.ogness@...utronix.de>, 
	Sergey Senozhatsky <senozhatsky@...omium.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy
 printer kthread

On Thu, 18 Sept 2025 at 17:27, Petr Mladek <pmladek@...e.com> wrote:
>
> On Mon 2025-09-15 13:43:06, Andrew Murray wrote:
> > The legacy printer kthread uses console_lock and
> > __console_flush_and_unlock to flush records to the console. This
> > approach results in the console_lock being held for the entire
> > duration of a flush. This can result in large waiting times for
> > those waiting for console_lock especially where there is a large
> > volume of records or where the console is slow (e.g. serial). This
> > contention is observed during boot, as the call to filp_open in
> > console_on_rootfs will delay progression to userspace until any
> > in-flight flush is completed.
>
> It would be great to add here the boot logs from the cover letter
> so that the real life numbers are stored in the git log.

No problem, I'll add them to the next revision.


>
> > Let's instead use __console_flush_unlocked which releases and
> > reacquires console_lock between records.
>
> It seems that the patch does the right thing.
> I would just suggest some refactoring, see below.
>
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3301,6 +3301,46 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
> >       return any_usable;
> >  }
> >
> > +/*
> > + * Print out all remaining records to all consoles.
> > + *
> > + * @next_seq is set to the sequence number after the last available record.
> > + * The value is valid only when this function returns true. It means that all
> > + * usable consoles are completely flushed.
> > + *
> > + * @handover will be set to true if a printk waiter has taken over the
> > + * console_lock, in which case the caller is no longer holding the
> > + * console_lock. Otherwise it is set to false.
> > + *
> > + * Returns true when there was at least one usable console and all messages
> > + * were flushed to all usable consoles. A returned false informs the caller
> > + * that everything was not flushed (either there were no usable consoles or
> > + * another context has taken over printing or it is a panic situation and this
> > + * is not the panic CPU). Regardless the reason, the caller should assume it
> > + * is not useful to immediately try again.
> > + */
> > +static bool console_flush_all_unlocked(u64 *next_seq, bool *handover)
> > +{
> > +     bool any_usable;
> > +     bool any_progress;
> > +
> > +     *next_seq = 0;
> > +     *handover = false;
> > +
> > +     do {
> > +             console_lock();
> > +             any_progress = console_flush_one_record(true, next_seq, handover, &any_usable);
> > +
> > +             if (*handover)
> > +                     return false;
> > +
> > +             __console_unlock();
> > +
> > +     } while (any_progress);
> > +
> > +     return any_usable;
> > +}
>
> This is yet another console_flush() API with a complicated semantic.
> (return value, parameters). It is not bad. But it would be nice
> if we did not need it ;-)

Yes, my approach/rationale was to replace the chain of calls from the
__console_flush_and_unlock call in legacy_kthread_func to functions
that don't require console_lock. Thus __console_flush_and_unlock
became __console_flush_unlocked, and console_flush_all became
console_flush_all_unlocked. I hoped I could then change existing users
of the locked versions over to the unlocked versions and perhaps
eliminate the locked versions. However, given that hasn't worked out,
open-coding as you suggest seems like a better approach.


>
> >  static void __console_flush_and_unlock(void)
> >  {
> >       bool do_cond_resched;
> > @@ -3346,6 +3386,17 @@ static void __console_flush_and_unlock(void)
> >       } while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
> >  }
> >
> > +static void __console_flush_unlocked(void)
> > +{
> > +     bool handover;
> > +     bool flushed;
> > +     u64 next_seq;
> > +
> > +     do {
> > +             flushed = console_flush_all_unlocked(&next_seq, &handover);
> > +     } while (flushed && !handover && prb_read_valid(prb, next_seq, NULL));
> > +}
>
> The semantic of this function is not much clear from the name.
> IMHO, it would be a puzzle for people who would try to understand
> the code in the future.
>
> > +
> >  /**
> >   * console_unlock - unblock the legacy console subsystem from printing
> >   *
> > @@ -3676,8 +3727,7 @@ static int legacy_kthread_func(void *unused)
> >               if (kthread_should_stop())
> >                       break;
> >
> > -             console_lock();
> > -             __console_flush_and_unlock();
> > +             __console_flush_unlocked();
>
> IMHO, it would be pretty hard to understand what it does by trying to
> understand all the layers of the code.
>
> It might be better to open code it. And I would take inspiration
> in the nbcon_kthread_func().

Yes, I agree.


>
> I played with the code and came up with:
>
> static int legacy_kthread_func(void *unused)
> {
>         bool any_progress;
>
> wait_for_event:
>         wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
>
>         do {
>                 bool any_usable;
>                 bool handover;
>                 u64 next_seq;
>
>                 if (kthread_should_stop())
>                         return 0;

This changes the behaviour from the existing legacy_kthread_func. Thus
allowing the thread to exit mid way through printing remaining
records, whereas previously the whole set of unprinted records would
first be printed. But that's probably a good thing.


>
>                 console_lock();
>                 any_progress = console_flush_one_record(true, &next_seq, &handover, &any_usable);
>                 if (!handover)
>                         __console_unlock();
>
>                 /*
>                  * There is no need to check whether there is any usable
>                  * console in this context or whether there are pending
>                  * messages. It is checked by legacy_kthread_should_wakeup()
>                  * anyway.

Ah, I hadn't realised this, thus the prb_read_valid call in my loop
isn't required. Thanks.


> And console_lock() will never succeed again if
>                  * there was panic in progress.
>                  */
>         } while (any_progress);
>
>         goto wait_for_event;
> }
>
> What do you think, please?
>
> Best Regards,
> Petr

I'll respin this series.

Thanks,

Andrew Murray