[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <875yn2h5ku.fsf@jogness.linutronix.de>
Date: Thu, 21 Apr 2022 16:36:25 +0206
From: John Ogness <john.ogness@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH printk v3 14/15] printk: extend console_lock for proper
kthread support
On 2022-04-21, Petr Mladek <pmladek@...e.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2603,9 +2666,10 @@ static int console_cpu_notify(unsigned int cpu)
>> /* If trylock fails, someone else is doing the printing */
>> if (console_trylock())
>> console_unlock();
>> -
>> - /* Wake kthread printers. Some may have become usable. */
>> - wake_up_klogd();
>> + else {
>> + /* Some kthread printers may have become usable. */
>> + wake_up_klogd();
>
> Do you have any particular scenario in mind, please?
> Could CPU hotplug put any printk kthread into a sleep?
I do not have a particular scenario. My reasoning was that a CPU coming
online would affect the conditions of __console_is_usable() for consoles
without CON_ANYTIME. Of course, it would mean that previously a kthread
went to sleep because it was trying to print from a CPU that was
offline. I am doubtful that such a scenario is possible. But you did
uncover some bizarre code paths where task migration could fail during
CPU offlining.
Anyway, you suggested to keep the CON_ANYTIME checks for kthreads in
there. So it seems correct to wake threads anytime the
printer_should_wake() conditions change.
>> @@ -2625,11 +2689,33 @@ void console_lock(void)
>> down_console_sem();
>> if (console_suspended)
>> return;
>> + console_kthreads_block();
>> console_locked = 1;
>> console_may_schedule = 1;
>> }
>> EXPORT_SYMBOL(console_lock);
>>
>> +/*
>> + * Lock the console_lock, but rather than blocking all the kthread printers,
>> + * lock a specified kthread printer and hold the lock. This is useful if
>> + * console flags for a particular console need to be updated.
>> + */
>> +void console_lock_single_hold(struct console *con)
>> +{
>> + might_sleep();
>> + down_console_sem();
>> + mutex_lock(&con->lock);
>> + console_locked = 1;
>> + console_may_schedule = 1;
>
> This looks wrong. It is a global flag that could be modified
> only when all consoles are blocked.
You are correct. is_console_locked() needs to return false in this
scenario. I will leave out the @console_lock setting and insert a
comment to clarify why.
> This API blocks only the single console. The other consoles are still
> allowed to print actively.
That is the point. VT does not care about the other printers. VT is
using @console_locked to protect itself against itself.
> Another problem will appear with the 15th patch. It will remove
> console_locked variable and is_console_locked() will not longer
> be aware that this console is locked. We will not know that
> it might cause deadlock in the VT code.
>From the perspective of VT code the console is _not_ locked. So
is_console_locked() should return false. is_console_locked() is to make
sure that the _VT code_ has called console_lock()/console_trylock(). So
the 15th patch is still correct.
>> @@ -2728,17 +2834,18 @@ static void __console_unlock(void)
>> *
>> * @handover will be set to true if a printk waiter has taken over the
>> * console_lock, in which case the caller is no longer holding the
>> - * console_lock. Otherwise it is set to false.
>> + * console_lock. Otherwise it is set to false. A NULL pointer may be provided
>> + * to disable allowing the console_lock to be taken over by a printk waiter.
>> *
>> * Returns false if the given console has no next record to print, otherwise
>> * true.
>> *
>> - * Requires the console_lock.
>> + * Requires the console_lock if @handover is non-NULL.
>
> * Requires con->lock otherwise.
Right. I will update the comments.
>> */
>> -static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
>> - char *dropped_text, bool *handover)
>> +static bool __console_emit_next_record(struct console *con, char *text, char *ext_text,
>> + char *dropped_text, bool *handover)
>> {
>> - static int panic_console_dropped;
>> + static atomic_t panic_console_dropped = ATOMIC_INIT(0);
>> struct printk_info info;
>> struct printk_record r;
>> unsigned long flags;
>> @@ -3261,6 +3401,8 @@ void register_console(struct console *newcon)
>>
>> newcon->dropped = 0;
>> newcon->thread = NULL;
>> + newcon->flags |= CON_THD_BLOCKED;
>
> Just to show the complexity added by console_lock_single_hold():
>
> It took me some time to realize that it is correct. The flag
> is needed because the console will be added under console_lock().
> The flag would not be needed when it was added under
> console_lock_single_hold().
?? But it is not added under
console_lock_single_hold(). console_lock_single_hold() is not a
replacement for console_lock(). Their purpose is very
different. console_lock_single_hold() is an internal function to provide
synchronization for @flags and @thread updates of a single console.
Maybe we are getting caught in my "bad naming" trap again. :-/
I do not have any ideas for a function that:
- locks @console_sem to prevent console registration/deregistration
- locks con->lock to provide synchronized @flags and/or @thread updates
>> + mutex_init(&newcon->lock);
>>
>> if (newcon->flags & CON_PRINTBUFFER) {
>> /* Get a consistent copy of @syslog_seq. */
>> @@ -3314,7 +3456,7 @@ int unregister_console(struct console *console)
>> return 0;
>>
>> res = -ENODEV;
>> - console_lock();
>> + console_lock_single_hold(console);
>> if (console_drivers == console) {
>> console_drivers=console->next;
>
> Another example of the complexity:
>
> I though that this was not safe. console_drivers is a global list
> and console_lock_single_hold() is supposed to block only a single
> console. But it is actually safe because console_lock_single_hold()
> holds console_sem.
Yes. It is safe.
> Another question is why console_lock_single_hold() is enough
> here and why console_lock() is used in register_console(). I think
> that console_lock_single_hold() will be enough even in
> register_console().
?? And which console would you want to lock? @newcon? It is not
registered yet.
If you want to minimize register_console() locking, it is enough just to
down @console_sem.
> All this is far from obvious. It shows how the API is confusing and
> tricky. And it is another motivation to remove
> console_lock_single_hold().
We need a method to provide @flags synchronization between the kthreads
and console_stop(). Keep in mind that console_lock() does *not* hold the
mutexes. So a completed console_lock() call does *not* mean that the
kthreads are sleeping. They could still lock their own mutex and keep
going. It is not until the kthreads see that CON_THD_BLOCKED is set that
they realize they are not supposed to be running and go to sleep. But
console_stop() could be performing an update to @flags while that
kthread is checking it. It is a data race in code that should be
synchronized.
I spent some time trying to find a good solution for this. Here are the
ideas that I came up with:
1. Use READ_ONCE(short)/WRITE_ONCE(short) because probably that is
enough to guarantee atomic writes/reads on all platforms.
2. Make @flags atomic_t. This guarentees consistence but would require
changing how all consoles initialize that field.
3. Create a separate @enabled boolean field in struct console so that
data races do not matter. This would also change how all consoles
initialize their struct.
4. Provide a new function that uses the mutex to synchronize, since the
kthread is already using the mutex.
I ended up choosing #4 because it had the added benefit of allowing
console_start(), console_stop(), console_unregister() to avoid affecting
the other kthreads.
>> res = 0;
>> @@ -3676,14 +3835,14 @@ static int printk_kthread_func(void *data)
>> kfree(ext_text);
>> kfree(text);
>>
>> - console_lock();
>> + mutex_lock(&con->lock);
>
> This is serialized against unregister_console() but not with
> register_console() because they use different locking scheme.
?? In register_console() the thread has not been created yet. There is
nothing to synchronize against.
> Resume:
>
> I would prefer to get rid of console_lock_single_hold() and
> console_unlock_single_release() API.
>
> It was definitely an interesting experiment. I agree that it would
> be nice to do not block the other kthreads when it is not really
> needed. But from my POV, it adds more harm than good at the moment.
So we go with option #1 to solve(?) the @flags synchronization issue? Or
is there another option I missed?
> It is possible that we will want to do such optimizations
> in the future. But it must be easier to understand what exactly
> is serialized which way. At least it should be more documented.
> Also the same API would need to be used on the related code
> paths.
AFAICT it is used in all places that it is appropriate.
John
Powered by blists - more mailing lists