linux-kernel - Re: [PATCH printk v7 30/35] printk: Add helper for flush type logic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87zfpozal4.fsf@jogness.linutronix.de>
Date: Wed, 07 Aug 2024 16:17:51 +0206
From: John Ogness <john.ogness@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>, Steven Rostedt
 <rostedt@...dmis.org>, Thomas Gleixner <tglx@...utronix.de>,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v7 30/35] printk: Add helper for flush type logic

On 2024-08-07, Petr Mladek <pmladek@...e.com> wrote:
> I would suggest to change the semantic and set the _preferred_
> flush method instead of an _available_ one.

I will need to evaluate this closely. I worry that the caller needs to
understands how the helper function is choosing the preference. For
example, at the end you make a suggestion that is broken with this
suggested change.

>> +		if (ft.nbcon_atomic) {
>> +			stop_seq = prb_next_reserve_seq(prb);
>> +			goto again;
>> +		}
>
> BTW: I wonder how this code would look like after adding the printk
>      threads. We should do "goto again" only when ft.nbcon_atomic
>      is the preferred (only available) flush type for nbcon consoles.

	if (ft.nbcon_offload) {
		...
	} else if (ft.nbcon_atomic) {
		...
	}

>      IMHO, it is another reason to change the semantic.

The caller does not need to rely on the helper "choosing" the right
one. I understand your point that: It is easier for the caller when we
can blindly rely on the helper to choose for us. But I worry that if we
ever adjust the helper, we might break various call sites that blindly
rely on the helper making a certain choice. If the helper's job is only
to say what is possible, then I would worry less for the future when we
may need to adjust the helper.

>> +	printk_get_console_flush_type(&ft);
>
> It is a nice trick to call printk_get_console_flush_type() this early.
> I allows to hack the result when processing the hacky LOGLEVEL_SCHED ;-)
>
>> +
>>  	/* If called from the scheduler, we can not call up(). */
>>  	if (level == LOGLEVEL_SCHED) {
>>  		level = LOGLEVEL_DEFAULT;
>>  		do_trylock_unlock = false;
>> -		defer_legacy = true;
>> +	} else {
>> +		do_trylock_unlock = ft.legacy_direct;
>>  	}
>
> We could hack the @ft structure directly here:
>
> 	if (level == LOGLEVEL_SCHED) {
> 		level = LOGLEVEL_DEFAULT;
> 		ft.legacy_offload |= ft.legacy_direct;
> 		ft.legacy_direct = false;
> 	}

The hack seems a bit complicated to me. Especially when the helper is
choosing preferred methods. I will think about it.

>> +	if (!cpuhp_tasks_frozen) {
>> +		printk_get_console_flush_type(&ft);
>> +		if (ft.legacy_direct) {
>> +			if (console_trylock())
>> +				console_unlock();
>
> Why do we actually call only the legacy loop here?
> IMHO, we should also do
>
> 	if (ft.nbcon_atomic)
>  		nbcon_atomic_flush_pending();

Atomic consoles do not care if a CPU was online or not. I can add this,
but I expect there is nothing for the atomic console to flush. And when
threading is added, we would need the extra code to avoid atomic
flushing:

	if (!ft.nbcon_offload && ft.nbcon_atomic)                
		nbcon_atomic_flush_pending();

>> @@ -3327,7 +3316,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
>>  	if (mode == CONSOLE_REPLAY_ALL)
>>  		__console_rewind_all();
>>  
>> -	if (!have_boot_console)
>> +	printk_get_console_flush_type(&ft);
>> +	if (ft.nbcon_atomic)
>>  		nbcon_atomic_flush_pending();
>
> I would use "ft.legacy_direct" also below for the decision about
> the legacy loop:
>
> -	if (legacy_allow_panic_sync)
> +	if (ft.legacy_direct)
> 		console_flush_all(false, &next_seq, &handover);

No, because it would mean the console is not flushed if the CPU is in
the deferred state. That is why I added an extra comment in the helper
saying that console_flush_on_panic() will _always_ flush directly.

I thought about adding that extra logic into the helper, but it really
isn't possible. @legacy_allow_panic_sync does not matter if there are no
nbcon consoles. So somehow the helper would need to know that CPU is in
the deferred state, but now it is allowed to do direct printing.

So it seemed more straight forward to have console_flush_on_panic() not
care about what is allowed (for legacy). It is going to flush directly
no matter what.

I will reconsider your suggestions about the helper and also compare the
end result at the call sites (also with threading changes applied) to
see what looks simpler to maintain.

John