linux-kernel - Re: hostile takeover: Re: [PATCH printk v2 3/8] printk: nbcon: Add acquire/release logic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87o7iqrvvx.fsf@jogness.linutronix.de>
Date:   Tue, 29 Aug 2023 12:28:10 +0206
From:   John Ogness <john.ogness@...utronix.de>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: hostile takeover: Re: [PATCH printk v2 3/8] printk: nbcon: Add
 acquire/release logic

On 2023-08-09, Petr Mladek <pmladek@...e.com> wrote:
>> Add per console acquire/release functionality. The console 'locked'
>> state is a combination of multiple state fields:
>> 
>>   - Hostile takeover
>> 
>>       The new owner takes the console over without 'req_prio'
>>       handshake.
>> 
>>       This is required when friendly handovers are not possible,
>>       i.e. the higher priority context interrupted the owning
>>       context on the same CPU or the owning context is not able
>>       to make progress on a remote CPU.
>
> I always expected that there would be only one hostile takeover.
> It would allow to take the lock a harsh way when the friendly
> way fails.

You are referring to the hostile takeover when unsafe. A hostile
takeover when safe is still considered a hostile takeover. (At least,
until now that is how it was considered. More below...)

>> All other policy decisions have to be made at the call sites:
>> 
>>   - What is marked as an unsafe section.
>>   - Whether to spinwait if there is already an owner.
>>   - Whether to attempt a hostile takeover when safe.
>>   - Whether to attempt a hostile takeover when unsafe.
>
> But there seems to be actually two variants. How they are
> supposed to be used, please?
>
> I would expect that a higher priority context would always
> be able to takeover the lock when it is in a safe context.
>
> By other words, "hostile takeover when safe" would be
> the standard behavior for context with a higher priority.

The difference is that with "hostile takeover when safe" there is a
foreign CPU that still thinks it has the lock, even though it does not.

> By other words, the difference between normal takeover and
> "hostile takeover when safe" is that the 1st one has to
> wait until the current owner calls nbcon_enter_unsafe().
> But the result is the same. The current owner might
> prematurely end after calling nbcon_enter_unsafe().
>
> Maybe, this another relic from the initial more generic approach?

I suppose so. But then why not try the "hostile takeover when safe"
first and only use the handover request as a fallback? Like this:

1. try direct
2. try hostile takeover when safe
3. try handover
4. try hostile takeover when unsafe

Then we should remove the "hostile" label for #2 and just call it:

1. try direct
2. try safe takeover
3. try handover
4. try hostile takeover

(I guess this is how you imagined things should be.)

>> +/**
>> + * struct nbcon_context - Context for console acquire/release
>> + * @console:		The associated console
>> + * @spinwait_max_us:	Limit for spinwait acquire
>> + * @prio:		Priority of the context
>> + * @unsafe:		This context is in an unsafe section
>
> This seems to be an input value for try_acquire(). It is
> controversial.
>
> I guess that it removes the need to call nbcon_enter_unsafe()
> right after try_acquire_console().

Yes. It removes the "safe window" between try_acquire_console() and
nbcon_enter_unsafe() by allowing to acquire and mark unsafe
atomically. For example, this would be useful for the port_lock()
wrapper we discussed. But it is not strictly necessary. I can remove it.

Below I answer your comments anyway.

> Hmm, this semantic is problematic:
>
>   1. The result would be non-paired enter_unsafe()/exit_unsafe()
>      calls.

The paired calls are either:

try_acquire(unsafe=1) -> release()

or

try_acquire(unsafe=0) + enter_unsafe() -> exit_unsafe() + release()

>   2. I would personally expect that this is an output value
>      set by try_acquire() so that the context might know
>      how it got the lock.

The state structure is used to communicate this information.

>   3. Strictly speaking, as an input value, it would mean that
>      try_acquire() is called when the console is in "unsafe" state.
>      But the caller does not know in which state the console
>      is until it acquires the lock.

This is not about the current state. The calling _context_ wants to set
a certain state. The caller wants to acquire the lock and atomically
mark this as the beginning of an unsafe section. But as I wrote, this is
not strictly necessary. Only an efficient convenience.

> For me it is not easy to remember which permutation is used where ;-)
> It would be easier if we remove the "hostile when safe" variant.
> Then 3 variables might be enough. I suggest something like:
>
>   state.unsafe		 Console is busy and takeover is not safe.
>
>   state.unsafe_takeover  A hostile takeover in an unsafe state happened
> 			 in the past. The console can't be safe until
> 			 re-initialized.
>
>   ctxt.allow_unsafe_takeover Allow hostile takeover even if unsafe.
> 			Can be used only with PANIC prio. Might cause
> 			a system freeze when the console is used later.

Since "safe takeovers" should be handled equivalent to "handovers" then
I agree with your suggestion.

John