linux-kernel - Re: [PATCH printk v2 06/26] printk: nbcon: Ensure ownership release on failed emit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZdTCZqhZww8_WgSU@alley>
Date: Tue, 20 Feb 2024 16:16:54 +0100
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v2 06/26] printk: nbcon: Ensure ownership release
 on failed emit

On Sun 2024-02-18 20:03:06, John Ogness wrote:
> Until now it was assumed that ownership has been lost when the
> write_atomic() callback fails. And nbcon_emit_next_record()
> directly returned false. However, if nbcon_emit_next_record()
> returns false, the context must no longer have ownership.
> 
> The semantics for the callbacks could be specified such that
> if they return false, they must have released ownership. But
> in practice those semantics seem odd since the ownership was
> acquired outside of the callback.
> 
> Ensure ownership has been released before reporting failure by
> explicitly attempting a release. If the current context is not
> the owner, the release has no effect.

Hmm, the new semantic is not ideal either. And I think that it is
even worse. The function still releases the owership even though
it has been acquired by the caller. In addition, it causes
a double unlock in a valid case. I know that the 2nd
nbcon_context_release() is a NOP but...

I would personally solve this by adding a comment into the code
and moving the check, see below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -891,17 +891,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  	nbcon_state_read(con, &cur);
>  	wctxt->unsafe_takeover = cur.unsafe_takeover;
>  
> -	if (con->write_atomic) {
> +	if (con->write_atomic)
>  		done = con->write_atomic(con, wctxt);
> -	} else {

	This code path does not create a bad semantic. The semantic is
	as it is because the context might lose the ownership in "any"
	nested function.

	Well, it might deserve a comment, something like:

		/*
		 * nbcon_emit_next_record() should never be called for legacy
		 * consoles. Handle it as if write_atomic() have lost
		 * the ownership and try to continue.
		 */
> -		nbcon_context_release(ctxt);
> -		WARN_ON_ONCE(1);
> -		done = false;
> -	}
>  
> -	/* If not done, the emit was aborted. */
> -	if (!done)
> +	if (!done) {
> +		/*
> +		 * The emit was aborted, probably due to a loss of ownership.
> +		 * Ensure ownership was lost or released before reporting the
> +		 * loss.
> +		 */

Is there a valid reason when con->write_atomic() would return false
and still own the context?

If not, then this would hide bugs and cause double unlock in
the valid case.

> +		nbcon_context_release(ctxt);
>  		return false;

Even better solution might be to do the check at the beginning of
the function. It might look like:

	  if (WARN_ON_ONCE(!con->write_atomic)) {
		/*
		 * This function should never be called for legacy consoles.
		 * Handle it as if write_atomic() have lost the ownership
		 * and try to continue.
		 */
		nbcon_context_release(ctxt);
		return false;
	}


Best Regards,
Petr