linux-kernel - Re: port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZEK2mu9ikDhDeVvu@alley>
Date:   Fri, 21 Apr 2023 18:15:22 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl:
 Introduce printer threads

On Thu 2023-04-20 15:33:10, Petr Mladek wrote:
> On Thu 2023-04-20 12:39:31, John Ogness wrote:
> I know. And the hostile takeover is not my concern.
> 
> My concern are races between write_atomic() in emergency context
> and other driver code serialized only by the port->lock.
> 
> We need an API that will make sure that any code serialized
> by port->lock is properly serialized against write->atomic()
> when the console is registered.

I though more about it. My idea is the following:

A. The nbcon side might have basically four modes
   for taking the new nbcon lock. It might have four interfaces:

    nbcon_trylock(struct console *con,
		  enum cons_prio prio);
    nbcon_trylock_emergency(struct console *con,
		  enum cons_prio prio);
    nbcon_trylock_panic(struct console *con,
		  enum cons_prio prio);
    nbcon_lock(struct console *con,
		  enum cons_prio prio);

, where

   + nbcon_trylock() would use the current approach for
     the printk kthread. It means that it would try to get
     the lock with a timeout. But it would never try to
     steel the lock.

   + nbcon_trylock_emergency() would use the current approach
     used in emergency context. It would busy wait and
     then try to steel the lock. But it would take over the lock
     only when it is in safe context.

    + nbcon_trylock_panic() would behave the same way as
      nbcon_trylock_emergency(). But it would allow to
      take over the lock even when it is unsafe. It might
      still fail when it is not called on the CPU that
      handles the panic().

    + nbcon_lock() would wait until the lock is really
      available.

  and

  enum cons_prio would be one of the four priorities.

  The API should disable cpu migration to make sure that
  it will stay the same until the lock is released.

  The caller should rememner the priority somewhere,
  e,g. in struct cons_ctxt.


B. The port->lock side would switch to the new nbcon lock
   when the console is registered. There are two big questions
   that come to my mind:

  1. The original code does not expect that it might lose
     the lock.

     It should be enough to mark the entire section .unsafe.
     In that case, only the final panic() call might steel
     the lock.


  2. The console registration must be done a safe way
     to make sure that all callers will use the same
     real lock (port->lock or nbcon_lock).


  IMHO, the uart_port part might look like:

void uart_port_lock_irqsafe(struct uart_port *port,
	int *cookie,
	unsigned long &flags)
{
	struct console *con;

try_again:
	/* Synchrnonization against console registration. */
	*cookie = console_srcu_read_lock();

	con = rcu_access_pointer(nbcon->cons);

	if (!can_use_nbcon_lock(con)) {
		/* Only the port lock is available. */
		spin_lock_irqsafe(&port->lock, *flags);
		port->locked = LOCKED_BY_PORT_LOCK;
		return;
	}


	/*
	 * The nbcon lock is available. Take it instead of
	 * the port->lock. The only exception is when
	 * there is registration in progress. In this case,
	 * port->lock has to be taken as well.
	 *
	 * It will always be taken only with the normal priority.
	 * when called from the port lock side.
	 */
	nbcon_lock(con, CON_PRIO_NORMAL);
	local_irq_save(*flags);

	if (cons->registration_in_progress) {
		spin_lock(&port->lock);
		port->locked = LOCKED_BY_BOTH_LOCKS;
	} else {
		port->locked = LOCKED_BY_NBCON_LOCK;
	}

	/*
	 * Makes sure that only nbcon_lock_panic() would
	 * be able to steel this lock.
	 */
	if (!nbcon_enter_unsafe(con, CON_PRIO_NORMAL)) {
		revert locks;
		goto try_again;
	}
}

void uart_port_unlock_irqrestore(struct uart_port *port,
	int *cookie, unsigned long *flags)
{
	struct console *con;

	con = rcu_access_pointer(nbcon->cons);

	switch (port->locked) {
	LOCKED_BY_PORT_LOCK:
		spin_unlock_irqrestore(&port->lock, *flags);
		break;
	LOCKED_BY_BOTH_LOCKS:
		spin_unlock(&port->lock);
		fallthrough;
	LOCKED_BY_NBCON_LOCK:
		nbcon_exit_unsafe(con, CON_PRIO_NORMAL);
		local_irq_restore(*flags);
		nbcon_unlock(con, CON_PRIO_NORMAL);
	};

	console_srcu_unlock(*cookie);
}


and finally the registration:

void register_console(struct console *newcon)
{
[...]
	if (con->flags & CON_NBCON) {
		nbcon_lock(con);
		nbcon->regisration_in_progress = true;
		nbcon_unlock(con);

		/*
		 * Make sure that callers are locked by both
		 * nbcon_lock() and port->lock()
		 */
		synchronize_srcu();
	}

	/* Insert the console into console_list */

	if (con->flags & CON_NBCON) {
		nbcon_lock(con);
		nbcon->regisration_in_progress = false;
		nbcon_unlock(con);
	}
[...]
}

and similar thing in uregister_console().

I am not sure if I should send this on Friday evening. But I reworked
it many times and I do not longer see any obvious reason why it
could not work.

My relief is that it builds on top of your code. It basically just
adds the port_lock interface. I hope that it would actually simplify
things a lot.

Well, a huge challenge might be to replace all spin_lock(port->lock)
calls with the new API. There is a lot of code shared between various
consoles and we wanted to migrate them one-by-one.

On the other hand, the new port_lock() API should behave as simple
spin lock when port->cons is a legacy console.

Best Regards,
Petr