linux-kernel - Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

Message-ID: <ZiD3FNBZh_iMOVWY@pathway.suse.cz>
Date: Thu, 18 Apr 2024 12:33:56 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to
 synchronize with driver

On Wed 2024-04-17 17:00:42, John Ogness wrote:
> On 2024-04-17, Petr Mladek <pmladek@...e.com> wrote:
> >> We want to avoid using nbcon console ownership contention whenever
> >> possible. In fact, there should _never_ be nbcon console ownership
> >> contention except for in emergency or panic situations.
> >>
> >> In the normal case, printk will use the driver-specific locking for
> >> synchronization. Previously this was achieved by implementing the
> >> lock/unlock within the write() callback. But with nbcon consoles that
> >> is not possible because the nbcon ownership must be taken outside of
> >> the write callback:
> >> 
> >> con->device_lock()
> >> nbcon_acquire()
> >> con->write_atomic() or con->write_thread()
> >> nbcon_release()
> >> con->device_unlock()
> >
> > This sounds like a strong requirement. So there should be a strong
> > reason
> 
> There is: PREEMPT_RT

This explains it!

I think that a lot of the misunderstanding here is caused by the fact
that your brain is trained primarily in "RT mode" ;-) while I am not
that familiar with the RT tricks and my brain thinks
in classic preemption mode :-)

I am not sure how it is done in other parts of the kernel where
RT needed to introduce some tricks. But I think that we should
really start mentioning RT behavior in the commit messages and
comments where the RT mode makes a huge difference.


> > when nbcon_acquire() is safe enough in emergency context
> > then it should be safe enough in the normal context as well.
> > Otherwise, we would have a problem.
> 
> Of course. That is not the issue.
> 
> > My understanding is that we want to take con->device_lock()
> > in the normal context for two reasons:
> >
> >   1. This is historical, kind of speculation, and probably
> >      not the real reason.
> 
> Correct. Not a reason.
> 
> >   2. The con->device_lock() defines the context in which nbcon_acquire()
> >      will be taken and con->write_atomic() called to make it
> >      safe against other operations with the device driver.
> >
> >      For example, con->device_lock() from uart serial consoles would
> >      disable interrupts to prevent deadlocks with the serial
> >      port IRQ handlers.
> >
> >      Some other drivers might just need to disable preemption.
> >      And some (future) drivers might even allow to keep
> >      the preemption enabled.
> 
> (Side note: In PREEMPT_RT, all drivers keep preemption enabled.)

This explains everything. It is a huge game changer.

Sigh, I remember that you told me this at Plumbers. But my
non-RT-trained brain forgot this "detail". Well, I hope that
I am not the only one, and we should mention this in the comments.

> > I still have to shake my head around this. But I would first like
> > to know whether:
> >
> >    + You agree that nbcon_try_acquire() always has to be called with
> >      preemption disabled.
> 
> No, it must not. PREEMPT_RT requires preemption enabled. That has always
> been the core of this whole rework.

Got it! I had completely forgotten that spin_lock() is a mutex in RT.

> >    + What do you think about explicitly disabling preemption
> >      in nbcon_try_acquire().
> 
> We cannot do it.
> 
> >    + If it is acceptable for the big picture. It should be fine for
> >      serial consoles. But I think that graphics consoles wanted to
> >      be preemptive when called in the printk kthread.
> 
> In PREEMPT_RT, all are preemptive.
> 
> > I am sure that it will be possible to make nbcon_try_acquire()
> > preemption-safe but it will need some more magic.
> 
> I am still investigating why you think it is not safe (as an inner lock
> for the normal case). Note that for emergency and panic situations,
> preemption _is_ disabled.

The race scenario has been mentioned in
https://lore.kernel.org/r/Zhj5uQ-JJnlIGUXK@localhost.localdomain

CPU0				CPU1

 [ task A ]

 nbcon_context_try_acquire()
   # success with NORMAL prio
   # .unsafe == false;  // safe for takeover

 [ schedule: task A -> B ]


				WARN_ON()
				  nbcon_atomic_flush_pending()
				    nbcon_context_try_acquire()
				      # success with EMERGENCY prio
				      # .unsafe == false;  // safe for takeover

				      # flushing
				      nbcon_context_release()

				      # HERE: con->nbcon_state is free
				      #       to take by anyone !!!


 nbcon_context_try_acquire()
   # success with NORMAL prio [ task B ]
   # .unsafe == false;  // safe for takeover

 [ schedule: task B -> A ]

 nbcon_enter_unsafe()
   nbcon_context_can_proceed()

BUG: nbcon_context_can_proceed() returns "true" because
     the console is owned by a context on CPU0 with
     NBCON_PRIO_NORMAL.

     But it should return "false". The console is owned
     by a context from task B and we do the check
     in a context from task A.
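
The scenario above can be replayed in a small userspace model. This is
only a sketch under assumptions: all model_* names are hypothetical, the
state is reduced to (cpu, prio, unsafe), and the acquire rules are much
simpler than the real nbcon code. It shows only that an ownership check
based on CPU and priority cannot tell task A from task B.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel's struct nbcon_state. */
enum model_prio { MODEL_PRIO_NONE, MODEL_PRIO_NORMAL, MODEL_PRIO_EMERGENCY };

struct model_state {
	int cpu;              /* CPU of the current owner */
	enum model_prio prio; /* MODEL_PRIO_NONE means "free" */
	bool unsafe;          /* false: safe for takeover */
};

struct model_ctxt {
	int cpu;
	enum model_prio prio;
};

/* Acquire if the console is free, or owned at a lower prio in a safe section. */
static bool model_try_acquire(struct model_state *st, const struct model_ctxt *ctxt)
{
	if (st->prio != MODEL_PRIO_NONE &&
	    (st->prio >= ctxt->prio || st->unsafe))
		return false;

	st->cpu = ctxt->cpu;
	st->prio = ctxt->prio;
	st->unsafe = false;
	return true;
}

static void model_release(struct model_state *st)
{
	st->prio = MODEL_PRIO_NONE;
}

/* The owner is identified only by (cpu, prio); the task is not recorded. */
static bool model_can_proceed(const struct model_state *st, const struct model_ctxt *ctxt)
{
	return st->prio == ctxt->prio && st->cpu == ctxt->cpu;
}

/* Replays the sequence from the diagram; returns what task A's check says. */
bool replay_race(void)
{
	struct model_state con = { .cpu = -1, .prio = MODEL_PRIO_NONE, .unsafe = false };
	struct model_ctxt task_a = { .cpu = 0, .prio = MODEL_PRIO_NORMAL };
	struct model_ctxt task_b = { .cpu = 0, .prio = MODEL_PRIO_NORMAL };
	struct model_ctxt warn   = { .cpu = 1, .prio = MODEL_PRIO_EMERGENCY };
	bool ok;

	ok = model_try_acquire(&con, &task_a);	/* task A on CPU0 */
	assert(ok);
	/* [ schedule: task A -> B ] */
	ok = model_try_acquire(&con, &warn);	/* WARN_ON() takeover on CPU1 */
	assert(ok);
	model_release(&con);			/* console is free again */
	ok = model_try_acquire(&con, &task_b);	/* task B on CPU0 */
	assert(ok);
	/* [ schedule: task B -> A ] */
	return model_can_proceed(&con, &task_a);
}
```

In this model replay_race() returns true: task A passes the check even
though the console is owned by the context of task B.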


OK, let's look at it from the new RT perspective. Here, the
con->device_lock() plays an important role.

The race could NOT happen in:

   + NBCON_PRIO_PANIC context because it does not schedule

   + NBCON_PRIO_EMERGENCY context because we explicitly disable preemption there

   + NBCON_PRIO_NORMAL context when we ALWAYS do nbcon_try_acquire() under
     con->device_lock(). Here the con->device_lock() serializes
     nbcon_try_acquire() calls even between running tasks.
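
The effect of the outer lock in the NORMAL case can be sketched with
another userspace model. Again this rests on assumptions: the model_*
names are hypothetical, a plain flag stands in for con->device_lock(),
and a task that would sleep on the lock is modeled as a failed attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: owner_prio 0 = free, 1 = normal, 2 = emergency. */
struct model_console {
	bool device_locked;	/* stands in for con->device_lock() */
	int owner_cpu;
	int owner_prio;
};

/* Normal priority: ownership is only ever attempted under the device lock. */
static bool model_normal_acquire(struct model_console *con, int cpu)
{
	if (con->device_locked)
		return false;	/* a real task would wait here instead of failing */
	con->device_locked = true;

	if (con->owner_prio != 0) {
		con->device_locked = false;
		return false;
	}
	con->owner_cpu = cpu;
	con->owner_prio = 1;
	return true;
}

/* Emergency priority: takes over and releases without the device lock. */
static void model_emergency_flush(struct model_console *con, int cpu)
{
	con->owner_cpu = cpu;
	con->owner_prio = 2;
	/* ... flushing ... */
	con->owner_prio = 0;	/* released: free again */
}

/* Same (cpu, prio) based check as in the race scenario. */
static bool model_can_proceed(const struct model_console *con, int cpu, int prio)
{
	return con->owner_prio == prio && con->owner_cpu == cpu;
}

/* Returns true when task B is kept out and task A sees the lost ownership. */
bool replay_with_device_lock(void)
{
	struct model_console con = { .device_locked = false, .owner_cpu = -1, .owner_prio = 0 };
	bool a_acquired, b_acquired, a_proceeds;

	a_acquired = model_normal_acquire(&con, 0);	/* task A on CPU0 */
	assert(a_acquired);
	/* [ schedule: task A -> B ], A still holds the device lock */
	model_emergency_flush(&con, 1);			/* WARN_ON() on CPU1 */
	b_acquired = model_normal_acquire(&con, 0);	/* task B on CPU0 */
	/* [ schedule: task B -> A ] */
	a_proceeds = model_can_proceed(&con, 0, 1);

	return a_acquired && !b_acquired && !a_proceeds;
}
```

Here task B never gets to the ownership attempt while task A holds the
device lock, so task A's later check fails, as it should, after the
emergency takeover.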


Everything makes sense now. And we are probably safe.

I have to double check that we really ALWAYS call nbcon_try_acquire()
under con->device_lock(). And I have to think about how to describe this
in the commit messages and comments.

Best Regards,
Petr