lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 4 Oct 2016 16:52:21 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Jan Kara <jack@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tejun Heo <tj@...nel.org>, Calvin Owens <calvinowens@...com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 6/7] printk: use alternative printk buffers

On Fri 2016-09-30 13:15:46, Petr Mladek wrote:
> On Fri 2016-09-30 10:15:44, Sergey Senozhatsky wrote:
> > On (09/29/16 15:00), Petr Mladek wrote:
> > [..]
> > > > @@ -1791,7 +1791,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> > > >  		zap_locks();
> > > >  	}
> > > >  
> > > > -	lockdep_off();
> > > > +	alt_printk_enter();
> > > 
> > > IMHO, we could not longer enter vprintk_emit() recursively. The same
> > > section that was guarded by logbuf_cpu is guarded by
> > > alt_printk_enter()/exit() now.
> > 
> > you might be very right here. I'll take a look.
> > 
> > > IMHO, we could remove all the logic around the recursion. Then we
> > > could even disable/enable irqs inside alt_printk_enter()/exit().
> > 
> > I was thinking of doing something like this; but that would require
> > storing 'unsigned long' flags in per-cpu data
> > 
> > 	alt_enter()
> > 	{
> > 		unsinged long flags;
> > 
> > 		local_irq_save(flags);
> > 		ctx = this_cpu_ptr();
> > 		ctx->flags = flags;
> > 		...
> > 	}
> > 
> > 	alt_exit()
> > 	{
> > 		ctx = this_cpu_ptr();
> > 		...
> > 		local_irq_restore(ctx->flags);
> > 	}
> >
> > and the decision was to keep `unsigned long flags' on stack in the
> > alt_enter/exit caller. besides in most of the cases we already have
> > it (in vprintk_emit() and console_unlock()).
> 
> I would pass the pointer to flags as alt_enter() parameter.
> > 
> > but I can certainly hide these details in alt_enter/exit.
> >
> > > > @@ -2479,7 +2490,9 @@ void console_unlock(void)
> > > >  	 */
> > > >  	raw_spin_lock(&logbuf_lock);
> > > >  	retry = console_seq != log_next_seq;
> > > > -	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> > > > +	raw_spin_unlock(&logbuf_lock);
> > > > +	alt_printk_exit();
> > > > +	local_irq_restore(flags);
> > > 
> > > We should mention that this patch makes an obsolete artefact from
> > > printk_deferred(). It opens the door for another big cleanup and
> > > relief.
> > 
> > do you mean that, once alt_printk is done properly, we can drop
> > printk_deferred()? I was thinking of it, but decided not to
> > mention/touch it in this patch set.
> 
> My understanding is the following:
> 
> The difference between normal printk() and printk_deferred() is
> that the other does not call console_trylock()/console_unlock().
> It means that printk_deferred() can avoid recursion only from these
> two calls.
> 
> printk_deferred() is used only in scheduler and timekeeping code.
> Therefore it prevents only limited number of possible recursions
> and deadlocks at the moment.
> 
> This patch guards most of the two calls a more generic way.
> The redirected parts prevent recursion not only to into the
> code guarded by console_sem but also into parts guarded
> by lockbuf_lock.
> 
> By other words, this patch is supposed to handle a superset
> of the deadlocks that are currently prevented by printk_deferred().
> If this is true, we do not longer need printk_deferred().
> 
> The only question is if this patch guards enough parts of
> console_try_lock()/console_unlock() to handle the superset
> of the possible deadlocks.
> 
> I see that it does not guard two up_console_sem() calls
> from console_unlock(). But this can be fixed in the next
> version.
> 
> Or is there any other catch that I do not see at the moment?

And there is :-( The above logic looked at the problem only from
one side. It was about errors starting from the printk()
code itself, for example:

 vprintk_emit()
  console_unlock()
   up()                        << raw_spin_lock_irqsave(&sem->lock, flags);
    wake_up_process()
     try_to_wake_up()
      ttwu_queue()
       ttwu_activate()
        activate_task()
         enqueue_task()
          enqueue_task_fair()
           cfs_rq_of()
            task_of()
             WARN_ON_ONCE(!entity_is_task(se))
              vprintk_emit()
               console_trylock()
                down_trylock()
                 raw_spin_lock_irqsave(&sem->lock, flags)
                 ^^^^ deadlock


But it does no solve errors starting in the scheduler or
timekeeping code. I mean:

   any_function_waking_process()
    wake_up_process()
     try_to_wake_up()  // takes &p->pi_lock
      ttwu_queue()
       ttwu_activate()
        activate_task()
         enqueue_task()
          enqueue_task_fair()
           cfs_rq_of()
            task_of()
             WARN_ON_ONCE(!entity_is_task(se))
              vprintk_emit()
               console_trylock()   // success
	       console_unlock()
	         up_console_sem()
		  up()
		    __up()
		     wake_up_process()
		      try_to_wake_up()
		       raw_spin_lock_irqsave(&p->pi_lock, flags);
		       ^^^^ deadlock

The only thing that might help here is to call
alt_printk_enter()/exit() in wake_up_process() itself. Otherwise,
we still would need to keep the printk_deferred() stuff.

By other words, we might need to put alt_printk_enter()/exit()
into the scheduler and timekeeping code. In theory it might
be easier to maintain than the separated printk_deferred() calls.
But there might be some catches because we need to disable
the interrupts, ...

Sigh, this 2nd scenario is much more likely than the 1st one.
I guess that warnings in the scheduler/timekeeping code
will be triggered outside printk() most of the time.

It means that this approach might be much harder to sell
after all :-(

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ