[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20161202011143.GC468@jagdpanzerIV>
Date: Fri, 2 Dec 2016 10:11:43 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Petr Mladek <pmladek@...e.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Jan Kara <jack@...e.cz>, Tejun Heo <tj@...nel.org>,
Calvin Owens <calvinowens@...com>,
Thomas Gleixner <tglx@...utronix.de>,
Mel Gorman <mgorman@...hsingularity.net>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...hat.com>,
Laura Abbott <labbott@...hat.com>,
Andy Lutomirski <luto@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Kees Cook <keescook@...omium.org>, linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCHv4 6/6] printk: remove zap_locks() function
On (12/01/16 14:36), Petr Mladek wrote:
[..]
> > > > > Note that the same code is newly used to flush also the printk_safe
> > > > > per-CPU buffers. It means that logbuf_lock is zapped also when
> > > > > flushing these new buffers.
> > > > >
> > > >
> > > > Note that (raw_)spin_lock_init() as done here and in
> > > > printk_nmi_flush_on_panic() can wreck the lock state and doesn't ensure
> > > > a subsequent spin_lock() of said lock will actually work.
> > > >
> > > > The very best solution is to simply ignore the lock in panic situations
> > > > rather than trying to wreck it.
> > >
> > > do you mean that we can enterily drop the spin_lock_init()? or is there
> > > something else?
> >
> > You should not touch the lock in any way shape or form in the panic
> > path. Just ignore all locking and do the console writes (which gets you
> > into whole different pile of crap).
>
> And this is my fear. I am not sure if the other crap is better than
> the current one.
yeah, that's a good point.
> One crazy idea. A compromise might be to switch into a timelimed locking
> in the panic mode when there are still more CPUs active. If a spin
> lock is not available within X thousands of cycles, there is probably
> a deadlock and we should just enter the critical section. It would
> preserve some reasonable synchronization but it will allow to move
> forward.
logbuf spin_lock is just one of the locks. we also have scheduler spinlocks,
console drivers spinlocks, semaphore spinlock, etc. the messages, on the other
hand, are already in the memory (per-CPU buffers), so they will make it into
the core file (if there will be one).
> Another solution would be to use the temporary buffers if the lock
> is not available and push it into the main buffer and consoles later
> when there is only one CPU running. In this stage, we do not need
> to synchronize and could just skip locking as you suggest.
that's interesting. the problem here is that smp_send_stop() does not
guarantee that all the remaining CPUs will stop by the time it returns
arch/arm/kernel/smp.c
void smp_send_stop(void)
{
unsigned long timeout;
struct cpumask mask;
cpumask_copy(&mask, cpu_online_mask);
cpumask_clear_cpu(smp_processor_id(), &mask);
if (!cpumask_empty(&mask))
smp_cross_call(&mask, IPI_CPU_STOP);
/* Wait up to one second for other CPUs to stop */
timeout = USEC_PER_SEC;
while (num_online_cpus() > 1 && timeout--)
udelay(1);
if (num_online_cpus() > 1)
pr_warn("SMP: failed to stop secondary CPUs\n");
}
> > Put another way, don't do silly things like spin_lock() when you're in a
> > hurry to get your panics out.
> >
> > > spin_lock_init() either does not improve anything or let
> > > us to, at least, move the messages from per-CPU buffers to the logbuf.
> >
> > So spin_lock_init() will completely wreck the lock. And this being the
> > recursion path, not a panic path, we could have continued running the
> > kernel no problem.
>
> printk_nmi_flush_on_panic() is called from panic(). It means that we
> will do this only when the system is really going down. Which is a nice
> improvement. The current code zaps the locks during any Oops.
correct. well, not any oops, but 'oops && printk recursion' combo
if (unlikely(logbuf_cpu == this_cpu)) {
/*
* If a crash is occurring during printk() on this CPU,
* then try to get the crash message out but make sure
* we can't deadlock. Otherwise just return to avoid the
* recursion and return - but flag the recursion so that
* it can be printed at the next appropriate moment:
*/
if (!oops_in_progress && !lockdep_recursing(current)) {
recursion_bug = true;
local_irq_restore(flags);
return 0;
}
zap_locks();
}
other than that - yes, now we do (...we are going to do) it only
from the panic() path.
-ss
Powered by blists - more mailing lists