linux-kernel - Re: [tip:core/debug] debug lockups: Improve lockup detection

Message-ID: <20090802192657.GA21882@elte.hu>
Date:	Sun, 2 Aug 2009 21:26:57 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	paulmck@...ux.vnet.ibm.com, mingo@...hat.com, hpa@...or.com,
	linux-kernel@...r.kernel.org, a.p.zijlstra@...llo.nl,
	torvalds@...ux-foundation.org, tglx@...utronix.de,
	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:core/debug] debug lockups: Improve lockup detection


* Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Sun, 2 Aug 2009 13:09:34 GMT tip-bot for Ingo Molnar <mingo@...e.hu> wrote:
> 
> > Commit-ID:  c1dc0b9c0c8979ce4d411caadff5c0d79dee58bc
> > Gitweb:     http://git.kernel.org/tip/c1dc0b9c0c8979ce4d411caadff5c0d79dee58bc
> > Author:     Ingo Molnar <mingo@...e.hu>
> > AuthorDate: Sun, 2 Aug 2009 11:28:21 +0200
> > Committer:  Ingo Molnar <mingo@...e.hu>
> > CommitDate: Sun, 2 Aug 2009 13:27:17 +0200
> > 
> > --- a/drivers/char/sysrq.c
> > +++ b/drivers/char/sysrq.c
> > @@ -24,6 +24,7 @@
> >  #include <linux/sysrq.h>
> >  #include <linux/kbd_kern.h>
> >  #include <linux/proc_fs.h>
> > +#include <linux/nmi.h>
> >  #include <linux/quotaops.h>
> >  #include <linux/perf_counter.h>
> >  #include <linux/kernel.h>
> > @@ -222,12 +223,7 @@ static DECLARE_WORK(sysrq_showallcpus, sysrq_showregs_othercpus);
> >  
> >  static void sysrq_handle_showallcpus(int key, struct tty_struct *tty)
> >  {
> > -	struct pt_regs *regs = get_irq_regs();
> > -	if (regs) {
> > -		printk(KERN_INFO "CPU%d:\n", smp_processor_id());
> > -		show_regs(regs);
> > -	}
> > -	schedule_work(&sysrq_showallcpus);
> > +	trigger_all_cpu_backtrace();
> >  }
> 
> I think this just broke all non-x86 non-sparc SMP architectures.

Yeah - it 'broke' them in the sense that they never had a working
trigger_all_cpu_backtrace() implementation to begin with (which
already breaks/degrades spinlock-debug, so it's an existing
problem).

One solution would be to do a generic trigger_all_cpu_backtrace() 
implementation that does the above schedule_work() approach.

I never understood why we proliferated all these different 
backtrace-triggering mechanisms instead of doing one good approach 
that everything uses.
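
A rough userspace model of the generic fallback suggested above, for
illustration only. It is not kernel code: NR_CPUS, current_cpu,
dump_count and the model_* helpers are hypothetical stand-ins, and the
deferred work runs synchronously here, where the kernel would queue it
with schedule_work().

```c
#include <stdio.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

static int current_cpu;		/* stand-in for smp_processor_id() */
static int dump_count[NR_CPUS];	/* how often each CPU was dumped */

/* Stand-in for show_regs(): the kernel would print registers and a trace. */
static void model_show_regs(int cpu)
{
	printf("CPU%d:\n", cpu);
	dump_count[cpu]++;
}

/* Stand-in for the deferred work item that dumps every other CPU. */
static void model_showregs_othercpus(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != current_cpu)
			model_show_regs(cpu);
}

/*
 * Hypothetical generic fallback: dump the local CPU right away, then
 * hand the remaining CPUs to the work item (called directly here).
 */
static void generic_trigger_all_cpu_backtrace(void)
{
	model_show_regs(current_cpu);
	model_showregs_othercpus();
}
```

With a fallback of this shape, an architecture without its own
trigger_all_cpu_backtrace() would still get one dump per CPU, which is
what the removed sysrq code provided.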

> >  static struct sysrq_key_op sysrq_showallcpus_op = {
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 7717b95..9c5fa9f 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -35,6 +35,7 @@
> >  #include <linux/rcupdate.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/sched.h>
> > +#include <linux/nmi.h>
> >  #include <asm/atomic.h>
> >  #include <linux/bitops.h>
> >  #include <linux/module.h>
> > @@ -469,6 +470,8 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
> >  	}
> >  	printk(" (detected by %d, t=%ld jiffies)\n",
> >  	       smp_processor_id(), (long)(jiffies - rsp->gp_start));
> > +	trigger_all_cpu_backtrace();
> 
> Be aware that trigger_all_cpu_backtrace() is a PITA when you have 
> a lot of CPUs.
> 
> If a callsite is careful to ensure that the most important 
> information is emitted last then that might improve things.
> 
> otoh, log buffer overflow will truncate, I think.  So that info 
> needs to be emitted first too ;)
> 
> It's a PITA.

Yeah, it is - I'd expect larger systems to have larger log buffers.
Lack of info was obviously a showstopper with the highest priority.

	Ingo