linux-kernel - Re: [Linux 2.6.29-rc2] BUG: using smp_processor

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090130140620.GD17401@elte.hu>
Date:	Fri, 30 Jan 2009 15:06:20 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Frédéric Weisbecker <fweisbec@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Maciej Rutecki <maciej.rutecki@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Linux 2.6.29-rc2] BUG: using smp_processor_id() in preemptible


* Rafael J. Wysocki <rjw@...k.pl> wrote:

> On Thursday 29 January 2009, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@...k.pl> wrote:
> > 
> > > On Tuesday 27 January 2009, Ingo Molnar wrote:
> > > > 
> > > > * Rafael J. Wysocki <rjw@...k.pl> wrote:
> > > > 
> > > > > > In fact whatever check you put in it's _always_ going to be 
> > > > > > fundamentally more fragile than direct instrumentation: you cannot 
> > > > > > possibly check all possible places that enable interrupts. (they could 
> > > > > > be disabling interrupts as a _restore_irqs() sequence for example)
> > > > > 
> > > > > In this particular case, I'm not really interested in that.  What I'm 
> > > > > interested in is which driver's ->suspend_late() or ->resume_early() (or 
> > > > > the equivalents for sysdevs) has enabled interrupts, which is quite easy 
> > > > > to check directly.
> > > > 
> > > > But this is exactly what it does - without any need for debug checks 
> > > > spread around!
> > > > 
> > > > You'll get a _full stack dump_ from the very driver that is enabling 
> > > > interrupts! You dont get a trace - you get a stack dump of the very place 
> > > > that is buggy. It does not get any better than that.
> > > 
> > > I'm not going to argue.
> > > 
> > > Nevertheless, IMO something like the patch below should be sufficient to catch
> > > these bugs.
> > > 
> > > Thanks,
> > > Rafael
> > > 
> > > 
> > > ---
> > >  drivers/base/power/main.c |   12 ++++++++++++
> > >  drivers/base/sys.c        |   21 ++++++++++++++++-----
> > >  include/linux/pm.h        |   18 ++++++++++++++++++
> > >  3 files changed, 46 insertions(+), 5 deletions(-)
> > 
> > hm, so now you sprinkle debug checks all around the code, instead of 
> > putting in a single pair of:
> > 
> >     force_irqs_off_start();
> >     ...
> >     force_irqs_off_end();
> 
> And what debug options exactly would that require to be set to work?

hm, if you worry about that aspect: we could make it seemlessly enabled if 
PM_DEBUG is enabled.

> > which would catch everything that your checks would catch - and it 
> > would catch more.
> 
> Except that the checks trigger in specific places, so if a check 
> triggers you know precisely where the bug happened regardless of what 
> garbage is in the call trace.

This argument is 100% mystery to me. Do you really not see the quality 
difference between a stack trace generated _right at the buggy piece of 
code_ and a warning later on that might (or might not) trigger?

Especially considering that your approach wont catch such bugs:

   ...
   spin_unlock_irq();
   ...
   spin_lock_irq();
   ...

Or such bugs:

   local_irq_enable();
   ...
   local_irq_disable();

Or such bugs:

   spin_lock_irq_save(&lock1, flags);
   ...
           spin_lock_irqsave(&lock2, flags);
           ...
           spin_unlock_irq(&lock2);          /* accidental bug */
   ...
   spin_unlock_irq_restore(&lock1, flags);

Such types of bugs might be especially hard to find in practice, if the 
window where irqs are enabled is small. There is no guarantee at all that 
accidental irq enabling survives a critical section - it can be 
re-disabled in the normal flow of things very easily.

And even if we are lucky and if the irqs stay enabled by the time the 
callback returns, what if your warning flags some big and complex driver, 
one line of which is buggy?

If you had the choice, what would you prefer - a stack dump done at the 
point of incident (pinpointing the driver, the subsystem and the buggy 
function with its full callframe), or your "this driver is buggy" generic 
warning with no specificity about where that bug might be?

Stacktraces _at the point of incident_, and a _guaranteed_ facility that 
_enforces_ that irqs are off during the whole resume cycle are just about 
the highest quality debug info and debug protection we can get in such 
situations.

I really dont understand your points about this.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/