lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20071115053751.GB26338@Krystal>
Date:	Thu, 15 Nov 2007 00:37:51 -0500
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Rusty Russell <rusty@...tcorp.com.au>
Cc:	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	Andi Kleen <ak@....de>, "H. Peter Anvin" <hpa@...or.com>,
	Chuck Ebbert <cebbert@...hat.com>,
	Christoph Hellwig <hch@...radead.org>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [patch 5/8] Immediate Values - x86 Optimization

* Rusty Russell (rusty@...tcorp.com.au) wrote:
> On Thursday 15 November 2007 15:06:10 Mathieu Desnoyers wrote:
> > * Rusty Russell (rusty@...tcorp.com.au) wrote:
> > > A stop_machine (or lightweight variant using IPI) would be sufficient and
> > > vastly simpler.  Trying to patch NMI handlers while they're running is
> > > already crazy.
> >
> > I wouldn't mind if it was limited to the code within do_nmi(), but then
> > we would have to accept potential GPF if
> >
> > A - the NMI or MCE code calls any external kernel code (printk,
> >   notify_die, spin_lock/unlock, die_nmi, lapic_wd_event (perfctr code,
> >   calls printk too for debugging)...
> 
> Sure, but as I pointed out previously, such calls are already best effort.  
> You can do very little safely from do_nmi(), and calling printk isn't one of 
> them,

yes and no.. do_nmi uses the "bust spinlocks" exactly for this. So this
is ok by design. Other than this, we can end up mixing up the console
data output with different sources of characters, but I doubt something
really bad can happen (like a deadlock).

> nor is grabbing a spinlock (well, actually you could as long as it's 
> *only* used by NMI handlers.  See any of those?).

Yup, see arch/x86/kernel/nmi_64.c : nmi_watchdog_tick()

It defines a spinlock to "Serialise the printks". I guess it's good to
protect against other nmi watchdogs running on other CPUs concurrently,
I guess.

> 
> > Therefore, if one decides to use the immediate values to
> > leave dormant spinlock instrumentation in the kernel, I wouldn't want it
> > to have undesirable side-effects (GPF) when the instrumentation is
> > being enabled, as rare as it could be.
> 
> It's overengineered, since it's less likely than deadlock already.
> 

So should we put a warning telling "enabling tracing or profiling on a
production system that also uses NMI watchdog could potentially cause a
crash" ? The rarer a bug is, the more difficult it is to debug. It does
not make the bug hurt less when it happens.

The normal thing to do when a potential deadlock is detected is to fix
it, not to leave it there under the premise that it doesn't matter since
it happens rarely. In our case, where we know there is a potential race,
I don't see any reason not to make sure it never happens. What's the
cost of it ?

arch/x86/kernel/immediate.o : 2.4K

let's compare..

kernel/stop_machine.o : 3.9K

so I think that code size is not an issue there, especially since the
immediate values are not meant to be deployed on embedded systems.

> > > I'd keep this version up your sleeve for they day when it's needed.
> >
> > If we choose to go this way, stop_machine would have to do a sync_core()
> > on every CPU before it reactivates interrupts for this to respect
> > Intel's errata.
> 
> Yes, I don't think stop_machine is actually what you want anyway, since you 
> are happy to run in interrupt context.  An IPI-based scheme is probably 
> better, and also has the side effect of iret doing the sync you need, IIUC.
> 

Yup, looping in IPIs with interrupts disabled should do the job there.
It's just awful for interrupt latency on large SMP systems :( Being
currently bad at it is not a reason to make it worse. If we have a CPU
that is within a high latency irq disable region when we send the IPI,
we can easily end up waiting for this critical section to end with
interrupts disabled on all CPUs. The fact that we would wait for the
longest interrupt disable region with IRQs disabled implies that we
increase the maximum latency of the system, by design. I'm not sure I
would like to be the new longest record-beating IRQ off region.

Mathieu

> Hope that clarifies,
> Rusty.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ