linux-kernel - Re: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining triggered "active" device IRQ interruption

Message-ID: <20090604211744.GB9213@us.ibm.com>
Date:	Thu, 4 Jun 2009 14:17:45 -0700
From:	Gary Hade <garyhade@...ibm.com>
To:	Gary Hade <garyhade@...ibm.com>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>, mingo@...e.hu,
	mingo@...hat.com, linux-kernel@...r.kernel.org, tglx@...utronix.de,
	hpa@...or.com, x86@...nel.org, yinghai@...nel.org, lcm@...ibm.com
Subject: Re: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining
	triggered "active" device IRQ interruption

On Thu, Jun 04, 2009 at 01:04:37PM -0700, Gary Hade wrote:
> On Wed, Jun 03, 2009 at 02:13:23PM -0700, Eric W. Biederman wrote:
> > Gary Hade <garyhade@...ibm.com> writes:
> > 
> > > Correct, after the fix was applied my testing did _not_ show
> > > the lockups that you are referring to.  I wonder if there is a
> > > chance that the root cause of those old failures and the root
> > > > cause of the issue that my fix addresses are the same?
> > >
> > > Can you provide the test case that demonstrated the old failure
> > > cases so I can try it on our systems?  Also, do you recall what
> > > > mainline version demonstrated the old failure?
> > 
> > The irq migration had already been moved to interrupt context by the
> > time I started working on it.  And I managed to verify that there were
> > indeed problems with moving it out of interrupt context before my code
> > merged.
> > 
> > So if you want to reproduce it, reduce your irq migration to the essentials.
> > Set IRQ_MOVE_PCNTXT, and always migrate the irqs from process context
> > immediately.
> > 
> > Then migrate an irq that fires at a high rate rapidly from one cpu to
> > another.
> > 
> > Right now you are insulated from most of the failures because you still
> > don't have IRQ_MOVE_PCNTXT.  So you are only really testing your new code
> > in the cpu hotunplug path.
> 
> OK, I'm confused. 
> 
> It sounds like you want me to force IRQ_MOVE_PCNTXT so that I can
> test in a configuration that you say is already broken.  Why
> in the heck would this config, where you expect lockups without
> the fix, be a productive environment in which to test the fix?

Sorry, I did not say this well.  Trying again:

Why would this config, where you already expect lockups
for reasons that you say are not addressed by the fix, be
a productive environment in which to test the fix?

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@...ibm.com
http://www.ibm.com/linux/ltc