linux-kernel - Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <4C751975020000780001214C@vpn.id2.novell.com>
Date:	Wed, 25 Aug 2010 12:24:04 +0100
From:	"Jan Beulich" <JBeulich@...ell.com>
To:	"Daniel Stodden" <daniel.stodden@...rix.com>
Cc:	"Tom Kopec" <tek@....org>, "Jeremy Fitzhardinge" <jeremy@...p.org>,
	"Stable Kernel" <stable@...nel.org>,
	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"Xen-devel@...ts.xensource.com" <Xen-devel@...ts.xensource.com>,
	"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>
Subject: Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen
	 event channels

>>> On 25.08.10 at 12:04, Daniel Stodden <daniel.stodden@...rix.com> wrote:
> On Wed, 2010-08-25 at 03:52 -0400, Jan Beulich wrote:
>> >>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@...p.org> wrote:
>> > We worked out the root cause was that it was incorrectly treating Xen
>> > events as level rather than edge triggered interrupts, which works fine
>> > unless you're handling one interrupt, the interrupt gets migrated to
>> > another cpu and then re-raised.  This ends up losing the interrupt
>> > because the edge-triggering of the second interrupt is lost.
>> 
>> While this description would seem plausible at the first glance, it
>> doesn't match up with unmask_evtchn() already taking care of
>> exactly this case. Or are you implicitly saying that this code is
>> broken in some way (if so, how, and shouldn't it then be that
>> code that needs fixing, or removing if you want to stay with the
>> edge handling)?
> 
> Not broken, but a different problem. The unmask 'resend' only catches
> the edge lost if the event was raised while it was still masked. But
> level irq doesn't have to save PENDING state. In the Xen event migration
> case the edge isn't lost, but the upcall will drop the invocation when
> the handler is found inprogress on the previous cpu.

Hmm, indeed. But that problem must have existed in all post-2.6.18
kernels then... And that shouldn't be a problem with fasteoi, as that
one calls ->eoi() even when INPROGRESS was set (other than level,
which calls unmask only when it wasn't set).

>> I do however agree that using handle_level_irq() is problematic
>> (see 
> http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
>> but as said there I think using the fasteoi logic is preferable. No
>> matter whether using edge or level, the ->end() method will
>> never be called (whereas fasteoi calls ->eoi(), which would
>> just need to be vectored to the same function as ->end()).
>> Without end_pirq() ever called, you can't let Xen know of
>> bad PIRQs (so that it can disable them instead of continuing
>> to call the [now shortcut] handler in the owning domain).
> 
> Not an opinion, just confused: Isn't all that dealt with in
> chip->disable?

With disable_pirq() being empty (at least in the branches I
looked at)?

Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/