Date:	Sun, 21 Feb 2010 05:20:03 +0000 (GMT)
From:	"Maciej W. Rozycki" <macro@...ux-mips.org>
To:	"H. Peter Anvin" <hpa@...or.com>
cc:	Suresh Siddha <suresh.b.siddha@...el.com>,
	"ebiederm@...ssion.com" <ebiederm@...ssion.com>,
	"yinghai@...nel.org" <yinghai@...nel.org>,
	"mingo@...e.hu" <mingo@...e.hu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [patch 2/2] x86, irq: use 0x20 for the IRQ_MOVE_CLEANUP_VECTOR
 instead of 0x1f

Hi,

 I have finally managed to get back to this -- sorry for the delay, I'm 
running short of time.

On Mon, 1 Feb 2010, H. Peter Anvin wrote:

> > As we are using the code from 2.6.28 and no one noticed/complained about
> > this issue for more than 1.5 years, probably the pentium APIC issue is
> > not wide-spread.

 Correct, the problem only affected the B1, B3 and B5 steppings of the P54C 
Pentium processor.  These are probably extremely rare these days.  It was 
fixed in later steppings.

 But they can be detected at run time -- if we don't support them anymore 
(assuming keeping them supported is too much of a maintenance hassle; Linux 
used to be proud to support hardware nobody else seemed to care about 
anymore, so it's really disappointing to see it go), we should panic() at 
bootstrap and print an appropriate message.  They are CPUID family 5, 
model 2 and steppings 1, 2 and 4, respectively.
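
 As a rough sketch of such a run-time check (not the kernel's actual 
code): the helper below decodes a CPUID leaf 1 EAX signature and matches 
the family/model/stepping values listed above.  The function names are 
made up for illustration, and the caller is assumed to have obtained the 
signature already and to decide what to do on a match (e.g. panic()).

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of the CPUID leaf 1 EAX signature; family 5 needs no extended fields. */
static unsigned int sig_stepping(uint32_t eax) { return eax & 0xf; }
static unsigned int sig_model(uint32_t eax)    { return (eax >> 4) & 0xf; }
static unsigned int sig_family(uint32_t eax)   { return (eax >> 8) & 0xf; }

/*
 * Hypothetical helper: returns true for the P54C steppings named above,
 * i.e. family 5, model 2, stepping 1 (B1), 2 (B3) or 4 (B5).
 */
static bool p54c_has_erratum_4ap(uint32_t eax)
{
	if (sig_family(eax) != 5 || sig_model(eax) != 2)
		return false;

	switch (sig_stepping(eax)) {
	case 1:	/* B1 */
	case 2:	/* B3 */
	case 4:	/* B5 */
		return true;
	default:
		return false;
	}
}
```

 A signature of 0x0521, for instance, decodes to family 5, model 2, 
stepping 1 and would match.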

 Also the note in arch/x86/kernel/smp.c should be adjusted accordingly 
to state that the erratum is no longer worked around (preferably naming 
the last Linux version in which it was).

> I *think* it's applicable to all CPUs Pentium III or earlier (but not
> Pentium 4 -- I'm unsure about the Pentium M.)  I don't know about
> non-Intel CPUs; I have a vague memory of the Transmeta Efficeon (the
> only Transmeta chip with an APIC) *not* having this limitation.
> 
> The exact reference is SDM vol 3A 10.8.4, page 10-41 [rev 033US Dec 2009]:
> 
> For the P6 family and Pentium processors, the IRR and ISR registers can
> queue no more than two interrupts per priority level, and will reject
> other interrupts that are received within the same priority level.
> 
> However, section 10.8.2 bullet 3 on page 10-38 (and the flowchart on
> page 10-37) indicate that such an interrupt is returned to the IOAPIC
> for a later retry, i.e. it's not lost.  As such, it's not clear to me
> from reading the SDM that there is actually a problem here...

 Here's the text of the relevant erratum:

 "4AP. Three Interrupts of the Same Priority Causes Lost Local Interrupt

PROBLEM: If three interrupts of the same priority level (priority is 
defined in the 4MSB of the interrupt vector), arrive in the following 
circumstance:

1. A interrupt is being serviced by the CPU, and the proper bit is set in 
   the ISR register.

2. A second interrupt is received from the serial bus.

3. At the same time a third interrupt is received from a local interrupt 
   source, which could include local pins (LVT), an APIC timer (Timer), 
   self-interrupt, or an APIC error interrupt.

If the first two conditions are met the third interrupt will be lost, and 
not serviced.

IMPLICATION: The third interrupt will be ignored and not serviced if the 
specific scenario happens as listed above.

WORKAROUND: The problem can be avoided if different priority levels are 
assigned to serial interrupts, than to local interrupts.

STATUS: For the steppings affected see the Summary Table of Changes at the 
beginning of this section."

so you can see the retry mechanism is not the problem here (or, to be 
exact, the lack of an equivalent for local interrupts seems to be).

 I'm not sure how fatal the implications are for Linux though; even then 
it looks to me like the approach we took was overkill.  It's enough to 
guarantee that the APIC error interrupt, the APIC timer interrupt and 
self-IPIs (do we use any at all though?) do not share their priority 
level(s) with any external interrupt (but they can share the level with 
one another).  We only ever use LINT0/1 interrupts as NMIs (for the NMI 
watchdog and the system error, respectively), or ExtINT (in the case of 
LINT0), so this erratum does not apply to them.
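
 To make the constraint concrete, here is a minimal sketch (hypothetical 
helpers, not existing kernel functions).  It takes the priority level to 
be the 4 most significant bits of the 8-bit vector, as the erratum text 
says, and checks that no local vector shares a level with any external 
one.  Local vectors sharing a level among themselves is allowed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Priority level = the 4 most significant bits of the 8-bit vector. */
static unsigned int vector_priority(uint8_t vector)
{
	return vector >> 4;
}

/*
 * Hypothetical check: returns true if none of the local vectors (APIC
 * error, APIC timer, self-IPI) shares a priority level with any of the
 * external vectors.  Local vectors may share a level with one another.
 */
static bool local_levels_isolated(const uint8_t *local, size_t nlocal,
				  const uint8_t *external, size_t nexternal)
{
	unsigned int external_levels = 0;	/* bit n set: level n is used externally */
	size_t i;

	for (i = 0; i < nexternal; i++)
		external_levels |= 1u << vector_priority(external[i]);

	for (i = 0; i < nlocal; i++)
		if (external_levels & (1u << vector_priority(local[i])))
			return false;

	return true;
}
```

 With this definition vector 0x1f is in level 1 and vector 0x20 is in 
level 2, which is why moving a vector across that boundary changes which 
other interrupts it shares a level with.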

 So what priority level(s) do we use for the APIC error and timer 
interrupts (and self-IPIs, if any) these days and how do they correspond 
to the priorities of external interrupts?  It looks like we can work 
around this erratum indefinitely quite cheaply (and should document it 
decently so that newcomers do not break it, as has happened with many bits 
of our APIC code many times already; yes, lost hope, I know...).

  Maciej