lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 10 Aug 2007 11:13:03 +0100
From:	Stephen Hemminger <shemminger@...ux-foundation.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Jarek Poplawski <jarkao2@...pl>,
	Thomas Gleixner <tglx@...utronix.de>,
	John Stoffel <john@...ffel.org>, linux-kernel@...r.kernel.org,
	vignaud@...dmail.fr, marcin.slusarz@...il.com,
	torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
	alan@...rguk.ukuu.org.uk, linux-net@...r.kernel.org,
	netdev@...r.kernel.org
Subject: Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70
 check_irq_resend()

On Fri, 10 Aug 2007 11:33:53 +0200
Ingo Molnar <mingo@...e.hu> wrote:

> 
> * Jarek Poplawski <jarkao2@...pl> wrote:
> 
> > > > > +               }
> > > > >  #ifdef CONFIG_HARDIRQS_SW_RESEND
> > > 
> > > we used the hw-resend method unconditionally, right?
> > 
> > Right: unconditionally on a condition they are not edges...
> > 
> > But, since not resending at all seems to work so good in testing, I 
> > thought, _SW_RESEND could be considered as an unnecessarily 
> > complicated alternative.
> > 
> > Now, I'm a bit confused...
> 
> the idea is multi-pronged:
> 
>  - Primarily, we want to fix the regression. 2.6.20 worked, 2.6.21 
>    didnt, that has to be fixed, no matter what - end of story. But we've 
>    got a wide selection of patches for that purpose now, so what matters 
>    at this point is the secondary question:
> 
>  - we want to know _why exactly_ the hang happens. We now have a pretty 
>    good theory: hw-resend hangs the IO-APIC. (there is a delicate dance
>    between local APICs and IO-APICs for level-triggered irqs, and if we
>    interject via hw-resending via the local APIC, existing races, hw
>    bugs or weaknesses in our hw-resend implementation might be exposed)
> 
> and even though we now have a wide selection of patches we really want 
> to get to the bottom of the problem so that we can fix the bug that got 
> exposed: apparently hw resend doesnt always work with level-triggered 
> irqs.
> 
> Note that the hw-resend sequence can trigger _even without our original 
> patch that triggered the regression_, it's just much less likely to 
> happen, so this is a pre-existing IO-APIC/APIC code bug that could 
> trigger anytime, and which we want to see fixed.
> 
> To confirm this theory - does the debug-patch below fix the hang? If it 
> fixes the hang then the theory is confirmed and then the right solution 
> is to retrigger an IRQ for level-triggered irqs with the proper 
> trigger-type set.
> 


All this might explain some of the IRQ loss, I saw with sky2 on mac mini.
Basically, the device would act like it missed an IRQ. The chip and PCI registers
all said "device has asserted IRQ" but the IRQ handler never got called.

Then again, the problem might be completely different since this was with
PCI-E with either MSI or INTA mode.

The workaround was to perodically call the soft IRQ handler and that would
clear the IRQ, but it's not something I want to keep.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ