lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070725154656.54fd6f13@the-village.bc.nu>
Date:	Wed, 25 Jul 2007 15:46:56 +0100
From:	Alan Cox <alan@...rguk.ukuu.org.uk>
To:	Jarek Poplawski <jarkao2@...pl>
Cc:	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Marcin ??lusarz <marcin.slusarz@...il.com>,
	Jean-Baptiste Vignaud <vignaud@...dmail.fr>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	shemminger <shemminger@...ux-foundation.org>,
	linux-net <linux-net@...r.kernel.org>,
	netdev <netdev@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: 2.6.20->2.6.21 - networking dies after random time

> > The code in question lib8390.c does
> > 
> > 	disable_irq();
> > 	fiddle_with_the_network_card_hardware()
> > 	enable_irq();
> ...
> > 
> > No idea how this affects the network card, as the code there must be
> > able to handle interrupts, which are not originated from the card due to
> > interrupt sharing.
> 
> I think, in this last yesterday's patch Ingo could be right, yet!
> The comment at the beginnig points this is done like that because
> of chip's slowness. And problems with timing are mysterious.
> 
> On the other hand author of this code didn't use spin_lock_irqsave
> for some reason, probably after testing this option too. So, I hope
> this is the right path, but alas, I'm not sure this patch has to
> prove this 100%.

The author (me) didn't use spin_lock_irqsave because the slowness of the
card means that approach caused horrible problems like losing serial data
at 38400 baud on some chips. Rememeber many 8390 nics on PCI were ISA
chips with FPGA front ends.

> Anyway, in my opinion this situation where interrupts could/have_to
> be used for such strange things should confirm the need of more
> options for handling irqs individually.

Ok the logic behind the 8390 is very simple:

Things to know
	- IRQ delivery is asynchronous to the PCI bus
	- Blocking the local CPU IRQ via spin locks was too slow
	- The chip has register windows needing locking work

So the path was once (I say once as people appear to have changed it
in the mean time and it now looks rather bogus if the changes to use
disable_irq_nosync_irqsave are disabling the local IRQ)


	Take the page lock
	Mask the IRQ on chip
	Disable the IRQ (but not mask locally- someone seems to have
		broken this with the lock validator stuff)
		[This must be _nosync as the page lock may otherwise
			deadlock us]
	Drop the page lock and turn IRQs back on
	
	At this point an existing IRQ may still be running but we can't
	get a new one

	Take the lock (so we know the IRQ has terminated) but don't mask
the IRQs on the processor
	Set irqlock [for debug]

	Transmit (slow as ****)

	re-enable the IRQ


We have to use disable_irq because otherwise you will get delayed
interrupts on the APIC bus deadlocking the transmit path.

Quite hairy but the chip simply wasn't designed for SMP and you can't
even ACK an interrupt without risking corrupting other parallel
activities on the chip.

Alan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ