Message-ID: <1353793177.26346.283.camel@shinybook.infradead.org>
Date:	Sat, 24 Nov 2012 21:39:37 +0000
From:	David Woodhouse <dwmw2@...radead.org>
To:	netdev@...r.kernel.org
Cc:	romieu@...zoreil.com, jasowang@...hat.com, gilboad@...il.com,
	jgarzik@...hat.com, nathan@...verse.com.au
Subject: Re: 8139cp TX stall, timeout, failed recovery

On Sat, 2012-11-24 at 18:39 +0000, David Woodhouse wrote:
> This seems to be a consistent pattern — when we get that 0x0080
> interrupt and it dies, it's *very* soon after queueing a new tx:

Once I realised that the 'tx queued' message actually prints the number
of the *next* slot that'll be used, not the slot that was just filled,
it became obvious. The hardware takes only 30µs or so to consume the
descriptor that was just submitted. It isn't, as I originally thought,
a coincidence that one packet completes just as the *next* one is being
submitted; the packet completing is the one that was just queued.

The hardware seems to assert the 0x80 'Tx Descriptor Unavailable'
interrupt first, and the other bits (0x404) come later. I *often* get
into cp_tx() with only 0x80 in the IntrStatus bits, and the other bits
are often set before my heavily-debug-laden cp_tx() has even finished
running.

Register 0x82 indicates the low bits of the address of the most recently
consumed Tx descriptor, and always seems to agree with the driver's
processing of the ring. When we get a 0x80 interrupt, the most recently
consumed descriptor is always the one before tx_head, as you'd expect.
And tx_head always looks sane and (as long as you read it quickly) still
has the 'Own' bit set, as it should.

Eventually I get a 0x80 interrupt which *isn't* immediately followed by
the other 0x404 bits. And then the hardware has crapped itself and is no
longer eating descriptors. We submit more to the queue and we bang on
the TxPoll register *every* time, but that doesn't wake it.

Adding a cp_enable_irq() into the cp_tx_timeout() function at least
makes it recover *eventually*.

-- 
dwmw2

