netdev - Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <1442528065.97487.6.camel@infradead.org>
Date:	Thu, 17 Sep 2015 23:14:25 +0100
From:	David Woodhouse <dwmw2@...radead.org>
To:	Francois Romieu <romieu@...zoreil.com>
Cc:	Stephen Hemminger <stephen@...workplumber.org>,
	David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

On Thu, 2015-09-17 at 22:44 +0200, Francois Romieu wrote:
> David Woodhouse <dwmw2@...radead.org> :
> > On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote:
> > > 
> > > Thanks; I'll try that. In fact since updating to 4.2 the problem has
> > > got worse — now the whole machine dies:
> > 
> > There is something very strange going on here. I've found two ways to
> > make it stop crashing when cp_tx_timeout() hits the 'popf' when
> > unlocking the spinlock.
> 
> cp_tx_timeout takes the lock, disables IRQs and calls cp_clean_rings,
> which calls plain dev_kfree_skb if an skb is still referenced in one of
> the rx/tx rings. You may replace it with dev_kfree_skb_any.

Well spotted; I've made that change locally, although I don't think it
explains the symptoms. Not that I'm sure what *could*.

I've also found that adding a call to __cp_set_rx_mode() seems to fix
RX after reset in some tests, especially the simulated one via the
hack in cp_set_wol(). I think that's necessary, if not sufficient, at
least on real hardware. I didn't see the problem at all when running in
qemu.

Sometimes, though, it still dies in an interrupt storm after
re-enabling IRQs:

[  900.004214] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status  c   2b    0 80ff
[  900.011725] will lock...
[  900.014273] Handling tx timeout, flags 200296
[  900.018774] Will wake queue...
[  900.021645] Will unlock... flags 200296
[  900.021645] 8139cp 0000:00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b
[  900.021645] 8139cp 0000:00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b
... 
[  901.628439] 8139cp 0000:00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b
[  901.636291] 8139cp 0000:00:0b.0 eth1: intr, status 0011 enable 80ff cmd 0c cpcmd 002b
...
[  901.966243] 8139cp 0000:00:0b.0 eth1: intr, status 0011 enable 80ff cmd 0c cpcmd 002b
[  901.968353] 8139cp 0000:00:0b.0 eth1: intr, status 0051 enable 80ff cmd 0c cpcmd 002b
... forever...

And of course, even if I fix the TX timeout handling, I'd still like to
know why it's happening in the first place...

-- 
dwmw2

