netdev - [RFT] r8169 changes against 2.6.23-rc3


Message-ID: <46CB3DE4.4060107@gmail.com>
Date:	Tue, 21 Aug 2007 12:32:52 -0700
From:	Bruce Cole <bacole@...il.com>
To:	Francois Romieu <romieu@...zoreil.com>, netdev@...r.kernel.org
CC:	bacole@...il.com
Subject: [RFT] r8169 changes against 2.6.23-rc3

On 8/20/07, Dirk wrote:
 >> So it seems that when the driver tries to queue a packet while the
 >> controller is busy processing the queue, the newly queued packet does
 >> not get noticed by the controller (until further packet activity occurs).
 >> Perhaps there is a problem with the memory barriers when adding to the
 >> TX queue, but I'm a newbie on linux kernel memory barriers.
 >
 >One thing I noticed a while ago (march) is that floodpinging (ping -f)
 >the r8169 host from an external system also increases performance
 >without changing code.
Yes, I just tried this and saw the same result.  Makes perfect sense - 
if the TX queue is normally getting stuck until TCP retransmits, then 
keeping the TX queue busy keeps the queue from remaining stuck.
I think this is a good demonstration that the underlying problem is a 
stuck TX queue as suggested.
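
To make the hypothesis concrete, here is a toy user-space model in C. It is a sketch only and is not r8169 code: the struct, the function names, and the rule "a notification that arrives while the controller is busy is lost" are all assumptions made for illustration. It shows how a packet queued at the wrong moment stays pending until some later transmit (such as a ping reply) kicks the queue again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Every name here is hypothetical, not taken from r8169. */
struct toy_nic {
    unsigned int queued; /* packets the driver has placed on the TX queue */
    unsigned int seen;   /* how far the controller believes the queue extends */
    unsigned int sent;   /* packets the controller has actually transmitted */
    bool busy;           /* controller is in the middle of processing the queue */
};

/* Driver side: queue one packet and notify the controller.
 * Modelled assumption: the notification is lost if the controller is busy. */
static void toy_xmit(struct toy_nic *nic)
{
    nic->queued++;
    if (!nic->busy)
        nic->seen = nic->queued; /* notification noticed */
}

/* Controller side: begin processing whatever it has seen so far. */
static void toy_nic_start(struct toy_nic *nic)
{
    nic->busy = true;
}

/* Controller side: transmit everything it saw, then go idle. */
static void toy_nic_finish(struct toy_nic *nic)
{
    nic->sent = nic->seen;
    nic->busy = false;
}

/* Packets queued by the driver but not yet transmitted. */
static unsigned int toy_pending(const struct toy_nic *nic)
{
    assert(nic->sent <= nic->queued);
    return nic->queued - nic->sent;
}
```

In this model, calling toy_xmit(), toy_nic_start(), toy_xmit(), toy_nic_finish() in that order leaves one packet pending. One more toy_xmit() followed by a start/finish cycle drains the queue, which mirrors the effect of the flood ping.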

 >I ended up (until now perhaps :-) with disabling the onboard nic and
 >adding an e1000 card.

Yes, ditching the Realtek interface and going with an add-on NIC seems to 
be what everyone has been doing to get around this problem.  Perhaps 
you'd like to try the busy-wait workaround with ndelay(10)?  It has 
saved me from buying an e1000 card as well.

Speaking of the e1000, I notice that the TX queue processing code for 
that driver includes spin_lock_irqsave()/spin_unlock_irqrestore() 
protection on access to the queue.  The r8169 driver seems to be missing 
equivalent code.  The last time I dealt with kernel locking bugs was in 
the old days of splnet()/splx(), so I could use some help here, but I 
suspect this could be fixed with more careful locking.
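
As a rough illustration of the kind of protection meant here, the following user-space C sketch guards both the producer and consumer indices of a TX ring with a single lock. It is not kernel code and not taken from either driver: pthread_mutex_t stands in for a spinlock taken with spin_lock_irqsave(), and the ring layout, size, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define TOY_RING_SIZE 8u /* hypothetical ring size */

/* Hypothetical TX ring. The mutex is a user-space stand-in for a spinlock. */
struct toy_tx_ring {
    pthread_mutex_t lock;
    unsigned int prod; /* next slot the transmit path fills */
    unsigned int cons; /* next slot the completion path reclaims */
    int slots[TOY_RING_SIZE];
};

static void toy_ring_init(struct toy_tx_ring *r)
{
    pthread_mutex_init(&r->lock, NULL);
    r->prod = 0;
    r->cons = 0;
}

/* Transmit path: add a packet under the lock; returns false if the ring is full. */
static bool toy_ring_queue(struct toy_tx_ring *r, int pkt)
{
    bool ok = false;

    pthread_mutex_lock(&r->lock);
    if (r->prod - r->cons < TOY_RING_SIZE) {
        r->slots[r->prod % TOY_RING_SIZE] = pkt;
        r->prod++;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}

/* Completion path: reclaim every queued slot under the same lock.
 * Returns the number of slots reclaimed. */
static unsigned int toy_ring_reclaim(struct toy_tx_ring *r)
{
    unsigned int n;

    pthread_mutex_lock(&r->lock);
    n = r->prod - r->cons;
    assert(n <= TOY_RING_SIZE);
    r->cons = r->prod;
    pthread_mutex_unlock(&r->lock);
    return n;
}
```

Because both paths take the same lock, neither can observe the indices half-updated. Whether missing locking is actually the cause of the r8169 stall is still only a suspicion at this point.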
