netdev - Re: Strange problem with e1000 driver

Message-ID: <48607931.20908@linux.vnet.ibm.com>
Date:	Tue, 24 Jun 2008 10:03:53 +0530
From:	Varun Chandramohan <varunc@...ux.vnet.ibm.com>
To:	Robert Hancock <hancockr@...w.ca>
CC:	vatsa@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	e1000-devel@...ts.sourceforge.net, jbarnes@...tuousgeek.org,
	greg@...ah.com, netdev@...r.kernel.org
Subject: Re: Strange problem with e1000 driver - ping packet loss

cc'ing netdev

Robert Hancock wrote:
> Srivatsa Vaddagiri wrote:
>> Hi,
>>     I happened to look at a system which was exhibiting poor ping
>> performance with e1000 driver (in 2.6.25) and had some questions 
>> regarding that.
>>
>> Ping test was done between the system and a laptop, which were connected
>> using a straight ethernet cable. Ping reported round trip times running
>> into seconds (!) and also packet loss.
>>
>> Upon some investigation, I found that the interrupt count field in
>> /proc/interrupts (associated with eth1) is not incrementing as fast as
>> it should. Moreover eth1 interrupt line is shared with the hard disk
>> interrupt (ata_piix) as below:
>>
>> # cat /proc/interrupts
>>
>> .
>>
>>  10:       2296    XT-PIC-XT        ata_piix, eth0, eth1
>>
>> .
>>
>> IRQ10 is thus being shared by both the hard disk and eth0/eth1.
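
One way to put a number on "not incrementing as fast as it should" is to sample /proc/interrupts twice and compare the count for the shared line. The following is only a rough sketch, not something from the original report: the helper names irq_count and irq_rate are made up for illustration, and the parsing assumes the usual layout of one or more per-CPU counts followed by the controller name.

```python
import time


def irq_count(text, irq):
    """Return the interrupt count for `irq`, summed over all CPU columns.

    `text` is the content of /proc/interrupts. Assumes each IRQ line looks
    like "<irq>: <count per CPU>... <controller> <devices>".
    """
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # first non-numeric field is the controller name
            total += int(field)
        return total
    raise KeyError(f"IRQ {irq} not found")


def irq_rate(irq, interval=1.0, path="/proc/interrupts"):
    """Return interrupts per second on `irq`, measured over `interval` seconds."""
    with open(path) as f:
        before = irq_count(f.read(), irq)
    time.sleep(interval)
    with open(path) as f:
        after = irq_count(f.read(), irq)
    return (after - before) / interval


if __name__ == "__main__":
    # The line quoted above, used as parser input.
    sample = " 10:       2296    XT-PIC-XT        ata_piix, eth0, eth1"
    print(irq_count(sample, 10))
```

On the affected machine, calling irq_rate(10) once during a plain ping and once while the dd job is running would show how much of the interrupt traffic on IRQ10 is coming from the disk rather than the NIC.
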
>>
>> Here's the strange observation I made:
>>
>> When I initiate some disk activity (e.g. dd if=/dev/zero
>> of=/tmp/file), ping performance suddenly shot up (round trip time in
>> double-digit ms, 0% packet loss)! I presume this is because the
>> e1000 interrupt handler is called whenever there is an interrupt
>> from the hard disk on IRQ10, and it then polls the NIC and processes
>> packets immediately.
>>
>> As soon as I kill the background disk-write intensive job, ping
>> performance again dropped.
>>
>> This suggests that the e1000 NIC is having trouble interrupting the OS.
>>
>> Before I could jump up and say this is a hardware issue, I was told
>> that Windows works just fine on the server (as does the 2.4 kernel,
>> which I couldn't verify) :(
>>
>>
>> Some more observations:
>>
>> 1. I tried setting e1000 parameters (RxIntDelay=0, RxAbsIntDelay=0,
>>    TxIntDelay=0, TxAbsIntDelay=0, InterruptThrottleRate=0). None of
>>    them helped.
>>
>> 2. When ping performance was poor, readprofile showed that the system
>>    was mostly idle. This confirms that the OS is not getting
>>    interrupts from eth1 very frequently and is hence idling.
>>
>> 3. When ping performance was poor, ethtool -S eth1 showed that
>>    rx_bytes was incrementing at a good pace, showing that the NIC
>>    was receiving ping responses but not handing them over to the
>>    OS for further processing.
>>
>> 4. e1000 chipset is 82546GB
>>
>> 5. The e1000e driver didn't work at all (it doesn't recognize the cards).
>>
>>
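
Observation 3 above compares a NIC counter against what the OS actually processes. A small sketch of how the ethtool side of that comparison could be scripted follows. It assumes the usual "name: value" layout printed by ethtool -S; the helper names and the sample counter values are invented for illustration and are not taken from the affected system.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def counter_delta(before, after, key):
    """Return how much counter `key` grew between two parsed samples."""
    return after[key] - before[key]


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    first = "NIC statistics:\n     rx_packets: 10\n     rx_bytes: 840\n"
    second = "NIC statistics:\n     rx_packets: 15\n     rx_bytes: 1260\n"
    grown = counter_delta(parse_ethtool_stats(first),
                          parse_ethtool_stats(second), "rx_bytes")
    print(grown)
```

In practice the two text samples would come from running ethtool -S eth1 a few seconds apart. A steadily growing rx_bytes alongside a nearly flat count on IRQ10 in /proc/interrupts would match the symptom described in the list above.
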
>> Any advice on how to fix this problem?
>
> Can you post your dmesg output from bootup with no special options 
> (noacpi, etc.) enabled?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html