Date:   Tue, 27 Mar 2018 02:21:42 +0300
From:   Tal Gilboa <talgi@...lanox.com>
To:     Florian Fainelli <f.fainelli@...il.com>, netdev@...r.kernel.org
Cc:     davem@...emloft.net, jaedon.shin@...il.com, pgynther@...gle.com,
        opendmb@...il.com, michael.chan@...adcom.com, gospo@...adcom.com,
        saeedm@...lanox.com
Subject: Re: [PATCH net-next 0/2] net: broadcom: Adaptive interrupt coalescing

On 3/27/2018 1:29 AM, Florian Fainelli wrote:
> On 03/26/2018 03:04 PM, Florian Fainelli wrote:
>> On 03/26/2018 02:16 PM, Tal Gilboa wrote:
>>> On 3/23/2018 4:19 AM, Florian Fainelli wrote:
>>>> Hi all,
>>>>
>>>> This patch series adds adaptive interrupt coalescing for the Gigabit
>>>> Ethernet drivers SYSTEMPORT and GENET.
>>>>
>>>> This really helps lower the interrupt count and system load, as
>>>> measured by vmstat for a Gigabit TCP RX session:
>>>
>>> I don't see an improvement in system load; quite the opposite: 42% vs.
>>> 100% for SYSTEMPORT and 85% vs. 100% for GENET, both with the same
>>> bandwidth.
>>
>> Looks like I did not extract the correct data. The load could spike in
>> both cases (with and without net_dim) up to 100, but averaged over the
>> transmission I see the following:
>>
>> GENET without:
>>   1  0      0 1169568      0  25556    0    0     0     0 130079 62795  2 86 13  0  0
>>
>> GENET with:
>>   1  0      0 1169536      0  25556    0    0     0     0 10566 10869  1 21 78  0  0
>>
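For reference, these look like plain "vmstat 1" samples; assuming the
default column layout, the interesting fields are "in" and "cs"
(interrupts and context switches per second) followed by the us/sy/id
CPU split, so the GENET numbers above read roughly as:

   without net_dim:  in=130079  cs=62795  us=2%  sy=86%  id=13%
   with net_dim:     in=10566   cs=10869  us=1%  sy=21%  id=78%
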
>>> Am I missing something? Speaking of bandwidth, I would expect 941 Mb/s
>>> (assuming this is TCP over IPv4). Do you know why the reduced interrupt
>>> rate doesn't improve bandwidth?
>>
>> I am assuming that this comes down to latency; I am still capturing some
>> pcap files to analyze the TCP session with Wireshark and see if that is
>> indeed what is going on. The test machine is actually not that great

I would expect 1GbE full wire speed on almost any setup. I'll try 
applying your code on my setup and see what I get.
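
For reference on the number: 941 Mb/s is just GigE line rate minus
header overhead; with TCP timestamps, 1448 payload bytes travel in 1538
bytes on the wire, and 1448/1538 * 1000 is roughly 941 Mb/s.

When comparing setups it is probably also worth confirming the
coalescing state from user space. Assuming the series hooks net_dim to
the standard ethtool adaptive-rx knob (and with eth0 only as a
placeholder interface name), something like:

   ethtool -c eth0                  # show current coalescing settings
   ethtool -C eth0 adaptive-rx on   # enable adaptive RX coalescing
   ethtool -g eth0                  # show RX/TX ring sizes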

>>
>>> Also, any effect on the client side (you
>>> mentioned enabling TX moderation for SYSTEMPORT)?
>>
>> Yes, on SYSTEMPORT, which is the TCP IPv4 client here, I have the following:
>>
>> SYSTEMPORT without:
>>   2  0      0 191428      0  25748    0    0     0     0 86254  264  0 41 59  0  0
>>
>> SYSTEMPORT with:
>>   3  0      0 190176      0  25748    0    0     0     0 45485 31332  0 100  0  0  0
>>
>> I can't get top to agree with these load results, though; it looks
>> like we just have the CPU spinning more, which does not look like a win.
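
If top and vmstat disagree, a per-CPU breakdown and the raw interrupt
counters might help settle it; for example (mpstat from sysstat, and
the grep pattern would need to match however the NIC's IRQs are named
on your board):

   mpstat -P ALL 1                  # per-CPU utilization, 1s samples
   grep -i eth /proc/interrupts     # raw interrupt counts per IRQ/CPU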
> 
> The problem appears to be the timeout selection on TX; ignoring it
> completely allows us to keep the load average down while maintaining the
> bandwidth. It looks like NAPI on TX already does a good job, so interrupt
> mitigation on TX is not such a great idea after all...

I saw similar behavior for TX. For me the issue was too many
outstanding bytes without a completion (capped at 256KB by the sysctl
net.ipv4.tcp_limit_output_bytes). I tested on a 100GbE connection, so
even with reasonable timeout values I was already waiting too long (4
TSO sessions). For the 1GbE case this might have no effect, since you
would need a very long timeout. I'm currently working on adding TX
support to dim. If you don't see a good benefit now, you might want to
hold off a little on TX adaptive interrupt moderation. Maybe only
adjust static moderation for now?
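
In case it is useful for reproducing this: that limit is the TCP Small
Queues cap, which can be inspected and raised for experiments with
sysctl (the 1MB value below is just an example):

   sysctl net.ipv4.tcp_limit_output_bytes
   sysctl -w net.ipv4.tcp_limit_output_bytes=1048576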

> 
> Also, UDP TX tests show that we can lower the interrupt count by
> setting an appropriate tx-frames (as expected), but we won't be lowering
> the CPU load since that is inherently CPU-intensive work. Past

Do you see higher TX UDP bandwidth? If you are CPU-bound in both
cases, I would at least expect higher bandwidth with fewer interrupts,
since you are taking work off the CPU.

> tx-frames=64, the bandwidth completely drops because that would be 1/2
> of the ring size.
> 
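
If you stick with static TX moderation for now, that would just be the
classic ethtool coalescing parameters; the value below is only
illustrative and would need tuning against the TX ring (tx-frames=64
being half the ring suggests a 128-entry TX ring here):

   ethtool -C eth0 tx-frames 32   # fire a TX completion IRQ every 32 frames
   ethtool -g eth0                # check the TX ring length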
