Message-ID: <f314f3c5-0641-42e3-be56-3173fdcf0977@gmail.com>
Date: Wed, 27 Mar 2024 17:07:46 +0100
From: Richard Gobert <richardbgobert@...il.com>
To: Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, kuba@...nel.org, willemdebruijn.kernel@...il.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH net-next v4 4/4] net: gro: move L3 flush checks to
 tcp_gro_receive

Paolo Abeni wrote:
> On Tue, 2024-03-26 at 18:25 +0100, Richard Gobert wrote:
>> Paolo Abeni wrote:
>>> Hi,
>>>
>>> On Tue, 2024-03-26 at 16:02 +0100, Richard Gobert wrote:
>>>> This patch is meaningful by itself - removing checks against non-relevant
>>>> packets and performing the flush/flush_id checks in a single place.
>>>
>>> I'm personally not sure this patch is a win. The code churn is
>>> significant. I understand this is for performance's sake, but I don't
>>> see the benefit???
>>>
>>
>> Could you clarify what you mean by code churn?
>
> The diffstat of this patch is not negligible and touches very sensitive
> areas.
>
The diff mainly touches flush/flush_id/is_atomic, and the new code should be
less complex. I agree this is sensitive since it is part of core GRO -
I checked all relevant flows manually, but I can also add more tests
to ensure that the logic remains the same.

>>> The changelog shows that perf reports slightly lower figures for
>>> inet_gro_receive(). That is expected, as this patch moves code out of
>>> that function. What about inet_gro_flush()/tcp_gro_receive(), where such
>>> code was moved?
>>>
>>
>> Please consider the following 2 common scenarios:
>>
>> 1) Multiple packets in the GRO bucket - the common case (e.g. when running
>> super_netperf TCP_STREAM) - each layer executes a for loop, going over
>> each packet in the bucket. Specifically, L3 gro_receive loops over the
>> bucket making flush/flush_id/is_atomic checks.
>
> Only for packets with the same rx hash.
>
Right, but there are only 8 GRO buckets, so a collision can still happen
with multiple concurrent streams.
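
To illustrate with a toy model (a simplified, standalone sketch - not the
kernel's actual data structures and not the patch itself): with only 8
power-of-two buckets, two unrelated flows can hash into the same bucket,
and the per-bucket loop still walks over the non-relevant entry:

#include <stdbool.h>
#include <stdio.h>

#define GRO_HASH_BUCKETS 8	/* small, power-of-two bucket count */

/* toy stand-in for per-packet GRO state, not the kernel's struct */
struct pkt {
	unsigned int rx_hash;
	bool same_flow;
	struct pkt *next;
};

static unsigned int bucket_of(unsigned int rx_hash)
{
	return rx_hash & (GRO_HASH_BUCKETS - 1);	/* hash -> bucket index */
}

int main(void)
{
	/* two unrelated flows whose hashes share the low 3 bits */
	struct pkt other = { .rx_hash = 0xabcdef78, .same_flow = false };
	struct pkt same  = { .rx_hash = 0x12345678, .same_flow = true,
			     .next = &other };
	struct pkt *p;
	int visited = 0;

	printf("same bucket: %d\n",
	       bucket_of(same.rx_hash) == bucket_of(other.rx_hash));

	/* the per-bucket loop still walks 'other' even though it can
	 * never coalesce with the new packet
	 */
	for (p = &same; p; p = p->next) {
		visited++;
		if (!p->same_flow)
			continue;
		/* flush/flush_id/is_atomic checks happen here today */
	}
	printf("packets visited: %d\n", visited);

	return 0;
}
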
>> For most packets in the bucket, these checks are not
>> relevant (possibly also dirtying cache lines with non-relevant
>> packets). Removing code from the for loop for this case is significant.
>>
>> 2) UDP/TCP streams which do not coalesce in GRO. This is the common case
>> for regular UDP connections (e.g. running netperf UDP_STREAM). In this
>> case, GRO is just overhead, so removing any code from these layers
>> is good (shown in the first measurement of the commit message).
>
> If UDP GRO is not enabled, there are no UDP packets staging in the UDP
> GRO engine, so the bucket list is empty.
>
>>> Additionally, the reported deltas are within the noise level according to my
>>> personal experience with similar tests.
>>>
>>
>> I've tested the difference between net-next and this patch repeatedly,
>> and it showed stable results each time. Is there any specific test you
>> think would be helpful to show the result?
>
> Anything that shows a measurable gain.
>
> Reporting the CPU utilization in the inet_gro_receive() function alone
> is not enough, as part of the load has been moved into
> gro_network_flush()/tcp_gro_receive().
>
Got it - the numbers I reported were only relevant to UDP flows (so
measuring with perf top -g showed the same improvement). I'll post
numbers relevant to TCP as well in v5.
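
For example, something along these lines (illustrative invocation only -
the exact workload and flags may differ), comparing the combined cycles of
inet_gro_receive() + gro_network_flush() + tcp_gro_receive() before and
after the patch, rather than inet_gro_receive() alone:

  perf record -a -g -- sleep 30    # while the TCP_STREAM test runs
  perf report --no-children
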
Thanks