Message-ID: <5d0fbf2f-9feb-3f6b-49b0-39b74285b124@gmail.com>
Date: Wed, 27 Jun 2018 06:49:18 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Tariq Toukan <tariqt@...lanox.com>,
Eric Dumazet <eric.dumazet@...il.com>,
David Miller <davem@...emloft.net>
Cc: netdev <netdev@...r.kernel.org>,
Shawn Bohrer <sbohrer@...advisors.com>,
Shay Agroskin <shayag@...lanox.com>,
Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [PATCH net-next] mlx4: do not use rwlock in fast path
On 06/27/2018 05:11 AM, Tariq Toukan wrote:
>
>
> On 09/02/2017 7:10 PM, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@...gle.com>
>>
>> Using a reader-writer lock in fast path is silly, when we can
>> instead use RCU or a seqlock.
>>
>> For mlx4 hwstamp clock, a seqlock is the way to go, removing
>> two atomic operations and false sharing.
>>
>> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>> Cc: Tariq Toukan <tariqt@...lanox.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx4/en_clock.c | 35 ++++++++--------
>> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2
>> 2 files changed, 19 insertions(+), 18 deletions(-)
>>
>
> Hi Eric,
>
> When my peer, Shay, modified mlx5 to adopt this same locking scheme/type, he noticed a degradation in packet rate.
> He got back to testing mlx4 and also noticed a degradation introduced by this patch.
>
> Perf numbers (single ring):
>
> mlx4:
> with rw-lock: ~8.54M pps
> with seq-lock: ~8.51M pps
>
> mlx5:
> With rw-lock: ~14.94M pps
> With seq-lock: ~14.48M pps
>
> Actually, this can be explained by the analysis below.
> In short, the number of readers is significantly larger than the number of writers, so optimizing the reader flow would give better numbers. The issue is that the read/write lock might cause writer starvation. Maybe RCU fits best here?
>
> Degradation analysis:
> The patch changes the type of the lock that protects reads and updates of one variable, (struct mlx4_en_dev).clock.
> This variable is used to convert the hw timestamp into skb->hwtstamps.
> It is read for each transmitted/received packet and updated only via the ptp module and a periodic overflow work item we have (at most 10 times per second).
> Meaning that there are many more readers than writers, and it’s best to optimize the reader flow.
>
Hi Tariq
Are you sure you enabled timestamps in your tests?
Is mlx4_en_fill_hwtstamps() _really_ called 8,540,000 times per second,
meaning the same number of read_lock_irqsave()/read_unlock_irqrestore() pairs is performed?
You have a pretty damn good CPU it seems.
seqlock has no cost for a reader [1], other than reading one integer value and testing it.
[1] If this value never changes (and is on a clean cache line).
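
For illustration only (this is not code from the patch), the reader cost described above can be sketched in userspace C11. The struct and function names (clock_state, clock_write, clock_read) and the payload fields are hypothetical, and a single writer is assumed; the kernel's seqlock_t additionally serializes writers with a spinlock, and kernel readers use read_seqbegin()/read_seqretry().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the clock state protected by the lock. */
struct clock_state {
    atomic_uint seq;         /* even: stable, odd: write in progress */
    _Atomic uint64_t nsec;   /* illustrative payload fields */
    _Atomic uint64_t cycles;
};

/* Writer side; assumes only one writer runs at a time. */
void clock_write(struct clock_state *c, uint64_t nsec, uint64_t cycles)
{
    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    /* Make the sequence odd so readers know an update is in flight. */
    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&c->cycles, cycles, memory_order_relaxed);

    /* Publish the update and make the sequence even again. */
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader side: no store to shared memory, only loads and a compare. */
void clock_read(struct clock_state *c, uint64_t *nsec, uint64_t *cycles)
{
    unsigned s0, s1;

    do {
        s0 = atomic_load_explicit(&c->seq, memory_order_acquire);
        *nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
        *cycles = atomic_load_explicit(&c->cycles, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&c->seq, memory_order_relaxed);
        /* Retry if a write was in progress or completed in between. */
    } while ((s0 & 1) || s0 != s1);
}
```

When the writer is idle, the loop body runs once and the reader never dirties the cache line, which is the property the footnote relies on. A reader/writer lock, by contrast, needs an atomic read-modify-write on both lock and unlock.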
Really this looks like ring->hwtstamp_rx_filter != HWTSTAMP_FILTER_ALL in your tests.
The numbers you gave amount to just one cycle of difference per packet (half a nanosecond),
so I really doubt adding back the heavy read_lock_irqsave()/read_unlock_irqrestore()
could be faster.
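
The per-packet figure can be reproduced from the quoted packet rates. The helper below is a hypothetical illustration (the name extra_ns_per_packet is not from the thread); it converts two packet rates into the extra time spent per packet.

```c
/* Extra time per packet, in nanoseconds, implied by a drop in packet rate
 * from pps_before to pps_after (both in packets per second). */
double extra_ns_per_packet(double pps_before, double pps_after)
{
    return 1e9 / pps_after - 1e9 / pps_before;
}
```

With the mlx4 rates quoted above (8.54M and 8.51M pps) this gives roughly 0.41 ns per packet; with the quoted mlx5 rates (14.94M and 14.48M pps) it gives roughly 2.1 ns.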
Conceptually, seqlock is a form of RCU; it really optimizes the reader flow.
Thanks