Message-ID: <ffadc3fd-5f2c-461a-8132-7a9ee89add79@intel.com>
Date: Fri, 15 Nov 2024 10:39:12 -0800
From: Jacob Keller <jacob.e.keller@...el.com>
To: Vadim Fedorenko <vadfed@...a.com>, Vadim Fedorenko
<vadim.fedorenko@...ux.dev>, Pavan Chebbi <pavan.chebbi@...adcom.com>,
"Andrew Lunn" <andrew+netdev@...n.ch>, Paolo Abeni <pabeni@...hat.com>,
Michael Chan <michael.chan@...adcom.com>, Jakub Kicinski <kuba@...nel.org>
CC: Richard Cochran <richardcochran@...il.com>, <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Simon Horman <horms@...nel.org>
Subject: Re: [PATCH net-next v2] bnxt_en: optimize gettimex64
On 11/14/2024 3:48 AM, Vadim Fedorenko wrote:
> The current implementation of gettimex64() makes at least 3 PCIe reads
> to get the current PHC time, so it takes at least 2.2us to return this
> value to userspace. At the same time, a cached value of the upper bits
> of the PHC time is already available for packet timestamps. This patch
> reuses the cached value to speed up reading the PHC time.
>
> Signed-off-by: Vadim Fedorenko <vadfed@...a.com>
> ---
> v1 -> v2:
> * move the cycles extension to a helper function and reuse it for both
> timestamp extension and the gettimex64() function
>
> I ran some benchmarks on a host with a Broadcom Thor NIC, building a
> histogram of the time spent in clock_gettime() calls querying the PTP
> device over a million iterations.
> With the current implementation the result is (long tail is cut):
>
> 2200ns: 902624
> 2300ns: 87404
> 2400ns: 4025
> 2500ns: 1307
> 2600ns: 581
> 2700ns: 261
> 2800ns: 104
> 2900ns: 36
> 3000ns: 32
> 3100ns: 24
> 3200ns: 16
> 3300ns: 29
> 3400ns: 29
> 3500ns: 23
>
> The optimized version on the very same machine and NIC gives the
> following values:
>
> 900ns: 865436
> 1000ns: 128630
> 1100ns: 2671
> 1200ns: 727
> 1300ns: 397
> 1400ns: 178
> 1500ns: 92
> 1600ns: 16
> 1700ns: 15
> 1800ns: 11
> 1900ns: 6
> 2000ns: 20
> 2100ns: 11
>
> That means pct(99) improved from 2300ns to 1000ns.
> ---
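As a quick sanity check of the quoted pct(99) figures, here is a minimal
sketch that walks the first few histogram buckets until the cumulative
count reaches 99% of the samples. It assumes each run has exactly
1,000,000 samples (my reading of "over million iterations") and that
nothing falls below the first listed bucket; hist_percentile() and
struct bucket are hypothetical names, not anything from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* One histogram bucket: latency label in ns and number of samples. */
struct bucket {
	unsigned int ns;
	uint64_t count;
};

/*
 * Return the label of the first bucket at which the cumulative count
 * reaches pct percent of total, or 0 if that point lies in the part of
 * the histogram that was cut off.
 * Assumption: total is the full sample count, including the cut tail.
 */
static unsigned int hist_percentile(const struct bucket *h, size_t n,
				    uint64_t total, unsigned int pct)
{
	uint64_t need = (total * pct + 99) / 100;	/* round up */
	uint64_t seen = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		seen += h[i].count;
		if (seen >= need)
			return h[i].ns;
	}
	return 0;
}

/* First three buckets of each quoted histogram. */
static const struct bucket hist_before[] = {
	{ 2200, 902624 }, { 2300, 87404 }, { 2400, 4025 },
};
static const struct bucket hist_after[] = {
	{ 900, 865436 }, { 1000, 128630 }, { 1100, 2671 },
};
```

Under those assumptions the 99% mark (990,000 samples) is reached in the
2300ns bucket before the change and in the 1000ns bucket after it, which
matches the numbers quoted above.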
The driver already has to read and cache the upper bits, so there's not
much value in repeating those reads on every clock_gettime() system
call. This also simplifies the system timestamp process, and avoids the
duplicate reads.
Clever!
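
For anyone following along, the general idea of extending a narrow
counter read with cached upper bits can be sketched roughly as below.
This is a user-space illustration, not the driver code: extend_cycles()
and the 32-bit split are assumptions made for the example, and it only
works if the cached value is refreshed more often than the low bits
wrap and the fresh read is newer than the cached value.

```c
#include <stdint.h>

#define LO_BITS	32
#define LO_MASK	((UINT64_C(1) << LO_BITS) - 1)

/*
 * Hypothetical helper: combine a cached full counter value with a fresh
 * read of only the low 32 bits.
 * Assumption: at most one wrap of the low bits happened since the cache
 * was last refreshed.
 */
static uint64_t extend_cycles(uint64_t cached, uint32_t lo)
{
	/* Take the upper bits from the cache and the low bits from the read. */
	uint64_t cycles = (cached & ~LO_MASK) | lo;

	/*
	 * A fresh low value smaller than the cached low value means the
	 * low bits wrapped after the cache was taken, so carry into the
	 * upper bits.
	 */
	if (lo < (uint32_t)(cached & LO_MASK))
		cycles += LO_MASK + 1;

	return cycles;
}
```

With a cached value of 0x00000001FFFFFFF0 and a fresh low read of 0x5,
this yields 0x0000000200000005: a single narrow read plus the cached
upper bits gives the full value without re-reading the high part.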
Reviewed-by: Jacob Keller <jacob.e.keller@...el.com>