Message-ID: <bd24eeb0-318c-71a4-527f-02832b74250c@intel.com>
Date: Mon, 1 Aug 2022 16:29:40 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Ilya Evenbach <ievenbach@...ora.tech>,
Alison Chaiken <achaiken@...ora.tech>,
Steve Payne <spayne@...ora.tech>, <jesse.brandeburg@...el.com>,
<richardcochran@...il.com>, <netdev@...r.kernel.org>,
<intel-wired-lan@...ts.osuosl.org>
Subject: Re: Fwd: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
On 8/1/2022 4:00 PM, Ilya Evenbach wrote:
>>> -----Original Message-----
>>> From: achaiken@...ora.tech <achaiken@...ora.tech>
>>> Sent: Monday, August 01, 2022 6:38 AM
>>> To: Brandeburg, Jesse <jesse.brandeburg@...el.com>;
>>> richardcochran@...il.com
>>> Cc: spayne@...ora.tech; achaiken@...ora.tech; alison@...-devel.com;
>>> netdev@...r.kernel.org; intel-wired-lan@...ts.osuosl.org
>>> Subject: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
>>>
>>> From: Steve Payne <spayne@...ora.tech>
>>>
>>> For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
>>> from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
>>> by a seemingly inconsistent amount, which causes discontinuities in
>>> time synchronization. Explicitly reset the NIC's PHC to
>>> `CLOCK_REALTIME` whenever the NIC goes up or down by calling
>>> `ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.
>>>
>>> Signed-off-by: Steve Payne <spayne@...ora.tech>
>>> Signed-off-by: Alison Chaiken <achaiken@...ora.tech>
>>>
>>
>> Resetting PTP could be a problem if the clock was not being synchronized with the kernel CLOCK_REALTIME,
>
> That is true, but most likely not really important, as the unmitigated
> problem also introduces significant discontinuities in time.
> Basically, this patch does not make things worse.
>
Sure, but I am trying to see if I can understand *why* things get wonky.
I suspect the issue is caused because of how we're resetting the
cyclecounter.
>>
>> and does result in some loss of timer precision either way due to the delays involved with setting the time.
>
> That precision loss is negligible compared to jumps resulting from
> link down/up, and should be corrected by normal PTP operation very
> quickly.
>
Only if CLOCK_REALTIME is actually being synchronized. Yes, that is
generally true, but it's not necessarily guaranteed.
>>
>> Do you have an example of the clock jump? How much is it?
>
> 2021-02-12T09:24:37.741191+00:00 bench-12 phc2sys: [195230.451] CLOCK_REALTIME phc offset 61 s2 freq -36503 delay 2298
> 2021-02-12T09:24:38.741315+00:00 bench-12 phc2sys: [195231.451] CLOCK_REALTIME phc offset 169 s2 freq -36377 delay 2294
> 2021-02-12T09:24:39.741407+00:00 bench-12 phc2sys: [195232.451] CLOCK_REALTIME phc offset 195213702387037 s2 freq +100000000 delay 2301
> 2021-02-12T09:24:40.741489+00:00 bench-12 phc2sys: [195233.452] CLOCK_REALTIME phc offset 195213591220495 s2 freq +100000000 delay 2081
>
Thanks.
I think what's actually going on is a bug in the
ixgbe_ptp_start_cyclecounter function where the system time registers
are being reset.
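To illustrate the mechanism I have in mind, here is a toy model in C. It is a simplified sketch, not the driver or kernel code: struct toy_tc, toy_hw_counter and the toy_* functions are hypothetical, and the model assumes a 64-bit counter that already ticks in nanoseconds (mult = 1, shift = 0). Like the kernel's timecounter, it accumulates (now - cycle_last) with unsigned wrap-around, so if the hardware counter is zeroed underneath it, the reported time drops by whatever the counter had accumulated.

```c
#include <stdint.h>

/*
 * Toy model of a timecounter layered over a free-running hardware
 * counter. Hypothetical names; NOT the kernel's struct timecounter.
 * Assumes a 64-bit counter that counts nanoseconds (mult = 1, shift = 0).
 */
struct toy_tc {
	uint64_t cycle_last; /* hardware counter value at the last read */
	uint64_t nsec;       /* PHC time reported to userspace, in ns */
};

static uint64_t toy_hw_counter; /* stands in for the SYSTIME registers */

static void toy_tc_init(struct toy_tc *tc, uint64_t start_ns)
{
	tc->cycle_last = toy_hw_counter;
	tc->nsec = start_ns;
}

/* Add the counter delta; unsigned wrap-around mimics a 64-bit mask. */
static uint64_t toy_tc_read(struct toy_tc *tc)
{
	uint64_t now = toy_hw_counter;
	uint64_t delta = now - tc->cycle_last;

	tc->cycle_last = now;
	tc->nsec += delta;
	return tc->nsec;
}

/*
 * Start the clock at start_ns, let the counter run for run_ns, optionally
 * zero the counter (as a register reset would), let it run another
 * after_ns, and return the time the timecounter reports.
 */
static uint64_t toy_simulate(uint64_t start_ns, uint64_t run_ns,
			     uint64_t after_ns, int reset_counter)
{
	struct toy_tc tc;

	toy_hw_counter = 0;
	toy_tc_init(&tc, start_ns);
	toy_hw_counter += run_ns;
	toy_tc_read(&tc);
	if (reset_counter)
		toy_hw_counter = 0;
	toy_hw_counter += after_ns;
	return toy_tc_read(&tc);
}
```

With this model, toy_simulate(S, R, A, 0) returns S + R + A, while toy_simulate(S, R, A, 1) returns S + A: the reported time ends up R short, i.e. it jumps backward by the value the counter held when it was reset.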
What hardware are you operating on? Do you know if it's an X550 board? It
looks like this has been the case since a9763f3cb54c ("ixgbe: Update PTP
to support X550EM_x devices").
The start_cyclecounter function was never supposed to modify the current
time registers, but for X550 devices it resets them to 0, which would
give exactly the behavior you're seeing.
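As a rough check on the magnitude, assuming the bracketed value phc2sys prints is its CLOCK_MONOTONIC timestamp in seconds: the first bad offset, 195213702387037 ns, is about 195213.7 s (roughly 2.26 days), only about 18.7 s less than the monotonic stamp of 195232.451 on the same line. That is consistent with, though it does not prove, the PHC losing roughly the value the SYSTIME counter had accumulated, if the counter was started shortly after boot. The names below are hypothetical.

```c
#include <stdint.h>

/* Values copied from the phc2sys log quoted above. */
static const int64_t log_offset_ns = 195213702387037LL; /* first bad sample */
static const double log_mono_s = 195232.451;            /* its [stamp] (assumed CLOCK_MONOTONIC) */

/* Size of the jump, in seconds. */
static double jump_seconds(int64_t offset_ns)
{
	return (double)offset_ns / 1e9;
}

/* Size of the jump, in days. */
static double jump_days(int64_t offset_ns)
{
	return jump_seconds(offset_ns) / 86400.0;
}

/* How far the jump falls short of the monotonic stamp, in seconds. */
static double shortfall_seconds(int64_t offset_ns, double mono_s)
{
	return mono_s - jump_seconds(offset_ns);
}
```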