Message-ID: <8d06b383-03b4-453e-b4f8-f29f68a4bcd0@machnikowski.net>
Date: Fri, 16 Aug 2024 06:13:19 +0200
From: Maciek Machnikowski <maciek@...hnikowski.net>
To: Andrew Lunn <andrew@...n.ch>
Cc: Richard Cochran <richardcochran@...il.com>,
Vadim Fedorenko <vadim.fedorenko@...ux.dev>, netdev@...r.kernel.org,
jacob.e.keller@...el.com, darinzon@...zon.com, kuba@...nel.org
Subject: Re: [RFC 0/3] ptp: Add esterror support
On 16/08/2024 01:11, Andrew Lunn wrote:
> On Fri, Aug 16, 2024 at 12:06:51AM +0200, Maciek Machnikowski wrote:
>>
>>
>> On 15/08/2024 23:08, Richard Cochran wrote:
>>> On Thu, Aug 15, 2024 at 05:00:24PM +0200, Maciek Machnikowski wrote:
>>>
>>>> Think about a Time Card
>>>> (https://opencomputeproject.github.io/Time-Appliance-Project/docs/time-card/introduction).
>>>
>>> No, I won't think about that!
>>>
>>> You need to present the technical details in the form of patches.
>>>
>>> Hand-wavey hints don't cut it.
>>>
>>> Thanks,
>>> Richard
>>
>> This implementation addresses 3 use cases:
>>
>> 1. Autonomous devices that synchronize themselves to some external
>> sources (GNSS, NTP, dedicated time sync networks) and have the ability
>> to return the estimated error from the HW or FW loop to users
>
> So this contradicts what you said earlier, when you said the device
> does not know its own error, it has to be told it.
No - it’s a different type of device.
> So what is user space supposed to do with this error? And given that
> you said it is undefined what this error includes and excludes, how is
> user space supposed to deal with the error in the error? Given how
> poorly this is defined, what is user space supposed to do when the
> device changes the definition of the error?
Esterror returns the last error relative to the master clock that the
device synchronizes to.
In the case of PPS, it is the last error registered at the top of the second.
In the case of PTP, it is the last error calculated from a transaction.
> The message Richard has always given is that those who care about
> errors freeze their kernel and do measurement campaign to determine
> what the real error is and then configure user space to deal with
> it. Does this error value negate the need for this?
AFAIR, this comment was relevant to measuring errors coming from delays
inside the system.
>> 2. Multi function devices that may have a single isolated function
>> synchronizing the device clock (by means of PTP, or PPS or any other)
>> and letting other functions access the uncertainty information
>
> So this is the simple message passing API, which could be implemented
> purely in the core? This sounds like it should be a patch of its own,
> explaining the use case.
If functions are isolated, then there is no path for passing the messages
other than through the device.
The trusted function that controls the clock will push the last error
and steer the clock to synchronize it as well as it can. Other
functions will get the time from the clock, plus the additional info
needed to calculate the uncertainty.
>> 3. Create a common interface to read the uncertainty from a device
>> (currently you can use PMC for PTP, but there is no way of receiving
>> that information from ts2phc)
>
> That sounds like a problem with ts2phc? Please could you expand on why
> the kernel should be involved in feature deficits of user space tools?
Not really. Why would all userspace processes need to understand what
currently synchronizes the time and talk to the relevant tool?
All an application cares about is what the time is, plus the primitives
for calculating error boundaries and for checking whether the clock is
synchronized well enough for that application.
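To illustrate what such a primitive could look like on the consumer
side, here is a minimal sketch. It is not taken from the RFC patches.
The function name, the nanosecond units and the drift model (last
reported error plus an assumed worst-case drift since that report) are
all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper, not part of any kernel or linuxptp API.
 *
 * esterror_ns: last reported error to the master clock (signed, ns)
 * drift_ppb:   ASSUMED worst-case oscillator drift since that report
 * elapsed_ns:  time that has passed since the error was reported
 *
 * Returns an upper bound on the current clock error in ns.
 */
uint64_t clock_error_bound_ns(int64_t esterror_ns, uint32_t drift_ppb,
                              uint64_t elapsed_ns)
{
    uint64_t abs_err, drift_ns;

    /* A drift of 1e9 ppb or more is meaningless for a clock. */
    assert(drift_ppb <= NSEC_PER_SEC);

    /* Absolute value without overflowing on INT64_MIN. */
    abs_err = esterror_ns < 0 ? (uint64_t)(-(esterror_ns + 1)) + 1
                              : (uint64_t)esterror_ns;

    /* elapsed * ppb / 1e9, split so the product cannot overflow. */
    drift_ns = (elapsed_ns / NSEC_PER_SEC) * drift_ppb +
               ((elapsed_ns % NSEC_PER_SEC) * drift_ppb) / NSEC_PER_SEC;

    return abs_err + drift_ns;
}
```

With this model, an application that last saw an error of -50 ns two
seconds ago and assumes 100 ppb of drift would work with a 250 ns bound,
regardless of which tool or firmware loop disciplines the clock.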
>> Also this is an RFC to help align work on this functionality across
>> different devices and validate if that's the right direction. If it is
>> - there will be a patch series with real drivers returning uncertainty
>> information using that interface. If it's not - I'd like to understand
>> what should I improve in the interface.
>
> I think you took the wrong approach. You should first state in detail
> the use cases. Then show how you solve each use cases, both the user
> and kernel space parts, and include the needed changes to a real
> device driver.
>
> Andrew
Also, there is one more use case I missed in that summary:
4. A device needs to consume the information about the uncertainty to act
upon it. This is the case when you want to allow certain features to
work only if error-boundary requirements are met. For example, you
don't want to allow launch time to work when the error of the clock is
huge, as your packets will launch precisely at an imprecise time.
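A rough sketch of such a gate follows. The structure, its field names
and the choice of -ERANGE are hypothetical and only illustrate the idea.
No existing driver is claimed to work this way.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct phc_sync_state {
    uint32_t esterror_ns;       /* last estimated error pushed to the device */
    uint32_t launch_max_err_ns; /* largest error at which launch time is allowed */
};

/*
 * Returns 0 if launch time may be used, or -ERANGE if the clock error
 * currently exceeds the configured boundary.
 */
int launch_time_check(const struct phc_sync_state *st)
{
    assert(st != NULL);

    if (st->esterror_ns > st->launch_max_err_ns)
        return -ERANGE;

    return 0;
}
```

The point of the sketch is that the decision is made from a value the
device already holds, so it keeps working even when the function that
synchronizes the clock is isolated from the one sending packets.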
The current adjtime() API call without any flags can fill in the time of
day and the last frequency correction. At the same time, the API's timex
structure contains a lot of other helpful information that is not
populated for PTP hardware clocks. This information includes the esterror
field, which has been there since kernel 0.99.13k. I'm not inventing a new
API, just trying to add hooks to implement existing APIs inside the PTP
subsystem.
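For reference, a minimal userspace sketch of reading esterror through
the existing interface, using clock_adjtime() with modes set to 0 so
that nothing is adjusted. The helper name read_clock_esterror() is made
up for this example. Whether a PTP hardware clock reports a meaningful
value here is exactly what the RFC proposes, so treat that part as an
assumption. For the system clock (CLOCK_REALTIME) the field is the one
managed by the NTP code and is expressed in microseconds.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for clock_adjtime() in glibc */
#endif
#include <assert.h>
#include <string.h>
#include <sys/timex.h>
#include <time.h>

/*
 * Query a clock without adjusting it: modes == 0 makes the call
 * read-only. Returns the clock state (>= 0) on success, or -1 with
 * errno set. *esterror_us is written only on success.
 */
int read_clock_esterror(clockid_t clk, long *esterror_us)
{
    struct timex tx;
    int state;

    assert(esterror_us != NULL);

    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;

    state = clock_adjtime(clk, &tx);
    if (state < 0)
        return -1;

    *esterror_us = tx.esterror;
    return state;
}
```

Calling read_clock_esterror(CLOCK_REALTIME, &err) queries the system
clock. For a PHC, the clock id is derived from an open /dev/ptpN file
descriptor, as testptp.c in the kernel tree does with its FD_TO_CLOCKID
macro.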