Message-Id: <200807050321.26877.opurdila@ixiacom.com>
Date: Sat, 5 Jul 2008 03:21:26 +0300
From: Octavian Purdila <opurdila@...acom.com>
To: Patrick Ohly <patrick.ohly@....de>
Cc: netdev@...r.kernel.org
Subject: Re: [RFC] support for IEEE 1588
On Friday 04 July 2008, Patrick Ohly wrote:
> Hello Tavi,
>
> Interesting initiative. I'm employed by Intel and had the chance to do
> some exploratory work on software PTP support for Intel's new 82576
> Gigabit Ethernet Controller [1], which introduces hardware time stamping
> for PTP packets. I modified the open source PTPd so that it uses the
> more accurate hardware time stamps instead of time stamps generated by
> the Linux IP stack. The advantage was 50x higher accuracy under load.
> You can read more about that in a paper [2].
>
Nice work, will need some time to chew on the paper :)
> > 2. TX path - this is a bit more complicated since we need a new mechanism
> > to wait, from user space, for a packet to be transmitted on the wire.
> > - add a new flag for the skb to request TX stamping
> > - add a new control message to propagate the TX stamping request from
> > userspace to the skb
>
> Forgive me my ignorance, can you provide more details how that would
> work?
>
> How about adding a new flag for send/sendto/sendmsg() instead of a new
> control message?
>
The control message will allow us to associate a cookie with the skb (say, for
instance, that the app receives the value of the skb pointer). That
cookie will be returned together with the TX stamp, and will thus allow us
to match the stamp to the packet.
For PTPd this is probably not required, but it will help
applications that have multiple outstanding TX packets.
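
To make the cookie idea concrete, here is a minimal user-space sketch of the
matching step, assuming the cookie is an opaque 64-bit value. The names
tx_table, tx_track and tx_match are hypothetical and exist only for
illustration; nothing like them is part of the proposal or of any kernel
interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16          /* illustrative limit on outstanding TX packets */

struct pending_tx {
    uint64_t cookie;            /* opaque value returned together with the stamp */
    int seq;                    /* application-level id of the packet */
    int in_use;
};

struct tx_table {
    struct pending_tx slot[MAX_PENDING];
};

/* Remember a packet that was sent with a TX stamp request.
 * Returns 0 on success, -1 if the table is full. */
static int tx_track(struct tx_table *t, uint64_t cookie, int seq)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i].cookie = cookie;
            t->slot[i].seq = seq;
            t->slot[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Match a returned stamp to its packet by cookie and release the slot.
 * Returns the packet's seq, or -1 if the cookie is unknown. */
static int tx_match(struct tx_table *t, uint64_t cookie)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        if (t->slot[i].in_use && t->slot[i].cookie == cookie) {
            t->slot[i].in_use = 0;
            return t->slot[i].seq;
        }
    }
    return -1;
}
```

Stamps can then arrive in any order: each one is matched by its cookie, and a
cookie that was never tracked (or was already matched) is simply rejected.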
> > - when the driver sends the packet, it will get the stamp from the TX
> > completion ring; the driver will then propagate the stamp either to
> > (a) the skb stamp field, or (b) some special structure, to avoid
> > keeping the skb around
> > - the special structure or the skb will be linked to a special queue in
> > the socket and a POLLPRI event will be generated
> > - the application will use recvmsg and will receive a new control message
> > which contains the timestamp from the socket special queue
>
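
The last step of the TX path above (recvmsg returning a control message that
carries the stamp) could look roughly like this on the application side. This
is only a sketch: the level and type constants and the tx_stamp_msg layout are
made up for illustration, since the thread does not define them. Only the
CMSG_* macros are the standard socket API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical identifiers for the proposed control message. */
#define HYP_SOL_TXSTAMP 0x7001
#define HYP_SCM_TXSTAMP 1

/* Hypothetical payload: the cookie plus the hardware TX stamp. */
struct tx_stamp_msg {
    uint64_t cookie;
    struct timespec stamp;
};

/* Walk the control messages filled in by recvmsg() and copy out the
 * TX stamp message if present. Returns 1 if found, 0 otherwise. */
static int find_tx_stamp(struct msghdr *msg, struct tx_stamp_msg *out)
{
    struct cmsghdr *c;

    assert(msg != NULL && out != NULL);
    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == HYP_SOL_TXSTAMP &&
            c->cmsg_type == HYP_SCM_TXSTAMP &&
            c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* CMSG_DATA may be unaligned for our struct, so copy it. */
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

The application would call this after poll() reports POLLPRI and recvmsg() has
filled in msg_control; the cookie in the result identifies the packet.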
> Sounds a bit complicated to me. The trick currently used by PTPd might
> be more elegant and/or require less changes: it enables looping of
> outgoing packets with IP_MULTICAST_LOOP. The RX timestamp of the looped
> packet is then used as approximation for the TX time stamp of the
> original outgoing packet. Clearly this is inaccurate, in particular
> under load, but it is very easy to use.
>
I am probably missing something: I thought IP_MULTICAST_LOOP is done in
software... If so, how would the hardware be able to timestamp?
> When a driver gets a skb with the request to generate a TX time stamp,
> it could send the packet, upon completion obtain the time stamp from the
> hardware and feed the packet and the time stamp back to the upper layers
> as if it had just been received. Would that work?
>
> The user space then obtains TX time stamps just like RX time stamps and
> can use the payload to determine what kind of time stamp it got. That
> also avoids the need for special cookies to detect packet loss or
> reordering.
>
For generic protocols (not PTP) I think this will not work: e.g. a UDP
packet could be dropped or hit the wrong socket (due to a mismatch between the
source and destination ports).
> So far all that we get out of this is access to the raw time stamps.
> There may be some use for that, as Tavi said, but it would be a lot more
> interesting if the kernel would transform the raw time stamps into
> system time stamps if the user space process wants that. Then it can be
> used by a modified PTPd to synchronize the system time inside a cluster
> a lot more accurately than it is currently possible with NTP (think
> sub-microsecond accuracy instead of milliseconds).
>
> For the paper I tried out two different ways of synchronizing the system
> time with the NIC time. The one called "Assisted System Time" could be
> implemented relatively easily inside the IP stack: the driver only has
> to provide access to the NIC's hardware clock. Then the layer above it
> can sample the system time/NIC time offset at regular intervals; when
> they drift apart, that drift rate can be tracked as part of the
> measurements and be taken into account when transforming from one time
> base into the other. The other method ("Two-Level PTP") is more
> complicated and didn't bring much benefit.
>
Will look into it, thanks for pointing it out.
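
As I read the description of "Assisted System Time" above, the conversion
could be sketched as follows: sample (NIC time, system time) pairs at regular
intervals, estimate the drift of the offset between two samples, and apply
both when converting a NIC stamp. This is a simplified illustration with
hypothetical names (clock_map, clock_map_update, nic_to_sys) and a plain
two-point drift estimate; it is not the method from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One simultaneous reading of both clocks, in nanoseconds. */
struct clock_sample {
    int64_t nic_ns;
    int64_t sys_ns;
};

struct clock_map {
    struct clock_sample last;   /* most recent sample */
    double drift;               /* change of (sys - nic) per NIC nanosecond */
    int have_sample;
};

/* Feed a new sample; the drift is estimated from the last two samples. */
static void clock_map_update(struct clock_map *m, struct clock_sample s)
{
    assert(m != NULL);
    if (m->have_sample && s.nic_ns != m->last.nic_ns) {
        int64_t old_off = m->last.sys_ns - m->last.nic_ns;
        int64_t new_off = s.sys_ns - s.nic_ns;

        m->drift = (double)(new_off - old_off) /
                   (double)(s.nic_ns - m->last.nic_ns);
    }
    m->last = s;
    m->have_sample = 1;
}

/* Convert a raw NIC stamp into system time using offset and drift. */
static int64_t nic_to_sys(const struct clock_map *m, int64_t nic_ns)
{
    int64_t offset;
    double corr;

    assert(m != NULL && m->have_sample);
    offset = m->last.sys_ns - m->last.nic_ns;
    corr = m->drift * (double)(nic_ns - m->last.nic_ns);
    /* Round the drift correction to the nearest nanosecond. */
    return nic_ns + offset + (int64_t)(corr + (corr >= 0 ? 0.5 : -0.5));
}
```

With only one sample the drift is zero and the conversion reduces to adding
the measured offset; each further sample refines the drift estimate.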
Thanks,
tavi