Message-ID: <20250410180205.455d8488@kmaincent-XPS-13-7390>
Date: Thu, 10 Apr 2025 18:02:05 +0200
From: Kory Maincent <kory.maincent@...tlin.com>
To: "Russell King (Oracle)" <linux@...linux.org.uk>
Cc: Simon Horman <horms@...nel.org>, Andrew Lunn <andrew@...n.ch>, Heiner
Kallweit <hkallweit1@...il.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo
Abeni <pabeni@...hat.com>, Marek Behún <kabel@...nel.org>,
Richard Cochran <richardcochran@...il.com>, Thomas Petazzoni
<thomas.petazzoni@...tlin.com>, Maxime Chevallier
<maxime.chevallier@...tlin.com>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH net-next v2 2/2] net: phy: Add Marvell PHY PTP support
On Thu, 10 Apr 2025 16:41:06 +0100
"Russell King (Oracle)" <linux@...linux.org.uk> wrote:
> On Thu, Apr 10, 2025 at 11:17:54AM +0200, Kory Maincent wrote:
> > On Wed, 9 Apr 2025 23:38:00 +0100
> > "Russell King (Oracle)" <linux@...linux.org.uk> wrote:
> > > On Wed, Apr 09, 2025 at 06:34:35PM +0100, Russell King (Oracle) wrote:
> > >
> > > With that fixed, ptp4l's output looks very similar to that with mvpp2 -
> > > which doesn't inspire much confidence that the ptp stack is operating
> > > properly with the offset and frequency varying all over the place, and
> > > the "delay timeout" messages spamming frequently. I'm also getting
> > > ptp4l going into fault mode - so PHY PTP is proving to be way more
> > > unreliable than mvpp2 PTP. :(
> >
> > That's really weird. On my board the Marvell PHY PTP is more reliable than
> > MACB. Even by disabling the interrupt.
> > What is the state of the driver you are using?
>
> Right, it seems that some of the problems were using linuxptp v3.0
> rather than v4.4, which seems to work better (in that it doesn't
> seem to time out and drop into fault mode.)
>
> With v4.4, if I try:
>
> # ./ptp4l -i eth2 -m -s -2
> ptp4l[322.396]: selected /dev/ptp0 as PTP clock
> ptp4l[322.453]: port 1 (eth2): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[322.454]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[322.455]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[328.797]: selected local clock 005182.fffe.113302 as best master
>
> that's all I see. If I drop the -2, then:
It seems you are still using your Marvell PHY driver without my change.
PTP L2 was broken in your first patch and I fixed it.
I have the same result without the -2, which means ptp4l uses UDP IPv4.
> and from that you can see that the offset and frequency are very much
> all over the place, not what you would expect from something that is
> supposed to be _hardware_ timestamped - which is why I say that NTP
> seems to be superior to PTP at least here.
>
> With mvpp2, it's a very similar story:
> ptp4l[628.834]: master offset 38211 s2 freq -29874 path delay 62949
> ptp4l[629.834]: master offset -41111 s2 freq -97733 path delay 66289
> ptp4l[630.834]: master offset 33131 s2 freq -35824 path delay 63864
> ptp4l[631.834]: master offset -55578 s2 freq -114594 path delay 63864
> ptp4l[632.833]: master offset 34110 s2 freq -41579 path delay 57582
> ptp4l[633.834]: master offset -13137 s2 freq -78593 path delay 60047
> ptp4l[634.834]: master offset 55063 s2 freq -14334 path delay 49425
> ptp4l[635.834]: master offset -41302 s2 freq -94180 path delay 49425
I can't comment on mvpp2 as I don't have a board with this MAC, but these values
seem really high and vary a lot. As the behavior is similar with both the Marvell
PHY and the mvpp2 MAC, maybe the issue does indeed come from your link partner.
> Again, offset all over the place, frequency also showing that it doesn't
> stabilise.
>
> This _could_ be because of the master clock being random - but then it's
> using the FEC PTP implementation with PTPD v2 - maybe either the FEC
> implementation is buggy or maybe it's PTPD v2 causing this. I have no
> idea how I can debug this - and I'm not going to invest in a "grand
> master" PTP clock on a whim just to find out that isn't the problem.
>
> I thought... maybe I can use the PTP implementation in a Marvell
> switch as the network master, but the 88E6176 doesn't support PTP.
>
> Maybe I can use an x86 platform... nope:
>
> # ethtool -T enp0s25
> Time stamping parameters for enp0s25:
> Capabilities:
> software-transmit
> software-receive
> software-system-clock
Still, you could try software timestamping on the link partner.
On my side I am using an STM32MP157-DK as link partner.
If I set the DK board as PTP master and tell it to use software timestamping
(the -S option), it is still more reliable than yours.
ptp4l[4419.134]: master offset 136 s2 freq -1984 path delay 118390
ptp4l[4420.134]: master offset 1757 s2 freq -322 path delay 115888
ptp4l[4421.134]: master offset -1154 s2 freq -2706 path delay 115888
ptp4l[4422.134]: master offset -1652 s2 freq -3551 path delay 115888
ptp4l[4423.134]: master offset -1199 s2 freq -3593 path delay 115252
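To put a number on "more reliable", here is a minimal Python sketch that pulls the
"master offset" values out of ptp4l output and summarizes their spread. The helper
name offset_stats and the regular expression are only illustrative assumptions
based on the log lines pasted in this thread; they are not part of linuxptp.

```python
import re
import statistics

# Matches "master offset" lines in the form pasted in this thread.
# Assumption: other ptp4l versions or configurations may format them differently.
LINE_RE = re.compile(
    r"ptp4l\[(?P<t>[\d.]+)\]: master offset\s+(?P<offset>-?\d+)\s+s\d\s+"
    r"freq\s+(?P<freq>[+-]?\d+)\s+path delay\s+(?P<delay>-?\d+)"
)


def offset_stats(log_text):
    """Summarize the master offset values (in ns) found in ptp4l output."""
    offsets = [int(m.group("offset")) for m in LINE_RE.finditer(log_text)]
    if len(offsets) < 2:
        raise ValueError("need at least two 'master offset' lines")
    return {
        "count": len(offsets),
        "min": min(offsets),
        "max": max(offsets),
        "spread": max(offsets) - min(offsets),
        "stdev": statistics.stdev(offsets),
    }


# The two logs quoted in this mail: mvpp2 against the FEC/PTPd master, and
# my setup with the STM32MP157-DK as a software-timestamping master.
MVPP2_LOG = """
ptp4l[628.834]: master offset 38211 s2 freq -29874 path delay 62949
ptp4l[629.834]: master offset -41111 s2 freq -97733 path delay 66289
ptp4l[630.834]: master offset 33131 s2 freq -35824 path delay 63864
ptp4l[631.834]: master offset -55578 s2 freq -114594 path delay 63864
ptp4l[632.833]: master offset 34110 s2 freq -41579 path delay 57582
ptp4l[633.834]: master offset -13137 s2 freq -78593 path delay 60047
ptp4l[634.834]: master offset 55063 s2 freq -14334 path delay 49425
ptp4l[635.834]: master offset -41302 s2 freq -94180 path delay 49425
"""

DK_SW_LOG = """
ptp4l[4419.134]: master offset 136 s2 freq -1984 path delay 118390
ptp4l[4420.134]: master offset 1757 s2 freq -322 path delay 115888
ptp4l[4421.134]: master offset -1154 s2 freq -2706 path delay 115888
ptp4l[4422.134]: master offset -1652 s2 freq -3551 path delay 115888
ptp4l[4423.134]: master offset -1199 s2 freq -3593 path delay 115252
"""

if __name__ == "__main__":
    for name, log in (("mvpp2", MVPP2_LOG), ("DK software master", DK_SW_LOG)):
        print(name, offset_stats(log))
```

These are very short samples (8 and 5 lines), so this is only a rough comparison,
but it makes the gap between the two setups easy to see.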
> PTP Hardware Clock: none
> Hardware Transmit Timestamp Modes: none
> Hardware Receive Filter Modes: none
>
> Anyway, let's try taking a tcpdump on the x86 machine of the sync
> packets and compare the deviation of the software timestamp to that
> of the hardware timestamp (all deviations relative to the first
> packet part seconds):
>
> 16:30:30.577298 - originTimeStamp : 1744299061 seconds, 762464622 nanoseconds
> 16:30:31.577270 - originTimeStamp : 1744299062 seconds, 762363987 nanoseconds
> -28us -100.635us
> 16:30:32.577303 - originTimeStamp : 1744299063 seconds, 762429696 nanoseconds
> +85us -34.926us
> 16:30:33.577236 - originTimeStamp : 1744299064 seconds, 762328728 nanoseconds
> -62us -135.894us
> 16:30:34.577280 - originTimeStamp : 1744299065 seconds, 762398770 nanoseconds
> -18us -65.852us
>
> We can see here that the timestamp from the software receive is far
> more regular than the origin timestamp in the packets, which, in
> combination with the randomness of both mvpp2 and the 88e151x PTP
> trying to sync with it, makes me question whether there is something
> fundamentally wrong with the FEC PTP implementation / PTPDv2.
So we come to the same conclusion, the issue comes from your link partner! ;)
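For reference, the deviations in your tcpdump comparison can be recomputed with a
short Python sketch like the one below. The function name deviations and the
SAMPLES tuple layout are my own illustrative choices. The elapsed whole seconds
are taken from the originTimestamp seconds field, which is the same as comparing
only the sub-second parts as you did. Note that, with the figures as quoted, the
software deviation of the third packet comes out as +5us rather than the +85us
shown, so one of the two is presumably a typo.

```python
from datetime import datetime, timedelta

# (tcpdump receive time, originTimestamp seconds, originTimestamp nanoseconds)
# Values copied from the capture quoted above.
SAMPLES = [
    ("16:30:30.577298", 1744299061, 762464622),
    ("16:30:31.577270", 1744299062, 762363987),
    ("16:30:32.577303", 1744299063, 762429696),
    ("16:30:33.577236", 1744299064, 762328728),
    ("16:30:34.577280", 1744299065, 762398770),
]


def deviations(samples):
    """Return (software deviation in us, origin deviation in ns) per packet.

    Both deviations are relative to the first packet, after removing the
    whole seconds elapsed according to the originTimestamp seconds field.
    """
    first_rx = datetime.strptime(samples[0][0], "%H:%M:%S.%f")
    first_sec, first_nsec = samples[0][1], samples[0][2]
    result = []
    for rx, sec, nsec in samples:
        rx_time = datetime.strptime(rx, "%H:%M:%S.%f")
        # Integer microseconds between this receive time and the first one.
        elapsed_us = (rx_time - first_rx) // timedelta(microseconds=1)
        sw_dev_us = elapsed_us - (sec - first_sec) * 1_000_000
        hw_dev_ns = nsec - first_nsec
        result.append((sw_dev_us, hw_dev_ns))
    return result


if __name__ == "__main__":
    for sw_us, hw_ns in deviations(SAMPLES):
        print(f"{sw_us:+d}us {hw_ns / 1000:+.3f}us")
```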
Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com