[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z/b2yKMXNwjqTKy4@shell.armlinux.org.uk>
Date: Wed, 9 Apr 2025 23:38:00 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Kory Maincent <kory.maincent@...tlin.com>
Cc: Simon Horman <horms@...nel.org>, Andrew Lunn <andrew@...n.ch>,
Heiner Kallweit <hkallweit1@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Marek BehĂșn <kabel@...nel.org>,
Richard Cochran <richardcochran@...il.com>,
Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
Maxime Chevallier <maxime.chevallier@...tlin.com>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v2 2/2] net: phy: Add Marvell PHY PTP support
On Wed, Apr 09, 2025 at 06:34:35PM +0100, Russell King (Oracle) wrote:
> On Wed, Apr 09, 2025 at 06:04:14PM +0200, Kory Maincent wrote:
> > On Wed, 9 Apr 2025 14:35:17 +0100
> > "Russell King (Oracle)" <linux@...linux.org.uk> wrote:
> >
> > > On Wed, Apr 09, 2025 at 02:38:20PM +0200, Kory Maincent wrote:
> > > > Ok, thanks for the tests and these information.
> > > > Did you run ptp4l with this patch applied and did you switch to Marvell PHY
> > > > PTP source?
> > >
> > > This was using mvpp2, but I have my original patch as part of my kernel
> > > rather than your patch.
> >
> > So you are only testing the mvpp2 PTP. It seems there is something broken with
> > it. I don't think it is related to my work.
>
> Yes, and it has worked - but probably was never tested with PTPDv2 but
> with linuxptp. As it was more than five years ago when I worked on this
> stuff, I just can't remember the full details of the test setup I used.
>
> I think the reason I gave up running PTP on my network is the problems
> that having the NIC bound into a Linux bridge essentially means that
> you can't participate in PTP on that machine. That basically means a
> VM host machine using a bridge device for the guests can't use PTP
> to time sync itself.
>
> Well, it looks like the PHY based timestamping also isn't working -
> ptp4l says its failing to timestamp transmitted packets, but having
> added debug, the driver _is_ timestamping them, so the timestamps
> are getting lost somewhere in the networking layer, or are too late
> for ptp4l, which only waits 1ms, and the schedule_delayed_work(, 2)
> will be about 20ms at HZ=100. Increasing the wait in ptp4l to 100ms
> still doesn't appear to get a timestamp. According to the timestamps
> on the debug messages, it's only taking 10ms to return the timestamp.
>
> So, at the moment, ptp looks entirely non-functional. Or the userspace
> tools are broken.
Right, got to the bottom of it at last. I hate linuxptp / ptp4l. The
idea that one looks at the source, sees this:
res = poll(&pfd, 1, sk_tx_timeout);
if (res < 1) {
pr_err(res ? "poll for tx timestamp failed: %m" :
"timed out while polling for tx timestamp");
pr_err("increasing tx_timestamp_timeout may correct "
"this issue, but it is likely caused by a driver bug");
finds this in the same file:
int sk_tx_timeout = 1;
So it seemed obvious and logical that increasing that initialiser would
increase the _default_ timeout... but no, that's not the case, because,
ptp4l.c does:
sk_tx_timeout = config_get_int(cfg, NULL, "tx_timestamp_timeout");
unconditionally, and config.c has a table of config options along with
their defaults... meaning that initialiser above for sk_tx_timeout
means absolutely nothing, and one _has_ to use a config file.
With that fixed, ptp4l's output looks very similar to that with mvpp2 -
which doesn't inspire much confidence that the ptp stack is operating
properly with the offset and frequency varying all over the place, and
the "delay timeout" messages spamming frequently. I'm also getting
ptp4l going into fault mode - so PHY PTP is proving to be way more
unreliable than mvpp2 PTP. :(
Now, the one thing I can't get rid of is the receive timestamp
overflow warning - this occurs whenever e.g. ptp4l is restarted,
and is caused by there being no notification that PTP isn't being
used anymore.
Consequently, we end up with the PHY queuing a timestamp for a Sync
packet which it sees on the network, but because nothing is wanting
the packets (because e.g. ptp4l has been stopped) there's no packets
queued into the receive queue to take this timestamp, so we stop
polling the PHY for timestamps.
If we continue to rapidly poll the PHY, then we could needlessly
waste cycles - because nothing tells us "we have no one wanting
hardware timestamps anymore" which seems to be a glaring hole in
the PTP design.
Not setting DISTSOVERWRITE seems like a solution, but that seems to
lead to issues with timestamps being lost.
Well, having spent much of the afternoon and all evening on this,
and all I see are problems that don't seem to have solutions.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Powered by blists - more mailing lists