netdev - Re: Microchip net DSA with ptp4l getting tx_timeout failed msg using 6.3.12 kernel and KSZ9567 switch

Message-ID: <2259373.iZASKD2KPV@n95hx1g2>
Date: Fri, 25 Aug 2023 17:49:45 +0200
From: Christian Eggers <ceggers@...i.de>
To: Brian Hutchinson <b.hutchman@...il.com>
CC: <netdev@...r.kernel.org>, Vladimir Oltean <OlteanV@...il.com>,
	<arun.ramadoss@...rochip.com>, <rakesh.sankaranarayanan@...rochip.com>
Subject: Re: Microchip net DSA with ptp4l getting tx_timeout failed msg using 6.3.12 kernel and KSZ9567 switch

Hi Brian,

On Thursday, 24 August 2023, 21:03:32 CEST, Brian Hutchinson wrote:
> Update.  Top posting because I think this is my issue.
> 
> I dug further into my problem.  I'm using E2E and it looks like the
> mainlined Microchip KSZ DSA PTP code only supports P2P.
> 
> The 5.10.69 kernel that I was first able to get working with
> Christian's early pre-mainlined patches had:
> 0016-net-dsa-microchip-ksz9477-add-E2E-support.patch

Sorry, I forgot that you use E2E.  Unfortunately I have no
up-to-date patches for E2E support, so you may need to port
the old patch yourself.

regards
Christian
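
For readers unsure which delay mechanism their setup uses: below is a
minimal sketch (not part of the original mail) that reads a ptp4l-style
configuration and reports the delay_mechanism setting.  It assumes the
usual linuxptp layout of "option value" lines with "#" comments, and
that ptp4l falls back to E2E when the option is absent.  The function
name is made up for illustration.

```python
def delay_mechanism(config_text: str, default: str = "E2E") -> str:
    """Return the delay_mechanism value found in ptp4l-style config text.

    Assumption: options are whitespace-separated "name value" pairs,
    "#" starts a comment, and section headers look like "[global]".
    """
    value = default
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line or line.startswith("["):
            continue  # skip blank lines and section headers
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "delay_mechanism":
            value = parts[1].strip().upper()  # last occurrence wins
    return value


if __name__ == "__main__":
    sample = "[global]\ndelay_mechanism\tP2P\n"
    print(delay_mechanism(sample))
```

If this reports E2E, the setup depends on the E2E support discussed
above, which according to this thread is not in the mainlined driver.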

> 
> ... which gets into the "sticky" bits of why these patches weren't
> accepted in the first place due to some Microchip specific
> implementation if I recall correctly.
> 
> Regards,
> 
> Brian
> 
> 
> On Thu, Aug 24, 2023 at 2:26 PM Brian Hutchinson <b.hutchman@...il.com> wrote:
> >
> > Hi Christian,
> >
> >
> > On Wed, Aug 23, 2023 at 9:29 AM Brian Hutchinson <b.hutchman@...il.com> wrote:
> > >
> > >
> > >
> > > On Wed, Aug 23, 2023 at 4:22 AM Christian Eggers <ceggers@...i.de> wrote:
> > >>
> > >> Hi Brian,
> > >>
> > >> I just returned from my holidays...
> > >
> > >
> > > Hope you had a good one ... I need one too!
> > >
> > >>
> > >>
> > >> On Tuesday, 22 August 2023, 23:49:33 CEST, you wrote:
> > >> > Getting this tx_timestamp_timeout error over and over when I try to run ptp4l:
> > >> >
> > >> > ptp4l[1366.143]: selected best master clock 001747.fffe.70151b
> > >> > ptp4l[1366.143]: updating UTC offset to 37
> > >> > ptp4l[1366.143]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
> > >> > ptp4l[1366.860]: port 1: delay timeout
> > >> > ptp4l[1376.871]: timed out while polling for tx timestamp
> > >> > ptp4l[1376.871]: increasing tx_timestamp_timeout may correct this
> > >> > issue, but it is likely caused by a driver bug
> > >> > ptp4l[1376.871]: port 1: send delay request failed
> > >> >
> > >> > I was using 5.10.69 with Christian's patches before they were mainlined
> > >> > and had everything working with the help of Christian, Vladimir and
> > >> > others.
> > >> >
> > >> > Now I need to update the kernel, so I tried 6.3.12, which contains
> > >> > Christian's upstream patches. I also backported v8 of the upstreamed
> > >> > patches to 6.1.38 and I'm getting the same results with that kernel too.
> > >> >
> > >>
> > >> I am also in the process of upgrading to 6.1.38 (but not really tested).
> > >> I cherry-picked all necessary patches from the latest master (see attached
> > >> archive). Maybe you would like to compare this with your patch series.
> > >
> > >
> > > > Excellent, I will check it out!  Yeah, we needed to be on an LTS kernel, so that's why I'm focusing on 6.1.38, as it's the latest in the yocto/oe universe.
> >
> > So I checked all of your patches for 6.1.38 vs the ones I had.  I had
> > all except 0002 and 0003.  I didn't have all of 0001 but I got a build
> > error on diff_by_scaled_ppm and back ported that function from 6.3.12
> > to make things build.
> >
> > I applied the missing patches I got from you and rebuilt everything
> > and still get the same tx_timestamp_timeout result.  That didn't
> > surprise me since, as I mentioned before, I tried 6.3.12 mainline
> > and get the same result there too.
> >
> > Regards,
> >
> > Brian
> >
> > >
> > >>
> > >>
> > >> > [...]
> > >> >
> > >> > I tried increasing tx_timestamp_timeout and it doesn't appear to matter. I
> > >> > feel like I had this problem before when first starting to work with
> > >> > 5.10.69 but can't remember if another patch resolved it. With 5.10.69
> > >> > I've got quite a few more patches than just the 13 that were mainlined
> > >> > in 6.3. Looking through old emails I want to say it might have been
> > >> > resolved with net-dsa-ksz9477-avoid-PTP-races-with-the-data-path-l.patch
> > >> > that Vladimir gave me but looking at the code it doesn't appear
> > >> > mainline has that one.
> > >>
> > >> How is the IRQ line of your switch attached? I remember there was a problem
> > >> with the IRQ type (edge vs. level), but I think the fix has already been
> > >> applied to 6.1.38 (via -stable).
> > >
> > >
> > > > So that's one of the first things I thought of, which is why I provided the output of cat /proc/interrupts.
> > >
> > > I also do have a /dev/ptp1 (/dev/ptp0 is imx8mm)
> > >
> > > My device tree node is the same as before:
> > >
> > >          i2c_ksz9567: ksz9567@5f {
> > >                compatible = "microchip,ksz9567";
> > >                reg = <0x5f>;
> > >                phy-mode = "rgmii-id";
> > >                status = "okay";
> > >                interrupt-parent = <&gpio1>;
> > >                interrupts = <10 IRQ_TYPE_LEVEL_LOW>;
> > >
> > >                ports {
> > >                        #address-cells = <1>;
> > >                        #size-cells = <0>;
> > >                        port@0 {
> > >                                reg = <0>;
> > >                                label = "lan1";
> > >                        };
> > >                        port@1 {
> > >                                reg = <1>;
> > >                                label = "lan2";
> > >                        };
> > >                        port@6 {
> > >                                reg = <6>;
> > >                                label = "cpu";
> > >                                ethernet = <&fec1>;
> > >                                phy-mode = "rgmii-id";
> > >                                fixed-link {
> > >                                        speed = <100>;
> > >                                        full-duplex;
> > >                                };
> > >                        };
> > >                };
> > >        };
> > >
> > > And I have same pinmux setup as before.  I double checked all of that.
> > >
> > > I noticed new kernel /proc/interrupts now has a bunch of ksz lines in addition to "gpio-mxc  10 Level" which is IRQ from the ksz switch.
> > >
> > > Here is what the old 5.10.69 /proc/interrupts looked like:
> > >
> > > cat /proc/interrupts
> > >           CPU0       CPU1       CPU2       CPU3
> > > 11:      46141        127        127        124     GICv3  30 Level     arch_timer
> > > 14:       5260          0          0          0     GICv3  79 Level     timer@...a0000
> > > 15:          0          0          0          0     GICv3  23 Level     arm-pmu
> > > 20:          0          0          0          0     GICv3 127 Level     sai
> > > 21:          0          0          0          0     GICv3  82 Level     sai
> > > 32:          0          0          0          0     GICv3 110 Level     30280000.watchdog
> > > 33:          0          0          0          0     GICv3 135 Level     sdma
> > > 34:          0          0          0          0     GICv3  66 Level     sdma
> > > 35:          0          0          0          0     GICv3  52 Level     caam-snvs
> > > 36:          0          0          0          0     GICv3  51 Level     rtc alarm
> > > 37:          0          0          0          0     GICv3  36 Level     30370000.snvs:snvs-powerkey
> > > 39:          0          0          0          0     GICv3  64 Level     30830000.spi
> > > 40:       1412          0          0          0     GICv3  59 Level     30890000.serial
> > > 42:      55291          0          0          0     GICv3  67 Level     30a20000.i2c
> > > 43:          0          0          0          0     GICv3  68 Level     30a30000.i2c
> > > 44:          0          0          0          0     GICv3  69 Level     30a40000.i2c
> > > 45:          0          0          0          0     GICv3  70 Level     30a50000.i2c
> > > 47:          0          0          0          0     GICv3  55 Level     mmc1
> > > 48:       3003          0          0          0     GICv3  56 Level     mmc2
> > > 49:       2565          0          0          0     GICv3 139 Level     30bb0000.spi
> > > 50:          0          0          0          0     GICv3  34 Level     sdma
> > > 51:          0          0          0          0     GICv3 150 Level     30be0000.ethernet
> > > 52:          0          0          0          0     GICv3 151 Level     30be0000.ethernet
> > > 53:       1417          0          0          0     GICv3 152 Level     30be0000.ethernet
> > > 54:          0          0          0          0     GICv3 153 Level     30be0000.ethernet
> > > 56:          0          0          0          0     GICv3 130 Level     imx8_ddr_perf_pmu
> > > 60:          0          0          0          0  gpio-mxc   3 Level     bd718xx-irq
> > > 67:         23          0          0          0  gpio-mxc  10 Level     0-005f
> > > 72:          0          0          0          0  gpio-mxc  15 Edge      30b50000.mmc cd
> > > 217:          0          0          0          0  bd718xx-irq   5 Edge      gpio_keys
> > > IPI0:        29         14         13         13       Rescheduling interrupts
> > > IPI1:         0         41         41         41       Function call interrupts
> > > IPI2:         0          0          0          0       CPU stop interrupts
> > > IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
> > > IPI4:         0          0          0          0       Timer broadcast interrupts
> > > IPI5:      7959          0          0          0       IRQ work interrupts
> > > IPI6:         0          0          0          0       CPU wake-up interrupts
> > > Err:          0
> > >
> > > I'll check out your 6.1.38 changes compared to what I did.
> > >
> > > Thanks,
> > >
> > > Brian
> > >
> > >>
> > >>
>
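
One way to check whether the switch interrupt fires at all is to sample
/proc/interrupts twice while ptp4l is running and compare the counts.
The following is a rough sketch, not something posted in the thread.
It assumes the line of interest ends in the device label shown in the
5.10.69 listing above ("0-005f"); newer kernels may use different
labels, so adjust the name accordingly.  The function name is
hypothetical.

```python
def irq_counts(interrupts_text: str, name: str) -> dict:
    """Map IRQ number -> count summed over all CPUs, for lines whose
    last whitespace-separated field equals `name`."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        # Need at least "<irq>:", one count per CPU, and a label.
        if len(fields) < ncpu + 2 or fields[-1] != name:
            continue
        irq = fields[0].rstrip(":")
        totals[irq] = sum(int(c) for c in fields[1:1 + ncpu])
    return totals


if __name__ == "__main__":
    # Label taken from the old listing; an assumption on newer kernels.
    with open("/proc/interrupts") as f:
        print(irq_counts(f.read(), "0-005f"))
```

If the total does not increase between two samples taken while delay
requests are being sent, the interrupt is probably not reaching the
driver.  If it does increase, the problem more likely lies elsewhere,
for example in the missing E2E support.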