netdev - Re: Microchip net DSA with ptp4l getting tx_timeout failed msg using 6.3.12 kernel and KSZ9567 switch

Message-ID: <CAFZh4h_ueji_KucLdPx9PtTQP1g29PbcjNDFGzLBJYpYK8Rt3w@mail.gmail.com>
Date: Thu, 24 Aug 2023 15:03:32 -0400
From: Brian Hutchinson <b.hutchman@...il.com>
To: Christian Eggers <ceggers@...i.de>
Cc: netdev@...r.kernel.org, Vladimir Oltean <OlteanV@...il.com>, arun.ramadoss@...rochip.com, 
	rakesh.sankaranarayanan@...rochip.com
Subject: Re: Microchip net DSA with ptp4l getting tx_timeout failed msg using
 6.3.12 kernel and KSZ9567 switch

Update.  Top posting because I think I have found my issue.

I dug further into my problem.  I'm using E2E, and it looks like the
mainlined Microchip KSZ DSA PTP code only supports P2P.

The 5.10.69 kernel that I was first able to get working used
Christian's early, pre-mainline patches, which included:
0016-net-dsa-microchip-ksz9477-add-E2E-support.patch

... which gets into the "sticky" bits: if I recall correctly, these
patches weren't accepted in the first place because of some
Microchip-specific implementation details.
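
For anyone hitting the same timeout, here is a minimal sketch of a ptp4l
configuration that selects P2P instead of E2E, on the assumption that the
mainline driver only timestamps peer-delay messages. The option names are
standard ptp4l settings; the file name p2p.cfg and the timeout value are
placeholders chosen for illustration, and this is a sketch rather than a
verified fix.

```
[global]
# Use the peer delay mechanism rather than end-to-end delay requests.
delay_mechanism		P2P
# Ask for hardware timestamps from the switch port's PHC.
time_stamping		hardware
# Milliseconds to wait for the tx timestamp; 100 is an arbitrary example.
tx_timestamp_timeout	100
```

It would be run with something like "ptp4l -f p2p.cfg -i lan1 -m", where
lan1 is the port label from the device tree quoted below. P2P can only
work if the link partner also implements the peer delay mechanism.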

Regards,

Brian


On Thu, Aug 24, 2023 at 2:26 PM Brian Hutchinson <b.hutchman@...il.com> wrote:
>
> Hi Christian,
>
>
> On Wed, Aug 23, 2023 at 9:29 AM Brian Hutchinson <b.hutchman@...il.com> wrote:
> >
> >
> >
> > On Wed, Aug 23, 2023 at 4:22 AM Christian Eggers <ceggers@...i.de> wrote:
> >>
> >> Hi Brian,
> >>
> >> I just returned from my holidays...
> >
> >
> > Hope you had a good one ... I need one too!
> >
> >>
> >>
> >> Am Dienstag, 22. August 2023, 23:49:33 CEST schrieben Sie:
> >> > Getting this tx_timestamp_timeout error over and over when I try to run ptp4l:
> >> >
> >> > ptp4l[1366.143]: selected best master clock 001747.fffe.70151b
> >> > ptp4l[1366.143]: updating UTC offset to 37
> >> > ptp4l[1366.143]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
> >> > ptp4l[1366.860]: port 1: delay timeout
> >> > ptp4l[1376.871]: timed out while polling for tx timestamp
> >> > ptp4l[1376.871]: increasing tx_timestamp_timeout may correct this
> >> > issue, but it is likely caused by a driver bug
> >> > ptp4l[1376.871]: port 1: send delay request failed
> >> >
> >> > I was using 5.10.69 with Christian's patches before they were mainlined
> >> > and had everything working with the help of Christian, Vladimir and
> >> > others.
> >> >
> >> > Now I need to update the kernel, so I tried 6.3.12, which contains
> >> > Christian's upstream patches. I also back ported v8 of the upstreamed
> >> > patches to 6.1.38, and I'm getting the same results with that kernel too.
> >> >
> >>
> >> I am also in the process of upgrading to 6.1.38 (but not really tested).
> >> I cherry-picked all necessary patches from the latest master (see attached
> >> archive). Maybe you would like to compare this with your patch series.
> >
> >
> > Excellent, I will check it out!  Yeah, we needed to be on an LTS kernel, so that's why I'm focusing on 6.1.38, as it's the latest in the yocto/oe universe.
>
> So I checked all of your patches for 6.1.38 vs the ones I had.  I had
> all except 0002 and 0003.  I didn't have all of 0001 but I got a build
> error on diff_by_scaled_ppm and back ported that function from 6.3.12
> to make things build.
>
> I applied the missing patches I got from you and rebuilt everything,
> and I still get the same tx_timestamp_timeout result.  That didn't
> surprise me: as I mentioned before, I tried 6.3.12 mainline and
> got the same result there too.
>
> Regards,
>
> Brian
>
> >
> >>
> >>
> >> > [...]
> >> >
> >> > I tried increasing tx_timestamp_timeout and it doesn't appear to matter. I
> >> > feel like I had this problem before when first starting to work with
> >> > 5.10.69 but can't remember if another patch resolved it. With 5.10.69
> >> > I've got quite a few more patches than just the 13 that were mainlined
> >> > in 6.3. Looking through old emails I want to say it might have been
> >> > resolved with net-dsa-ksz9477-avoid-PTP-races-with-the-data-path-l.patch
> >> > that Vladimir gave me but looking at the code it doesn't appear
> >> > mainline has that one.
> >>
> >> How is the IRQ line of your switch attached? I remember there was a problem
> >> with the IRQ type (edge vs. level), but I think this has already been
> >> applied to 6.1.38 (via -stable).
> >
> >
> > So that's one of the first things I thought of, which is why I provided the output of cat /proc/interrupts.
> >
> > I also do have a /dev/ptp1 (/dev/ptp0 is imx8mm)
> >
> > My device tree node is the same as before:
> >
> >          i2c_ksz9567: ksz9567@5f {
> >                compatible = "microchip,ksz9567";
> >                reg = <0x5f>;
> >                phy-mode = "rgmii-id";
> >                status = "okay";
> >                interrupt-parent = <&gpio1>;
> >                interrupts = <10 IRQ_TYPE_LEVEL_LOW>;
> >
> >                ports {
> >                        #address-cells = <1>;
> >                        #size-cells = <0>;
> >                        port@0 {
> >                                reg = <0>;
> >                                label = "lan1";
> >                        };
> >                        port@1 {
> >                                reg = <1>;
> >                                label = "lan2";
> >                        };
> >                        port@6 {
> >                                reg = <6>;
> >                                label = "cpu";
> >                                ethernet = <&fec1>;
> >                                phy-mode = "rgmii-id";
> >                                fixed-link {
> >                                        speed = <100>;
> >                                        full-duplex;
> >                                };
> >                        };
> >                };
> >        };
> >
> > And I have same pinmux setup as before.  I double checked all of that.
> >
> > I noticed new kernel /proc/interrupts now has a bunch of ksz lines in addition to "gpio-mxc  10 Level" which is IRQ from the ksz switch.
> >
> > Here is what the old 5.10.69 /proc/interrupts looked like:
> >
> > cat /proc/interrupts
> >           CPU0       CPU1       CPU2       CPU3
> > 11:      46141        127        127        124     GICv3  30 Level     arch_timer
> > 14:       5260          0          0          0     GICv3  79 Level     timer@...a0000
> > 15:          0          0          0          0     GICv3  23 Level     arm-pmu
> > 20:          0          0          0          0     GICv3 127 Level     sai
> > 21:          0          0          0          0     GICv3  82 Level     sai
> > 32:          0          0          0          0     GICv3 110 Level     30280000.watchdog
> > 33:          0          0          0          0     GICv3 135 Level     sdma
> > 34:          0          0          0          0     GICv3  66 Level     sdma
> > 35:          0          0          0          0     GICv3  52 Level     caam-snvs
> > 36:          0          0          0          0     GICv3  51 Level     rtc alarm
> > 37:          0          0          0          0     GICv3  36 Level     30370000.snvs:snvs-powerkey
> > 39:          0          0          0          0     GICv3  64 Level     30830000.spi
> > 40:       1412          0          0          0     GICv3  59 Level     30890000.serial
> > 42:      55291          0          0          0     GICv3  67 Level     30a20000.i2c
> > 43:          0          0          0          0     GICv3  68 Level     30a30000.i2c
> > 44:          0          0          0          0     GICv3  69 Level     30a40000.i2c
> > 45:          0          0          0          0     GICv3  70 Level     30a50000.i2c
> > 47:          0          0          0          0     GICv3  55 Level     mmc1
> > 48:       3003          0          0          0     GICv3  56 Level     mmc2
> > 49:       2565          0          0          0     GICv3 139 Level     30bb0000.spi
> > 50:          0          0          0          0     GICv3  34 Level     sdma
> > 51:          0          0          0          0     GICv3 150 Level     30be0000.ethernet
> > 52:          0          0          0          0     GICv3 151 Level     30be0000.ethernet
> > 53:       1417          0          0          0     GICv3 152 Level     30be0000.ethernet
> > 54:          0          0          0          0     GICv3 153 Level     30be0000.ethernet
> > 56:          0          0          0          0     GICv3 130 Level     imx8_ddr_perf_pmu
> > 60:          0          0          0          0  gpio-mxc   3 Level     bd718xx-irq
> > 67:         23          0          0          0  gpio-mxc  10 Level     0-005f
> > 72:          0          0          0          0  gpio-mxc  15 Edge      30b50000.mmc cd
> > 217:          0          0          0          0  bd718xx-irq   5 Edge      gpio_keys
> > IPI0:        29         14         13         13       Rescheduling interrupts
> > IPI1:         0         41         41         41       Function call interrupts
> > IPI2:         0          0          0          0       CPU stop interrupts
> > IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
> > IPI4:         0          0          0          0       Timer broadcast interrupts
> > IPI5:      7959          0          0          0       IRQ work interrupts
> > IPI6:         0          0          0          0       CPU wake-up interrupts
> > Err:          0
> >
> > I'll check out your 6.1.38 changes compared to what I did.
> >
> > Thanks,
> >
> > Brian
> >
> >>
> >>