netdev - Re: [Linuxptp-devel] strangeness

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFcVECLKxeX-1SxszUZaWPp1u0gaYL2as6ZNa-2xzvrq83yTSg@mail.gmail.com>
Date:   Fri, 8 Mar 2019 11:38:10 +0530
From:   Harini Katakam <harinik@...inx.com>
To:     Paul Thomas <pthomas8589@...il.com>
Cc:     Richard Cochran <richardcochran@...il.com>,
        "linuxptp-devel@...ts.sourceforge.net" 
        <linuxptp-devel@...ts.sourceforge.net>, netdev@...r.kernel.org
Subject: Re: [Linuxptp-devel] strangeness

Hi Paul,
On Fri, Mar 8, 2019 at 12:33 AM Paul Thomas <pthomas8589@...il.com> wrote:
>
> On Thu, Mar 7, 2019 at 12:32 AM Harini Katakam <harinik@...inx.com> wrote:
> >
> > Hi Paul,
> > On Thu, Mar 7, 2019 at 4:38 AM Paul Thomas <pthomas8589@...il.com> wrote:
> > >
> > > On Fri, Mar 1, 2019 at 1:24 AM Harini Katakam <harinik@...inx.com> wrote:
> > > >
> > > > +netdev
> > > >
> > > > Hi Paul,
> > > > On Fri, Mar 1, 2019 at 12:29 AM Richard Cochran
> > > > <richardcochran@...il.com> wrote:
> > > > >
> > > > > On Thu, Feb 28, 2019 at 12:33:26PM -0500, Paul Thomas wrote:
> > > > > > Yes changing it to TSTAMP_ALL_PTP_FRAMES instead of TSTAMP_ALL_FRAMES
> > > > > > does seem to fix the ssh issue. My worry is that there is still a bug
> > > > > > somewhere in the network stack that this is just masking.
> > > >
> > > > Ok thanks.
> > > > One place to check in the driver will be:
> > > > if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
> > > > /* skb now belongs to timestamp buffer
> > > > * and will be removed later
> > > > */
> > > > tx_skb->skb = NULL;
> > > > }
> > > > When all TX packets are timestamped, the skb always belongs to the
> > > > timestamp buffer.
> > > >
> > > > >
> > > > > Or the HW isn't sending the frames in the first place.
> > > > >
> > > > > Check that first!
> > > >
> > > > To check this, the statistics registers in MAC will be one way.
> > > > But if there is no TX completion interrupt, then I wouldn't expect
> > > > these statistics to increase either. The used bit status in BD dump
> > > > might be of more use.
> > > >
> > > > I will also try to reproduce (with TX timestamp ALL) and see if any of
> > > > the above gives some clue.
> > > >
> > > > Regards,
> > > > Harini
> > >
> > > Hi Harini, any luck looking at this?
> >
> > I'm sorry, I was not able to debug this further.
> >
> > >
> > > I didn't get very far, even in the "broken" state I see plenty of tx_frames:
> > > root@xu5:/opt/linuxptp# ethtool -S eth0
> > > NIC statistics:
> > >      ...
> > >      tx_frames: 39763
> > >      ...
> > >
> > > When you said "registers in the MAC" is ethtool -S displaying that?
> >
> > Yes, ethtool does display these statistics.
> > I was referring to the registers starting offset 0xFF0B0108 (for GEM0) here:
> > https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html
> > If you see this value increasing, then the MAC is transmitting successfully.
> > Although, I realize it could be other traffic. To see if specific
> > packets (for the
> > failed SSH connection) are not being queued, a BD dump might help.
> >
> > Regards,
> > Harini
>
> OK, I think things are becoming more clear. After just doing ioctl(fd,
> SIOCSHWTSTAMP, &ifreq) from userspace (tx_bd_control =
> TSTAMP_ALL_FRAMES in macb_ptp.c) then with the nc experiment some udp
> transmits do not make it to macb_start_xmit() until receive traffic on
> the nc connection comes in (one-to-one, one new rx packet means one
> old tx packet goes out).

Could you please share any wireshark log or dump for what is being
received here?

>
> Working setup:
> Before the tx_bd_control = TSTAMP_ALL_FRAMES.
> Every time I hit "sN Enter" from nc I see a macb_start_xmit
> print_hex_dump() and I see the packet on the nc client side:
> # nc -l -u -p 9999
> ...
> s11
> [  347.517080] macb_start_xmit data: 00000000: 20 b0 f7 04 0a 29 20 b0
> f7 04 0a 26 08 00 45 00   ....) ....&..E.
> s12
> [  348.964369] macb_start_xmit data: 00000000: 20 b0 f7 04 0a 29 20 b0
> f7 04 0a 26 08 00 45 00   ....) ....&..E.
> ...
>
> Broken setup:
> After the tx_bd_control = TSTAMP_ALL_FRAMES.
> Not the first nc packet, but many of the subsequent ones never make it
> to macb_start_xmit()
> # nc -l -u -p 9999
> ...
> s3
> s4
> s5
> ...
> Eventually after I send data from the client nc I do see the
> macb_start_xmit() lines.

Thanks for this debug. If macb_start_xmit is never called, one of
the preceeding checks (such as if skb is present or if the TX queues
are off  etc)
should fail. I'm still tracing this but I'm not sure under what
circumstances only
some UDP packets will be prevented from being transmitted.
Just to be sure, could you please confirm you are not seeing any
"buffer exhausted" messaged from TX error tasks?

Regards,
Harini