[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD56B7fnZQ2YmrVLCgLMT8UwO-0dpnTbh3rz_qA1nh_RsEgbtg@mail.gmail.com>
Date: Fri, 8 Mar 2019 13:07:12 -0500
From: Paul Thomas <pthomas8589@...il.com>
To: Harini Katakam <harinik@...inx.com>
Cc: Richard Cochran <richardcochran@...il.com>,
"linuxptp-devel@...ts.sourceforge.net"
<linuxptp-devel@...ts.sourceforge.net>, netdev@...r.kernel.org
Subject: Re: [Linuxptp-devel] strangeness
Hi Harini,
On Fri, Mar 8, 2019 at 1:08 AM Harini Katakam <harinik@...inx.com> wrote:
>
> Hi Paul,
> On Fri, Mar 8, 2019 at 12:33 AM Paul Thomas <pthomas8589@...il.com> wrote:
> >
> > On Thu, Mar 7, 2019 at 12:32 AM Harini Katakam <harinik@...inx.com> wrote:
> > >
> > > Hi Paul,
> > > On Thu, Mar 7, 2019 at 4:38 AM Paul Thomas <pthomas8589@...il.com> wrote:
> > > >
> > > > On Fri, Mar 1, 2019 at 1:24 AM Harini Katakam <harinik@...inx.com> wrote:
> > > > >
> > > > > +netdev
> > > > >
> > > > > Hi Paul,
> > > > > On Fri, Mar 1, 2019 at 12:29 AM Richard Cochran
> > > > > <richardcochran@...il.com> wrote:
> > > > > >
> > > > > > On Thu, Feb 28, 2019 at 12:33:26PM -0500, Paul Thomas wrote:
> > > > > > > Yes changing it to TSTAMP_ALL_PTP_FRAMES instead of TSTAMP_ALL_FRAMES
> > > > > > > does seem to fix the ssh issue. My worry is that there is still a bug
> > > > > > > somewhere in the network stack that this is just masking.
> > > > >
> > > > > Ok thanks.
> > > > > One place to check in the driver will be:
> > > > > if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
> > > > > /* skb now belongs to timestamp buffer
> > > > > * and will be removed later
> > > > > */
> > > > > tx_skb->skb = NULL;
> > > > > }
> > > > > When all TX packets are timestamped, the skb always belongs to the
> > > > > timestamp buffer.
> > > > >
> > > > > >
> > > > > > Or the HW isn't sending the frames in the first place.
> > > > > >
> > > > > > Check that first!
> > > > >
> > > > > To check this, the statistics registers in MAC will be one way.
> > > > > But if there is no TX completion interrupt, then I wouldn't expect
> > > > > these statistics to increase either. The used bit status in BD dump
> > > > > might be of more use.
> > > > >
> > > > > I will also try to reproduce (with TX timestamp ALL) and see if any of
> > > > > the above gives some clue.
> > > > >
> > > > > Regards,
> > > > > Harini
> > > >
> > > > Hi Harini, any luck looking at this?
> > >
> > > I'm sorry, I was not able to debug this further.
> > >
> > > >
> > > > I didn't get very far, even in the "broken" state I see plenty of tx_frames:
> > > > root@xu5:/opt/linuxptp# ethtool -S eth0
> > > > NIC statistics:
> > > > ...
> > > > tx_frames: 39763
> > > > ...
> > > >
> > > > When you said "registers in the MAC" is ethtool -S displaying that?
> > >
> > > Yes, ethtool does display these statistics.
> > > I was referring to the registers starting offset 0xFF0B0108 (for GEM0) here:
> > > https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html
> > > If you see this value increasing, then the MAC is transmitting successfully.
> > > Although, I realize it could be other traffic. To see if specific
> > > packets (for the
> > > failed SSH connection) are not being queued, a BD dump might help.
> > >
> > > Regards,
> > > Harini
> >
> > OK, I think things are becoming more clear. After just doing ioctl(fd,
> > SIOCSHWTSTAMP, &ifreq) from userspace (tx_bd_control =
> > TSTAMP_ALL_FRAMES in macb_ptp.c) then with the nc experiment some udp
> > transmits do not make it to macb_start_xmit() until receive traffic on
> > the nc connection comes in (one-to-one, one new rx packet means one
> > old tx packet goes out).
>
> Could you please share any wireshark log or dump for what is being
> received here?
Here are two wireshark captures, the thing to note in the bad one is
that packets No. 5, 7, 9 from .102 to .103 were actually sent just
after packet No. 2 but they don't show up on the wire until the
packets the other way (one for one).
>
> >
> > Working setup:
> > Before the tx_bd_control = TSTAMP_ALL_FRAMES.
> > Every time I hit "sN Enter" from nc I see a macb_start_xmit
> > print_hex_dump() and I see the packet on the nc client side:
> > # nc -l -u -p 9999
> > ...
> > s11
> > [ 347.517080] macb_start_xmit data: 00000000: 20 b0 f7 04 0a 29 20 b0
> > f7 04 0a 26 08 00 45 00 ....) ....&..E.
> > s12
> > [ 348.964369] macb_start_xmit data: 00000000: 20 b0 f7 04 0a 29 20 b0
> > f7 04 0a 26 08 00 45 00 ....) ....&..E.
> > ...
> >
> > Broken setup:
> > After the tx_bd_control = TSTAMP_ALL_FRAMES.
> > Not the first nc packet, but many of the subsequent ones never make it
> > to macb_start_xmit()
> > # nc -l -u -p 9999
> > ...
> > s3
> > s4
> > s5
> > ...
> > Eventually after I send data from the client nc I do see the
> > macb_start_xmit() lines.
>
> Thanks for this debug. If macb_start_xmit is never called, one of
> the preceeding checks (such as if skb is present or if the TX queues
> are off etc)
> should fail. I'm still tracing this but I'm not sure under what
> circumstances only
> some UDP packets will be prevented from being transmitted.
In this specific test the first tx packets always goes through, and
the subsequent ones don't until rx packets. So it's not random when
they go through, I could have been clearer about that.
> Just to be sure, could you please confirm you are not seeing any
> "buffer exhausted" messaged from TX error tasks?
Correct, I'm not seeing any "buffer exhausted" errors.
thanks,
Paul
Download attachment "badcapt20190308a.pcap" of type "application/vnd.tcpdump.pcap" (695 bytes)
Download attachment "goodcapt20190308a.pcap" of type "application/vnd.tcpdump.pcap" (313 bytes)
Powered by blists - more mailing lists