Date:   Fri, 16 Sep 2022 11:45:56 -0700
From:   Michael Chan <michael.chan@...adcom.com>
To:     Simon White <Simon.White@...visolutions.com>
Cc:     "davem@...emloft.net" <davem@...emloft.net>,
        "richardcochran@...il.com" <richardcochran@...il.com>,
        Stephen Hill <Stephen.Hill@...visolutions.com>,
        Netdev <netdev@...r.kernel.org>,
        Pavan Chebbi <pavan.chebbi@...adcom.com>
Subject: Re: tg3 (5720) PTP sync problems

CC netdev instead of lkml and converting to plain text email

On Fri, Sep 16, 2022 at 8:54 AM Simon White
<Simon.White@...visolutions.com> wrote:
>
> In a running setup, PTP sync problems were observed when the server providing the PTP grandmaster performed other high-load network transmissions.  The PTP slaves could experience sync errors in the tens of milliseconds.

Thanks for reporting the issue.  One of my colleagues will look into this.

>
> Simplifying the setup and test conditions to two servers (Dell R7527 dual socket servers with 64 core Milans) running iperf, we were able to replicate the problem.  Multiple TX rings were tried, with the PTP traffic alone given its own high-priority TX ring, but that made no difference.  Examination of the problem led to the following code:
>
> static void tg3_tx(struct tg3_napi *tnapi)
> {
> [snip]
>                 if (tnapi->tx_ring[sw_idx].len_flags & TXD_FLAG_HWTSTAMP) {
>                         struct skb_shared_hwtstamps timestamp;
>                         u64 hwclock = tr32(TG3_TX_TSTAMP_LSB);
>                         hwclock |= (u64)tr32(TG3_TX_TSTAMP_MSB) << 32;
>
>                         tg3_hwclock_to_timestamp(tp, hwclock, &timestamp);
>
>                         skb_tstamp_tx(skb, &timestamp);
>                 }
>
> This assumes that the timestamp will have been updated by the time this descriptor in the TX ring has been marked as consumed.  We observe that this no longer holds true when the interface is under TX load.  Changing tg3_start_xmit to record the timestamp where TXD_FLAG_HWTSTAMP is set, and spinning in the above code until the timestamp has updated, appears to fix the PTP delay calculation.  A patch covering the change described is attached for reference, but I am not suggesting it as the solution to the problem.
>
> Adding printks to record the spin-loop duration showed that the timestamp could take around 150us to update after the descriptor was marked as consumed.  Figure 30 (Transmit Flow Diagram) on page 132 of the BCM5718 Family Programmer’s Reference Guide (broadcom.com) suggests how this could come about, but could it be confirmed whether the assumption the tg3.c code makes is correct?
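
For readers following the thread, the workaround described in the quoted report can be sketched in plain C as a bounded poll. This is not the attached patch, which is not reproduced here. It assumes that "record the timestamp" means saving the register value before the packet is queued, and it simulates the hardware with a hypothetical struct fake_tstamp_regs; in the driver the reads would go through tr32(TG3_TX_TSTAMP_LSB) and tr32(TG3_TX_TSTAMP_MSB) as in the excerpt above. The helper names and the poll limit are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit TX timestamp registers.
 * reads_until_update simulates the hardware latching a new value
 * some number of polls after the descriptor was marked consumed. */
struct fake_tstamp_regs {
	uint32_t lsb;
	uint32_t msb;
	unsigned int reads_until_update; /* 0 means "never updates" */
	uint64_t next_value;             /* value latched on update */
};

/* Combine LSB and MSB into one 64-bit value, as the quoted tg3_tx() does. */
static uint64_t read_tx_tstamp(struct fake_tstamp_regs *r)
{
	if (r->reads_until_update > 0 && --r->reads_until_update == 0) {
		r->lsb = (uint32_t)r->next_value;
		r->msb = (uint32_t)(r->next_value >> 32);
	}
	return (uint64_t)r->lsb | ((uint64_t)r->msb << 32);
}

/* Poll until the register differs from the value saved before transmit.
 * Returns false if it never changes within max_polls, so the caller can
 * skip delivering a stale timestamp instead of spinning forever. */
static bool wait_for_new_tstamp(struct fake_tstamp_regs *r, uint64_t pre_tx,
				unsigned int max_polls, uint64_t *out)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		uint64_t now = read_tx_tstamp(r);

		if (now != pre_tx) {
			*out = now;
			return true;
		}
	}
	return false;
}
```

The bound matters because a spin of this kind would run in the TX completion path; with the roughly 150us delay reported above, an unbounded loop could stall completion processing if the register never changes.
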
>
> Part:
>
> [   24.311626] tg3 0000:e1:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address xxxxxx
> [   24.311630] tg3 0000:e1:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
>
> Kind Regards,
> Simon White
