netdev - Re: igc: missing HW timestamps at TX


Date:   Fri, 12 Aug 2022 14:13:52 +0000
From:   Ferenc Fejes <ferenc.fejes@...csson.com>
To:     "vinicius.gomes@...el.com" <vinicius.gomes@...el.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     "marton12050@...il.com" <marton12050@...il.com>,
        "peti.antal99@...il.com" <peti.antal99@...il.com>,
        "vladimir.oltean@....com" <vladimir.oltean@....com>
Subject: Re: igc: missing HW timestamps at TX

Hi Vinicius!

On Thu, 2022-08-11 at 10:33 -0300, Vinicius Costa Gomes wrote:
> Hi Ferenc,
> 
> > 
> > With the iperf TCP test, line rate is achievable just like without the
> > patch.
> > 
> 
> That's very good to know.
> 
> > > > 
> > > > If you are feeling adventurous and feel like helping test it, here
> > > > is the link:
> > > > 
> > > > https://github.com/vcgomes/net-next/tree/igc-multiple-tstamp-timers-lock-new
> > > > 
> > 
> > Is there any test in particular you are interested in? My testbed is
> > configured, so I can do some.
> > 
> 
> The only thing I am worried about is, in the "dropped" HW timestamps
> case, whether all the timestamp slots are indeed full, or whether there's
> a bug and we missed one timestamp.
> 
> Can you verify that for every dropped HW timestamp in your
> application, 'tx_hwtstamp_skipped' (from 'ethtool -S') increases
> every time the drop happens? Seeing if 'tx_hwtstamp_timeouts' also
> increases would be useful as well.

Yes, it's increasing. Let me illustrate:

Ethtool before the measurement:
$ ethtool -S enp3s0 | grep hwtstamp
     tx_hwtstamp_timeouts: 1
     tx_hwtstamp_skipped: 409
     rx_hwtstamp_cleared: 0

Measurement:
$ sudo isochron send -i enp3s0 -s 64 -c 0.0000005 --client 10.0.0.20 \
    --num-frames 10000000 -F isochron.dat --sync-threshold 2000 \
    -M $((1 << 2)) --sched-fifo --sched-priority 99

(Note: isochron tries to send a packet every 500 ns, but the rate is
actually limited by the sleep/syscall latency, so it sends a packet
roughly every 15-20 us.)
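
To put the numbers in the note above into perspective, here is a quick back-of-the-envelope sketch. It only uses the values from this mail (500 ns cycle time, roughly 15-20 us observed spacing, 10 million frames); the variable and function names are made up for illustration.

```python
# Rough rate arithmetic for the run above; all inputs are taken from the mail.
NS_PER_S = 1_000_000_000

requested_interval_ns = 500              # -c 0.0000005
observed_interval_ns = (15_000, 20_000)  # roughly 15-20 us per packet
num_frames = 10_000_000                  # --num-frames 10000000


def rate_pps(interval_ns: int) -> float:
    """Packets per second for a given inter-packet interval in nanoseconds."""
    return NS_PER_S / interval_ns


requested_pps = rate_pps(requested_interval_ns)
observed_pps = tuple(rate_pps(i) for i in observed_interval_ns)
# Time needed to send all frames at the observed spacing.
run_time_s = tuple(num_frames * i / NS_PER_S for i in observed_interval_ns)

print(f"requested: {requested_pps:,.0f} pps")
print(f"observed:  {observed_pps[1]:,.0f} to {observed_pps[0]:,.0f} pps")
print(f"full run:  {run_time_s[0]:.0f} to {run_time_s[1]:.0f} s")
```

So the sender actually runs at a few percent of the requested rate, and a full 10 million frame run takes a couple of minutes.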

Output:
isochron[1660315948.335677744]: local ptpmon         -7 sysmon        -25 receiver ptpmon          0 sysmon          4
Timed out waiting for TX timestamps, 10 timestamps unacknowledged
seqid 3441 missing timestamps: hw, 
seqid 3442 missing timestamps: hw, 
seqid 3443 missing timestamps: hw, 
seqid 3449 missing timestamps: hw, 
seqid 5530 missing timestamps: hw, 
seqid 5531 missing timestamps: hw, 
seqid 7597 missing timestamps: hw, 
seqid 7598 missing timestamps: hw, 
seqid 7599 missing timestamps: hw, 
seqid 7605 missing timestamps: hw, 


Ethtool after the measurement:
$ ethtool -S enp3s0 | grep hwtstamp
     tx_hwtstamp_timeouts: 1
     tx_hwtstamp_skipped: 419
     rx_hwtstamp_cleared: 0

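This is in line with what isochron sees: 10 missing HW timestamps, and
tx_hwtstamp_skipped went up by 10 (409 to 419) while
tx_hwtstamp_timeouts stayed at 1.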
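
The comparison can also be scripted instead of eyeballed. The following is only a minimal sketch: it parses the two 'ethtool -S ... | grep hwtstamp' outputs pasted above and compares the counter deltas with the number of missing timestamps isochron reported. The helper names are hypothetical; nothing here is part of ethtool or isochron.

```python
import re

# The two counter dumps from this mail, pasted verbatim.
BEFORE = """\
     tx_hwtstamp_timeouts: 1
     tx_hwtstamp_skipped: 409
     rx_hwtstamp_cleared: 0
"""
AFTER = """\
     tx_hwtstamp_timeouts: 1
     tx_hwtstamp_skipped: 419
     rx_hwtstamp_cleared: 0
"""


def parse_counters(text: str) -> dict:
    """Turn 'name: value' lines into a {name: int} dictionary."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_deltas(before: str, after: str) -> dict:
    """Per-counter increase between two dumps."""
    b, a = parse_counters(before), parse_counters(after)
    return {name: a[name] - b.get(name, 0) for name in a}


missing_hw_timestamps = 10  # "10 timestamps unacknowledged" in the isochron output
deltas = counter_deltas(BEFORE, AFTER)
print(deltas)
print("skipped matches drops:", deltas["tx_hwtstamp_skipped"] == missing_hw_timestamps)
```

If tx_hwtstamp_timeouts had increased instead of tx_hwtstamp_skipped, that would point toward the "bug hiding" case rather than full timestamp slots.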

But that only happens if I force the sender's affinity onto a different
CPU core than the igc PTP worker. If both run on the same core, I don't
lose any HW timestamps, even for 10 million packets. It is worth
mentioning that I initially did see many lost timestamps, which confused
me a little, but those were lost because of the small MSG_ERRQUEUE. When
I increased it from a few kbytes to 20 Mbytes, I got every timestamp
successfully.
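
For anyone who wants to reproduce the error queue part: the sketch below shows one way to enlarge it from the application side. It rests on an assumption that is not stated in this thread, namely that skbs queued on the socket error queue are accounted against the socket receive buffer, so SO_RCVBUF is the knob to turn. The function name is hypothetical, and an unprivileged request is clamped by net.core.rmem_max.

```python
import socket


def enlarge_errqueue(sock: socket.socket, size_bytes: int) -> int:
    """Request a larger receive buffer and return what the kernel granted.

    Assumption: TX timestamps delivered via MSG_ERRQUEUE are charged to the
    receive buffer, so a larger SO_RCVBUF leaves room for more of them.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    # The kernel may clamp the request (net.core.rmem_max), so read it back.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        granted = enlarge_errqueue(s, 20 * 1024 * 1024)  # the 20 Mbytes from above
        print(f"receive buffer now {granted} bytes")
```

If the granted size is much smaller than requested, raising net.core.rmem_max first is probably needed.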

> 
> If for every drop there's one 'tx_hwtstamp_skipped' increment, then it
> means that the driver is doing its best, and the workload is requesting
> more timestamps than the system is able to handle.
> 
> If only 'tx_hwtstamp_timeouts' increases then it's possible that there
> could be a bug hiding still.

On the other hand, I'm a little confused by the ETF behavior. Without HW
offload, I lose almost every timestamp even with a large sending interval
(one packet every 500 us), and with HW offload I still lose a lot. But
that might be beyond igc and be some config issue in my setup (I have to
apply mqprio, do the PTP sync on the default priority, and send the data
packets with the SO_TXTIME cmsg to ETF at prio 2). Does the tx_queue
affect the timestamping?
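
To make the data path in that setup concrete, here is a minimal sketch of how a sender attaches a launch time to a packet with the SO_TXTIME cmsg. This is not what isochron does internally, just an illustration. The numeric constants are copied from the Linux UAPI headers (asm-generic/socket.h, linux/net_tstamp.h) because Python's socket module does not export them, and they may differ on some architectures. The function names are hypothetical. Note that enabling SO_TXTIME with CLOCK_TAI needs CAP_NET_ADMIN.

```python
import socket
import struct

# Assumed values from the Linux UAPI headers; not exported by Python's socket module.
SO_TXTIME = 61
SCM_TXTIME = SO_TXTIME
CLOCK_TAI = 11
SOF_TXTIME_REPORT_ERRORS = 1 << 1


def txtime_sockopt(clockid: int = CLOCK_TAI,
                   flags: int = SOF_TXTIME_REPORT_ERRORS) -> bytes:
    """Pack struct sock_txtime { clockid_t clockid; __u32 flags; }."""
    return struct.pack("=iI", clockid, flags)


def txtime_cmsg(txtime_ns: int) -> tuple:
    """Ancillary data item carrying the 64-bit launch time in nanoseconds."""
    return (socket.SOL_SOCKET, SCM_TXTIME, struct.pack("=Q", txtime_ns))


def open_txtime_socket(priority: int) -> socket.socket:
    """UDP socket with a priority (mapped to a queue by mqprio) and SO_TXTIME."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, priority)
    sock.setsockopt(socket.SOL_SOCKET, SO_TXTIME, txtime_sockopt())
    return sock


def send_at(sock: socket.socket, payload: bytes, dest: tuple, txtime_ns: int) -> int:
    """Send one datagram that ETF should release at txtime_ns (CLOCK_TAI)."""
    return sock.sendmsg([payload], [txtime_cmsg(txtime_ns)], 0, dest)


# Usage (needs root and an ETF qdisc on the queue that prio 2 maps to):
#   import time
#   s = open_txtime_socket(priority=2)
#   send_at(s, b"x" * 64, ("10.0.0.20", 5000),
#           time.clock_gettime_ns(CLOCK_TAI) + 1_000_000)
```

The port number in the usage comment is arbitrary. The point is only that the priority selects the TX queue via mqprio, which is why I'm asking whether the queue matters for the timestamping.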

CC Vladimir, the author of isochron.

> 
> > > > 
> > > > Cheers,
> > > 
> > > Best,
> > > Ferenc

Best,
Ferenc