Message-ID: <20220812201654.qx7e37otu32pxnbk@skbuf>
Date: Fri, 12 Aug 2022 20:16:54 +0000
From: Vladimir Oltean <vladimir.oltean@....com>
To: Ferenc Fejes <ferenc.fejes@...csson.com>
CC: "vinicius.gomes@...el.com" <vinicius.gomes@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"marton12050@...il.com" <marton12050@...il.com>,
"peti.antal99@...il.com" <peti.antal99@...il.com>
Subject: Re: igc: missing HW timestamps at TX
Hi Ferenc,
On Fri, Aug 12, 2022 at 02:13:52PM +0000, Ferenc Fejes wrote:
> Ethtool after the measurement:
> ethtool -S enp3s0 | grep hwtstamp
> tx_hwtstamp_timeouts: 1
> tx_hwtstamp_skipped: 419
> rx_hwtstamp_cleared: 0
>
> Which is in line with what isochron sees.
>
> But that only happens if I forcibly put the affinity of the sender on a
> different CPU core than the PTP worker of the igc. If those run on the
> same core, I don't lose any HW timestamps, even for 10 million packets.
> Worth mentioning: I actually saw many lost timestamps, which confused me
> a little bit, but those were lost because of the small MSG_ERRQUEUE
> buffer. When I increased that from a few kbytes to 20 MB, I got every
> timestamp successfully.
I have zero knowledge of Intel hardware. That being said, I've looked at
the driver for about 5 minutes, and the design seems to be that the
timestamp is not available in band from the TX completion NAPI as part
of BD ring metadata; rather, a "TX timestamp complete" interrupt is
raised, and this results in igc_tsync_interrupt() being called. However,
there are 2 paths in the driver which call this: one is igc_msix_other()
and the other is igc_intr_msi() - the latter is also the interrupt that
triggers the napi_schedule(). It would be interesting to see exactly
which MSI-X interrupt is the one that triggers igc_tsync_interrupt().
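One quick and dirty way to answer that (debug sketch only, not actual
driver code; the function signature is from my quick read of igc_main.c,
so treat it as an assumption) would be to log the caller from inside
igc_tsync_interrupt():

	static void igc_tsync_interrupt(struct igc_adapter *adapter)
	{
		/* debug hack, not upstream code: %pS resolves the return
		 * address to either igc_msix_other() or igc_intr_msi()
		 */
		trace_printk("tsync irq from %pS on CPU %d\n",
			     __builtin_return_address(0),
			     smp_processor_id());

		/* ... existing TSICR handling stays as-is ... */
	}

and then read the result from /sys/kernel/tracing/trace while the test
runs.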
It's also interesting to understand what you mean precisely by affinity
of isochron. It has a main thread (used for PTP monitoring and for TX
timestamps) and a pthread for the sending process. The main thread's
affinity is controlled via taskset; the sender thread's via --cpu-mask.
Is it the *sender* thread that makes the TX timestamps become available
to user space more quickly, rather than the main thread, which actually
dequeues them from the error queue? If so, it might be because
the TX packets will trigger the TX completion interrupt, and this will
accelerate the processing of the TX timestamps. I'm unclear what happens
when the sender thread runs on a different CPU core than the TX
timestamp thread.
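Just so we're talking about the same thing: by pinning the sender thread
I mean per-thread affinity, roughly like this (a generic sketch of what
a --cpu-mask style option usually boils down to, not isochron's actual
code; CPU 2 is just an example):

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <sched.h>
	#include <stdio.h>
	#include <string.h>

	static void *sender_thread(void *arg)
	{
		(void)arg;
		/* ... the sendmsg() loop would live here ... */
		return NULL;
	}

	int main(void)
	{
		pthread_t tid;
		cpu_set_t mask;
		int err;

		pthread_create(&tid, NULL, sender_thread, NULL);

		CPU_ZERO(&mask);
		CPU_SET(2, &mask); /* pin only the sender to CPU 2 */
		err = pthread_setaffinity_np(tid, sizeof(mask), &mask);
		if (err)
			fprintf(stderr, "pthread_setaffinity_np: %s\n",
				strerror(err));

		/* the main thread keeps whatever affinity taskset gave it */
		pthread_join(tid, NULL);
		return 0;
	}

So the question above is really about which of these two threads you
pinned next to (or away from) the igc interrupt and its PTP worker.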
Your need to increase the SO_RCVBUF is also interesting. Keep in mind
that isochron at that scheduling priority and policy is a CPU hog, and
that igc_tsync_interrupt() calls schedule_work() - which uses the system
workqueue, which runs at a very low priority (this raises the question:
how do you even match the CPU on which isochron runs with the CPU of the
system workqueue?). So isochron, high priority, competes for CPU
time with igc_ptp_tx_work(), low priority. One produces data, one
consumes it; queues are bound to get full at some point.
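For reference, what you had to enlarge is the socket error queue, whose
memory is charged against SO_RCVBUF. Below is a generic sketch of the
consumer side (not isochron's actual code; it assumes SO_TIMESTAMPING
with SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE was
already enabled on the socket, and the 20 MB value is just your number):

	#include <errno.h>
	#include <linux/errqueue.h>
	#include <linux/net_tstamp.h>
	#include <stdio.h>
	#include <sys/socket.h>
	#include <time.h>

	static int drain_tx_timestamps(int fd)
	{
		/* grow the buffer backing the error queue so timestamps
		 * aren't dropped while the consumer is starved of CPU
		 */
		int rcvbuf = 20 * 1024 * 1024;

		setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
			   sizeof(rcvbuf));

		for (;;) {
			char data[256], control[512];
			struct iovec iov = {
				.iov_base = data,
				.iov_len = sizeof(data),
			};
			struct msghdr msg = {
				.msg_iov = &iov,
				.msg_iovlen = 1,
				.msg_control = control,
				.msg_controllen = sizeof(control),
			};
			struct cmsghdr *cmsg;

			/* reading MSG_ERRQUEUE never blocks */
			if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
				return errno == EAGAIN ? 0 : -errno;

			for (cmsg = CMSG_FIRSTHDR(&msg); cmsg;
			     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
				if (cmsg->cmsg_level != SOL_SOCKET ||
				    cmsg->cmsg_type != SCM_TIMESTAMPING)
					continue;

				struct scm_timestamping *ts =
					(void *)CMSG_DATA(cmsg);

				/* ts->ts[2] is the HW timestamp */
				printf("hw ts %lld.%09ld\n",
				       (long long)ts->ts[2].tv_sec,
				       ts->ts[2].tv_nsec);
			}
		}
	}

The point being: every timestamp that sits in that queue waiting for the
consumer thread to get scheduled eats into the SO_RCVBUF budget, so a
starved consumer translates directly into a full queue.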
On the other hand, other drivers use the ptp_aux_kworker() that the PTP
core creates specifically for this purpose. It is a dedicated kthread
whose scheduling policy and priority can be adjusted using chrt. I think
it would be interesting to see how things behave when you replace
schedule_work() with ptp_schedule_worker().
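In case it helps, the shape of that conversion would be roughly this
(sketch only, untested; the field names adapter->ptp_caps and
adapter->ptp_clock and the igc_ptp_init() hook point are my guesses at
the igc layout, so double-check them):

	#include <linux/ptp_clock_kernel.h>

	/* runs on the PTP core's dedicated kthread ("ptp%d"), which can
	 * be re-prioritized with chrt, instead of the system workqueue
	 */
	static long igc_ptp_tx_aux_work(struct ptp_clock_info *ptp)
	{
		struct igc_adapter *adapter =
			container_of(ptp, struct igc_adapter, ptp_caps);

		/* ... body of today's igc_ptp_tx_work(): read the TX
		 * timestamp registers, call skb_tstamp_tx(), handle the
		 * timeout/skipped accounting ...
		 */

		return -1; /* negative: don't self-reschedule */
	}

	/* where ptp_caps is populated (igc_ptp_init() or similar):
	 *	adapter->ptp_caps.do_aux_work = igc_ptp_tx_aux_work;
	 *
	 * and in the TX timestamp interrupt path, replace
	 *	schedule_work(&adapter->ptp_tx_work);
	 * with
	 *	ptp_schedule_worker(adapter->ptp_clock, 0);
	 */

That would also let you tune the priority of the in-kernel timestamp
consumer independently of the system workqueue.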