Date:   Mon, 15 Aug 2022 06:47:31 +0000
From:   Ferenc Fejes <ferenc.fejes@...csson.com>
To:     "vladimir.oltean@....com" <vladimir.oltean@....com>
CC:     "vinicius.gomes@...el.com" <vinicius.gomes@...el.com>,
        "marton12050@...il.com" <marton12050@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "peti.antal99@...il.com" <peti.antal99@...il.com>
Subject: Re: igc: missing HW timestamps at TX

Hi Vladimir!

Thank you for the reply!

On Fri, 2022-08-12 at 20:16 +0000, Vladimir Oltean wrote:
> Hi Ferenc,
> 
> On Fri, Aug 12, 2022 at 02:13:52PM +0000, Ferenc Fejes wrote:
> > Ethtool after the measurement:
> > ethtool -S enp3s0 | grep hwtstamp
> >      tx_hwtstamp_timeouts: 1
> >      tx_hwtstamp_skipped: 419
> >      rx_hwtstamp_cleared: 0
> > 
> > Which is in line with what isochron sees.
> > 
> > But that only happens if I force the affinity of the sender onto a
> > different CPU core than the igc's PTP worker. If they run on the
> > same core, I don't lose any HW timestamp, even for 10 million
> > packets. Worth mentioning: at first I saw many lost timestamps,
> > which confused me a bit, but those were lost because of the small
> > MSG_ERRQUEUE buffer. When I increased that from a few kbytes to 20
> > mbytes I got every timestamp successfully.
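
To be concrete about the buffer increase above: the TX timestamps
queued to MSG_ERRQUEUE are charged against the socket's SO_RCVBUF, so
that is the knob I had to bump on the sending socket. Roughly this
pattern (the function name and the size are only illustrative, this is
not isochron's actual code):

/* sketch: make room in the error queue for many pending TX timestamps;
 * skbs queued there are charged against SO_RCVBUF, so the default of a
 * couple hundred kB fills up quickly at high packet rates */
#include <stdio.h>
#include <sys/socket.h>

static int enlarge_errqueue(int fd)
{
	int rcvbuf = 20 * 1024 * 1024;	/* ~20 MB */

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0) {
		perror("setsockopt(SO_RCVBUF)");
		return -1;
	}
	return 0;
}

Note that SO_RCVBUF is capped by net.core.rmem_max, so that sysctl has
to be raised as well (or SO_RCVBUFFORCE used with CAP_NET_ADMIN) to
actually get 20 MB.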
> 
> I have zero knowledge of Intel hardware. That being said, I've looked
> at the driver for about 5 minutes, and the design seems to be that the
> timestamp is not available in band from the TX completion NAPI as part
> of BD ring metadata; rather, a TX timestamp complete interrupt is
> raised, and this results in igc_tsync_interrupt() being called.
> However, there are 2 paths in the driver which call this: one is
> igc_msix_other() and the other is igc_intr_msi() - the latter is also
> the interrupt that triggers the napi_schedule(). It would be
> interesting to see exactly which MSI-X interrupt is the one that
> triggers igc_tsync_interrupt().
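
I can try to answer that with a kprobe on igc_tsync_interrupt() that
dumps the call stack (or the equivalent bpftrace kstack one-liner).
Something roughly like this - an untested sketch, and it assumes the
symbol is not inlined so kprobes can find it:

/* kprobe_tsync.c: print who calls igc_tsync_interrupt() and on which CPU */
#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/smp.h>

static int tsync_pre(struct kprobe *p, struct pt_regs *regs)
{
	pr_info("igc_tsync_interrupt() on CPU %d\n", smp_processor_id());
	dump_stack();	/* should show igc_msix_other() vs igc_intr_msi() */
	return 0;
}

static struct kprobe kp = {
	.symbol_name	= "igc_tsync_interrupt",
	.pre_handler	= tsync_pre,
};

static int __init tsync_init(void)
{
	return register_kprobe(&kp);
}

static void __exit tsync_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(tsync_init);
module_exit(tsync_exit);
MODULE_LICENSE("GPL");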
> 
> It's also interesting to understand what you mean precisely by the
> affinity of isochron. It has a main thread (used for PTP monitoring
> and for TX timestamps) and a pthread for the sending process. The
> main thread's affinity is controlled via taskset; the sender thread's
> via --cpu-mask.

I just played with those a little. Looks like --cpu-mask is the one
that helps in my case. For example, I checked which CPU core
igc_ptp_tx_work runs on:

# bpftrace -e 'kprobe:igc_ptp_tx_work { printf("%d\n", cpu); exit(); }'
Attaching 1 probe...
0

Looks like it's running on core 0, so I ran isochron:
taskset -c 0 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
taskset -c 1 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
taskset -c 0 isochron ... --cpu-mask $((1 << 1)) - losing timestamps
taskset -c 1 isochron ... --cpu-mask $((1 << 1)) - losing timestamps

> Is it the *sender* thread that makes the TX timestamps available
> quicker to user space, rather than the main thread, which actually
> dequeues them from the error queue? If so, it might be because the TX
> packets will trigger the TX completion interrupt, and this will
> accelerate the processing of the TX timestamps. I'm unclear what
> happens when the sender thread runs on a different CPU core than the
> TX timestamp thread.

Well, I have no clue unfortunately, but your theory makes sense.
Vinicius might be able to help us out here.

> 
> Your need to increase the SO_RCVBUF is also interesting. Keep in mind
> that isochron at that scheduling priority and policy is a CPU hog, and
> that igc_tsync_interrupt() calls schedule_work() - which uses the
> system workqueue that runs at a very low priority (this begs the
> question: how do you know how to match the CPU on which isochron runs
> with the CPU of the system workqueue?). So isochron, high priority,
> competes for CPU time with igc_ptp_tx_work(), low priority. One
> produces data, one consumes it; queues are bound to get full at some
> point.

Maybe this is what helps in my case? With the funccount tracer I
checked that when the sender thread and igc_ptp_tx_work run on the
same core, the worker is called exactly as many times as the number of
packets I sent.

However, if the worker runs on a different core, funccount shows some
random number (less than the packets sent), and in that case I also
lose timestamps.

I'm not sure what is happening here; maybe the "deferred" scheduling
of the worker is sometimes too slow to enqueue every timestamp into
the error queue? And because I force both the sender and the worker
onto the same core, they execute in order (my system is pretty much
idle apart from these processes), introducing some sort of throttling
to the timestamp processing?

> On the other hand, other drivers use the ptp_aux_kworker() that the
> PTP core creates specifically for this purpose. It is a dedicated
> kthread whose scheduling policy and priority can be adjusted using
> chrt. I think it would be interesting to see how things behave when
> you replace schedule_work() with ptp_schedule_worker().

I will try to take a look at that. Anyway, thank you for the insights;
I'm happy with the way it works now (at least I can do my experiments
with it).

Best,
Ferenc
