netdev - Re: igc: missing HW timestamps at TX

Message-ID: <87v8qti3u2.fsf@intel.com>
Date:   Mon, 15 Aug 2022 14:39:33 -0700
From:   Vinicius Costa Gomes <vinicius.gomes@...el.com>
To:     Vladimir Oltean <vladimir.oltean@....com>,
        Ferenc Fejes <ferenc.fejes@...csson.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "marton12050@...il.com" <marton12050@...il.com>,
        "peti.antal99@...il.com" <peti.antal99@...il.com>
Subject: Re: igc: missing HW timestamps at TX

Hi Vladimir,

Vladimir Oltean <vladimir.oltean@....com> writes:

> Hi Ferenc,
>
> On Fri, Aug 12, 2022 at 02:13:52PM +0000, Ferenc Fejes wrote:
>> Ethtool after the measurement:
>> ethtool -S enp3s0 | grep hwtstamp
>>      tx_hwtstamp_timeouts: 1
>>      tx_hwtstamp_skipped: 419
>>      rx_hwtstamp_cleared: 0
>> 
>> Which is in line with what isochron sees.
>> 
>> But that only happens if I force the sender's affinity onto a
>> different CPU core than the igc PTP worker. If both run on the same
>> core, I don't lose any HW timestamps, even over 10 million packets.
>> Worth mentioning: I also saw many lost timestamps, which confused me
>> a bit, but those were lost because the socket buffer backing
>> MSG_ERRQUEUE was too small. When I increased it from a few kbytes to
>> 20 Mbytes, I got every timestamp successfully.
>
> I have zero knowledge of Intel hardware. That being said, I've looked at
> the driver for about 5 minutes, and the design seems to be one where
> the timestamp is not available in-band from the TX completion NAPI as
> part of the BD ring metadata; rather, a "TX timestamp complete"
> interrupt is raised, and this results in igc_tsync_interrupt() being
> called. However, there are 2 paths in the driver which call this: one
> is igc_msix_other() and the other is igc_intr_msi(), and the latter is
> also the interrupt that triggers napi_schedule(). It would be
> interesting to see exactly which MSI-X interrupt is the one that
> triggers igc_tsync_interrupt().

Just some additional information (note that I know very little about the
internal workings of interrupts): igc_intr_msi() is called when MSI-X is
not enabled (i.e. an "MSI only" system), and igc_msix_other() is called
when MSI-X is available. When MSI-X is available, i225/i226 sets up a
separate interrupt handler for "general" events; the TX timestamp
becoming available to be read from the registers is one of those events.

>
> It's also interesting to understand what precisely you mean by the
> affinity of isochron. It has a main thread (used for PTP monitoring and
> for TX timestamps) and a pthread for the sending process. The main
> thread's affinity is controlled via taskset; the sender thread's via
> --cpu-mask. Is it the placement of the *sender* thread that makes the
> TX timestamps available to user space more quickly, rather than that of
> the main thread, which actually dequeues them from the error queue? If
> so, it might be because the TX packets trigger the TX completion
> interrupt, and this accelerates the processing of the TX timestamps.
> It's unclear to me what happens when the sender thread runs on a
> different CPU core than the TX timestamp thread.
>
> Your need to increase SO_RCVBUF is also interesting. Keep in mind that
> isochron at that scheduling priority and policy is a CPU hog, and that
> igc_tsync_interrupt() calls schedule_work(), which uses the system
> workqueue that runs at a very low priority (which raises the question:
> how do you know how to match the CPU on which isochron runs with the
> CPU of the system workqueue?). So isochron, at high priority, competes
> for CPU time with igc_ptp_tx_work(), at low priority. One produces
> data, the other consumes it; queues are bound to fill up at some point.
> On the other hand, other drivers use the ptp_aux_kworker() that the PTP
> core creates specifically for this purpose. It is a dedicated kthread
> whose scheduling policy and priority can be adjusted using chrt. I think
> it would be interesting to see how things behave when you replace
> schedule_work() with ptp_schedule_worker().

I was planning to convert the driver to use the PTP aux worker thread at
some point; perhaps this is the "excuse" I was looking for.


Cheers,
-- 
Vinicius