lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ldncoqez.fsf@jax.kurt.home>
Date: Thu, 21 Aug 2025 13:38:44 +0200
From: Kurt Kanzenbach <kurt@...utronix.de>
To: Jacob Keller <jacob.e.keller@...el.com>, Miroslav Lichvar
 <mlichvar@...hat.com>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>, Przemek Kitszel
 <przemyslaw.kitszel@...el.com>, Andrew Lunn <andrew+netdev@...n.ch>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo
 Abeni <pabeni@...hat.com>, Richard Cochran <richardcochran@...il.com>,
 Vinicius Costa Gomes <vinicius.gomes@...el.com>, Sebastian Andrzej Siewior
 <bigeasy@...utronix.de>, intel-wired-lan@...ts.osuosl.org,
 netdev@...r.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH iwl-next] igb: Retrieve Tx timestamp
 directly from interrupt

On Wed Aug 20 2025, Jacob Keller wrote:
> On 8/20/2025 12:56 AM, Miroslav Lichvar wrote:
>> On Tue, Aug 19, 2025 at 04:31:49PM -0700, Jacob Keller wrote:
>>> I'm having trouble interpreting what exactly this data shows, as its
>>> quite a lot of data and numbers. I guess that it is showing when it
>>> switches over to software timestamps.. It would be nice if ntpperf
>>> showed number of events which were software vs hardware timestamping, as
>>> thats likely the culprit. igb hardare only has a single outstanding Tx
>>> timestamp at a time.
>> 
>> The server doesn't have a way to tell the client (ntpperf) which
>> timestamps are HW or SW, we can only guess from the measured offset as
>> HW timestamps should be more accurate, but on the server side the
>> number of SW and HW TX timestamps provided to the client can be
>> monitored with the "chronyc serverstats" command. The server requests
>> both SW and HW TX timestamps and uses the better one it gets from the
>> kernel, if it can actually get one before it receives the next
>> request from the same client (ntpperf simulates up to 16384 concurrent
>> clients).
>> 
>> When I run ntpperf at a fixed rate of 140000 requests per second
>> for 10 seconds (-r 140000 -t 10), I get the following numbers.
>> 
>> Without the patch:
>> NTP daemon TX timestamps   : 28056
>> NTP kernel TX timestamps   : 1012864
>> NTP hardware TX timestamps : 387239
>> 
>> With the patch:
>> NTP daemon TX timestamps   : 28047
>> NTP kernel TX timestamps   : 707674
>> NTP hardware TX timestamps : 692326
>> 
>> The number of HW timestamps is significantly higher with the patch, so
>> that looks good.
>> 
>> But when I increase the rate to 200000, I get this:
>> 
>> Without the patch:
>> NTP daemon TX timestamps   : 35835
>> NTP kernel TX timestamps   : 1410956
>> NTP hardware TX timestamps : 581575            
>> 
>> With the patch:
>> NTP daemon TX timestamps   : 476908
>> NTP kernel TX timestamps   : 646146
>> NTP hardware TX timestamps : 412095
>> 
>
> When does the NTP daemon decide to go with timestamping within the
> daemon vs timestamping in the kernel? It seems odd that we don't achieve
> 100% kernel timestamps...
>
>> With the patch, the server is now dropping requests and can provide
>> a smaller number of HW timestamps and also a smaller number of SW
>> timestamps, i.e. less work is done overall.
>> 
>> Could the explanation be that a single CPU core now needs to do more
>> work, while it was better distributed before?
>> 
>
> Hm. The interrupt vector may be fired on the same CPU maybe? The work
> items can go into the general pool which spreads to all CPUs, and I
> guess the amount of work to submit the timestamp is high enough that we
> do end up costing too much?
>
> Hmm.
>
> We could experiment with using a kthread via the ptp_aux_work setup and
> tuning to ensure that thread has good prioritization? I don't know what
> the best compromise is since its clear the interrupt is best if the
> timestamp volume isn't too high.

Miroslav, can you test the following patch? Does this help?

From 0b795f37fecd235bf4c9965148afaf33e4a5139c Mon Sep 17 00:00:00 2001
From: Kurt Kanzenbach <kurt@...utronix.de>
Date: Wed, 20 Aug 2025 14:01:45 +0200
Subject: [PATCH] igb: Convert Tx timestamping to PTP aux worker

The current implementation uses schedule_work() which is executed by the
system work queue to retrieve Tx timestamps. This increases latency and can
lead to timeouts in case of heavy system load.

Therefore, switch to the PTP aux worker which can be prioritized according
to use case.

Tested on Intel i210.

Signed-off-by: Kurt Kanzenbach <kurt@...utronix.de>
---
 drivers/net/ethernet/intel/igb/igb.h      |  1 -
 drivers/net/ethernet/intel/igb/igb_main.c |  6 +++---
 drivers/net/ethernet/intel/igb/igb_ptp.c  | 26 ++++++++++++-----------
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 43401c3c824f..0d2bdde47c32 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -624,7 +624,6 @@ struct igb_adapter {
 	struct ptp_clock *ptp_clock;
 	struct ptp_clock_info ptp_caps;
 	struct delayed_work ptp_overflow_work;
-	struct work_struct ptp_tx_work;
 	struct sk_buff *ptp_tx_skb;
 	struct kernel_hwtstamp_config tstamp_config;
 	unsigned long ptp_tx_start;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index fe0c0a5caa85..5974180721b1 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6665,7 +6665,7 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 			adapter->ptp_tx_skb = skb_get(skb);
 			adapter->ptp_tx_start = jiffies;
 			if (adapter->hw.mac.type == e1000_82576)
-				schedule_work(&adapter->ptp_tx_work);
+				ptp_schedule_worker(adapter->ptp_clock, 0);
 		} else {
 			adapter->tx_hwtstamp_skipped++;
 		}
@@ -6701,7 +6701,7 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 		dev_kfree_skb_any(adapter->ptp_tx_skb);
 		adapter->ptp_tx_skb = NULL;
 		if (adapter->hw.mac.type == e1000_82576)
-			cancel_work_sync(&adapter->ptp_tx_work);
+			ptp_cancel_worker_sync(adapter->ptp_clock);
 		clear_bit_unlock(__IGB_PTP_TX_IN_PROGRESS, &adapter->state);
 	}
 
@@ -7169,7 +7169,7 @@ static void igb_tsync_interrupt(struct igb_adapter *adapter)
 
 	if (tsicr & E1000_TSICR_TXTS) {
 		/* retrieve hardware timestamp */
-		schedule_work(&adapter->ptp_tx_work);
+		ptp_schedule_worker(adapter->ptp_clock, 0);
 	}
 
 	if (tsicr & TSINTR_TT0)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index a7876882aeaf..f0b08809cf91 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -803,15 +803,15 @@ static int igb_ptp_verify_pin(struct ptp_clock_info *ptp, unsigned int pin,
  * This work function polls the TSYNCTXCTL valid bit to determine when a
  * timestamp has been taken for the current stored skb.
  **/
-static void igb_ptp_tx_work(struct work_struct *work)
+static long igb_ptp_tx_work(struct ptp_clock_info *ptp)
 {
-	struct igb_adapter *adapter = container_of(work, struct igb_adapter,
-						   ptp_tx_work);
+	struct igb_adapter *adapter = container_of(ptp, struct igb_adapter,
+						   ptp_caps);
 	struct e1000_hw *hw = &adapter->hw;
 	u32 tsynctxctl;
 
 	if (!adapter->ptp_tx_skb)
-		return;
+		return -1;
 
 	if (time_is_before_jiffies(adapter->ptp_tx_start +
 				   IGB_PTP_TX_TIMEOUT)) {
@@ -824,15 +824,17 @@ static void igb_ptp_tx_work(struct work_struct *work)
 		 */
 		rd32(E1000_TXSTMPH);
 		dev_warn(&adapter->pdev->dev, "clearing Tx timestamp hang\n");
-		return;
+		return -1;
 	}
 
 	tsynctxctl = rd32(E1000_TSYNCTXCTL);
-	if (tsynctxctl & E1000_TSYNCTXCTL_VALID)
+	if (tsynctxctl & E1000_TSYNCTXCTL_VALID) {
 		igb_ptp_tx_hwtstamp(adapter);
-	else
-		/* reschedule to check later */
-		schedule_work(&adapter->ptp_tx_work);
+		return -1;
+	}
+
+	/* reschedule to check later */
+	return 1;
 }
 
 static void igb_ptp_overflow_check(struct work_struct *work)
@@ -915,7 +917,7 @@ void igb_ptp_tx_hang(struct igb_adapter *adapter)
 	 * timestamp bit when this occurs.
 	 */
 	if (timeout) {
-		cancel_work_sync(&adapter->ptp_tx_work);
+		ptp_cancel_worker_sync(adapter->ptp_clock);
 		dev_kfree_skb_any(adapter->ptp_tx_skb);
 		adapter->ptp_tx_skb = NULL;
 		clear_bit_unlock(__IGB_PTP_TX_IN_PROGRESS, &adapter->state);
@@ -1381,6 +1383,7 @@ void igb_ptp_init(struct igb_adapter *adapter)
 		return;
 	}
 
+	adapter->ptp_caps.do_aux_work = igb_ptp_tx_work;
 	adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps,
 						&adapter->pdev->dev);
 	if (IS_ERR(adapter->ptp_clock)) {
@@ -1392,7 +1395,6 @@ void igb_ptp_init(struct igb_adapter *adapter)
 		adapter->ptp_flags |= IGB_PTP_ENABLED;
 
 		spin_lock_init(&adapter->tmreg_lock);
-		INIT_WORK(&adapter->ptp_tx_work, igb_ptp_tx_work);
 
 		if (adapter->ptp_flags & IGB_PTP_OVERFLOW_CHECK)
 			INIT_DELAYED_WORK(&adapter->ptp_overflow_work,
@@ -1437,7 +1439,7 @@ void igb_ptp_suspend(struct igb_adapter *adapter)
 	if (adapter->ptp_flags & IGB_PTP_OVERFLOW_CHECK)
 		cancel_delayed_work_sync(&adapter->ptp_overflow_work);
 
-	cancel_work_sync(&adapter->ptp_tx_work);
+	ptp_cancel_worker_sync(adapter->ptp_clock);
 	if (adapter->ptp_tx_skb) {
 		dev_kfree_skb_any(adapter->ptp_tx_skb);
 		adapter->ptp_tx_skb = NULL;
-- 
2.47.2


Download attachment "signature.asc" of type "application/pgp-signature" (862 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ