lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20251024120707.3516550-1-edumazet@google.com>
Date: Fri, 24 Oct 2025 12:07:07 +0000
From: Eric Dumazet <edumazet@...gle.com>
To: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>, Neal Cardwell <ncardwell@...gle.com>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com, 
	Eric Dumazet <edumazet@...gle.com>
Subject: [PATCH net-next] tcp: remove one ktime_get() from recvmsg() fast path

Each time some payload is consumed by user space (recvmsg() and friends),
TCP calls tcp_rcv_space_adjust() to run DRS algorithm to check
if an increase of sk->sk_rcvbuf is needed.

This function is based on time sampling, and currently calls
tcp_mstamp_refresh(tp), which is a wrapper around ktime_get_ns().

ktime_get_ns() has a high cost on some platforms.
100+ cycles for rdtscp on AMD EPYC Turin for instance.

We do not have to refresh tp->tcp_mpstamp, using the last cached value
is enough. We only need to refresh it from __tcp_cleanup_rbuf()
if an ACK must be sent (this is a rare event).

Signed-off-by: Eric Dumazet <edumazet@...gle.com>
---
 net/ipv4/tcp.c       |  4 +++-
 net/ipv4/tcp_input.c | 10 ++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b79da6d39392751e189f1f65969b15c904a6792a..a9345aa5a2e5f4a2ca7ca599e7523d017ffa64ee 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1556,8 +1556,10 @@ void __tcp_cleanup_rbuf(struct sock *sk, int copied)
 				time_to_ack = true;
 		}
 	}
-	if (time_to_ack)
+	if (time_to_ack) {
+		tcp_mstamp_refresh(tp);
 		tcp_send_ack(sk);
+	}
 }
 
 void tcp_cleanup_rbuf(struct sock *sk, int copied)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8fc97f4d8a6b2f8e39cabf6c9b3e6cdae294a5f5..ff19f6e54d55cb63f04c2da0b241e3d7d2f946a0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -928,9 +928,15 @@ void tcp_rcv_space_adjust(struct sock *sk)
 
 	trace_tcp_rcv_space_adjust(sk);
 
-	tcp_mstamp_refresh(tp);
+	if (unlikely(!tp->rcv_rtt_est.rtt_us))
+		return;
+
+	/* We do not refresh tp->tcp_mstamp here.
+	 * Some platforms have expensive ktime_get() implementations.
+	 * Using the last cached value is enough for DRS.
+	 */
 	time = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcvq_space.time);
-	if (time < (tp->rcv_rtt_est.rtt_us >> 3) || tp->rcv_rtt_est.rtt_us == 0)
+	if (time < (tp->rcv_rtt_est.rtt_us >> 3))
 		return;
 
 	/* Number of bytes copied to user in last RTT */
-- 
2.51.1.821.gb6fe4d2222-goog


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ