Message-ID: <CADVnQykt=1rFCBJgSu1b2sm4VQ3t=gdwZ=7cPXMFJ245dhAm4A@mail.gmail.com>
Date: Thu, 20 Mar 2025 10:43:46 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Kuniyuki Iwashima <kuniyu@...zon.com>, Simon Horman <horms@...nel.org>, 
	netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next] tcp: avoid atomic operations on sk->sk_rmem_alloc

On Thu, Mar 20, 2025 at 8:16 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> TCP uses generic skb_set_owner_r() and sock_rfree()
> for received packets, with socket lock being owned.
>
> Switch to private versions, avoiding two atomic operations
> per packet.
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
>  include/net/tcp.h       | 15 +++++++++++++++
>  net/ipv4/tcp.c          | 18 ++++++++++++++++--
>  net/ipv4/tcp_fastopen.c |  2 +-
>  net/ipv4/tcp_input.c    |  6 +++---
>  4 files changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index d08fbf90495de69b157d3c87c50e82d781a365df..dd6d63a6f42b99774e9461b69d3e7932cf629082 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h

Very nice. Thanks!

Reviewed-by: Neal Cardwell <ncardwell@...gle.com>

A couple of quick thoughts:

> @@ -779,6 +779,7 @@ static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
>
>  /* tcp.c */
>  void tcp_get_info(struct sock *, struct tcp_info *);
> +void tcp_sock_rfree(struct sk_buff *skb);
>
>  /* Read 'sendfile()'-style from a TCP socket */
>  int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> @@ -2898,4 +2899,18 @@ enum skb_drop_reason tcp_inbound_hash(struct sock *sk,
>                 const void *saddr, const void *daddr,
>                 int family, int dif, int sdif);
>
> +/* version of skb_set_owner_r() avoiding one atomic_add() */
> +static inline void tcp_skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
> +{
> +       skb_orphan(skb);
> +       skb->sk = sk;
> +       skb->destructor = tcp_sock_rfree;
> +
> +       sock_owned_by_me(sk);
> +       atomic_set(&sk->sk_rmem_alloc,
> +                  atomic_read(&sk->sk_rmem_alloc) + skb->truesize);
> +
> +       sk_forward_alloc_add(sk, -skb->truesize);
> +}
> +
>  #endif /* _TCP_H */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 989c3c3d8e757361a0ac4a9f039a3cfca10d9612..b1306038b8e6e8c55fd1b4803c5d8ca626491aae 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1525,11 +1525,25 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
>         __tcp_cleanup_rbuf(sk, copied);
>  }
>
> +/* private version of sock_rfree() avoiding one atomic_sub() */
> +void tcp_sock_rfree(struct sk_buff *skb)
> +{
> +       struct sock *sk = skb->sk;
> +       unsigned int len = skb->truesize;
> +
> +       sock_owned_by_me(sk);
> +       atomic_set(&sk->sk_rmem_alloc,
> +                  atomic_read(&sk->sk_rmem_alloc) - len);
> +
> +       sk_forward_alloc_add(sk, len);
> +       sk_mem_reclaim(sk);

One thought on readability: it might be nice to make these functions
both use skb->truesize rather than having one use skb->truesize and
one use len (particularly since "len" in the skb context often refers
to the payload length). I realize the "len" helper variable was
inherited from sock_rfree() but it might be nice to make the TCP
versions easier to read and audit?

Also, it might be nice to have the comments above
tcp_skb_set_owner_r() and tcp_sock_rfree() reference each other, to
remind maintainers that the arithmetic in the two functions needs to
be kept exactly in sync? Perhaps something like:

/* A version of skb_set_owner_r() avoiding one atomic_add().
 * These adjustments are later inverted by tcp_sock_rfree().
 */
static inline void tcp_skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
...

/* A private version of sock_rfree() avoiding one atomic_sub().
 * Inverts the earlier adjustments made by tcp_skb_set_owner_r().
 */
void tcp_sock_rfree(struct sk_buff *skb)
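
To make the symmetry concrete, here is a rough userspace sketch of the
same pattern, not the kernel code. The mock_sock and mock_skb types and
both mock_* functions are hypothetical stand-ins, C11 relaxed loads and
stores stand in for atomic_read()/atomic_set(), and the socket lock and
sk_mem_reclaim() are not modeled. Both functions use skb->truesize
directly, and each comment points at its inverse:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins, not the kernel's struct sock / sk_buff. */
struct mock_sock {
	atomic_int rmem_alloc;	/* models sk->sk_rmem_alloc */
	int forward_alloc;	/* models sk->sk_forward_alloc */
};

struct mock_skb {
	struct mock_sock *sk;
	unsigned int truesize;
};

/* Charge skb->truesize to sk using a plain load + store instead of an
 * atomic read-modify-write. Only safe if the caller serializes all
 * writers (the socket lock in the patch; not modeled here).
 * These adjustments are later inverted by mock_rfree().
 */
static void mock_set_owner_r(struct mock_skb *skb, struct mock_sock *sk)
{
	int cur = atomic_load_explicit(&sk->rmem_alloc, memory_order_relaxed);

	skb->sk = sk;
	atomic_store_explicit(&sk->rmem_alloc, cur + (int)skb->truesize,
			      memory_order_relaxed);
	sk->forward_alloc -= (int)skb->truesize;
}

/* Inverts the earlier adjustments made by mock_set_owner_r(). */
static void mock_rfree(struct mock_skb *skb)
{
	struct mock_sock *sk = skb->sk;
	int cur = atomic_load_explicit(&sk->rmem_alloc, memory_order_relaxed);

	atomic_store_explicit(&sk->rmem_alloc, cur - (int)skb->truesize,
			      memory_order_relaxed);
	sk->forward_alloc += (int)skb->truesize;
}
```

Charging and then freeing the same skb should leave both counters at
their starting values, which is the invariant the cross-referencing
comments are meant to protect.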

Anyway, the patch LGTM. Those were just some thoughts. :-)

Thanks!

neal
