netdev - Re: [PATCH] tcp: use linear buffer for small frames

Message-ID: <CANn89i+-v3Lzi851UKNWie8242y=760f-fiVELjPwSHduLyf5Q@mail.gmail.com>
Date:   Tue, 30 Aug 2022 07:03:11 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Zhen Chen <chenzhen126@...wei.com>
Cc:     David Miller <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        netdev <netdev@...r.kernel.org>, yanan@...wei.com,
        Caowangbao <caowangbao@...wei.com>
Subject: Re: [PATCH] tcp: use linear buffer for small frames

On Tue, Aug 30, 2022 at 5:37 AM Zhen Chen <chenzhen126@...wei.com> wrote:
>
> 472c2e07eef0 ("tcp: add one skb cache for tx") and related patches added a
> mechanism to reduce pressure on the slab layer in the TCP stack, by caching
> one skb per socket. The feature is disabled by default, and the patch also
> dropped the linear payload for small frames, which caused a performance
> regression of about 5% for small packets because NIC drivers have to deal
> with the fraglist more than before.

I do not think this is true. Which driver exactly exhibits a 5% penalty?

I decided not to bring back this feature, and instead to make the TCP
stack less complex.

What we want instead is to have all TCP payload in page frags. There is
still a part to rewrite (MTU probing), and maybe retransmit aggregation.

>
> As d8b81175e412 ("tcp: remove sk_{tr}x_skb_cache") reverted the whole
> mechanism but skipped the linear part, just make the revert complete.
>
> Signed-off-by: Zhen Chen <chenzhen126@...wei.com>
> ---
>  net/ipv4/tcp.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e5011c136fdb..0b6010051598 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1154,6 +1154,30 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
>  }
>  EXPORT_SYMBOL(tcp_sendpage);
>
> +/* Do not bother using a page frag for very small frames.
> + * But use this heuristic only for the first skb in write queue.
> + *
> + * Having no payload in skb->head allows better SACK shifting
> + * in tcp_shift_skb_data(), reducing sack/rack overhead, because
> + * write queue has less skbs.
> + * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
> + * This also speeds up tso_fragment(), since it wont fallback
> + * to tcp_fragment().
> + */
> +static int linear_payload_sz(bool first_skb)
> +{
> +       if (first_skb)
> +               return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
> +       return 0;
> +}
> +
> +static int select_size(bool first_skb, bool zc)
> +{
> +       if (zc)
> +               return 0;
> +       return linear_payload_sz(first_skb);
> +}
> +
>  void tcp_free_fastopen_req(struct tcp_sock *tp)
>  {
>         if (tp->fastopen_req) {
> @@ -1311,6 +1335,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>
>                 if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
>                         bool first_skb;
> +                       int linear;
>
>  new_segment:
>                         if (!sk_stream_memory_free(sk))
> @@ -1322,7 +1347,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                                         goto restart;
>                         }
>                         first_skb = tcp_rtx_and_write_queues_empty(sk);
> -                       skb = tcp_stream_alloc_skb(sk, 0, sk->sk_allocation,
> +                       linear = select_size(first_skb, zc);
> +                       skb = tcp_stream_alloc_skb(sk, linear, sk->sk_allocation,
>                                                    first_skb);
>                         if (!skb)
>                                 goto wait_for_space;
> --
> 2.23.0
>
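
For readers who want to see what the quoted select_size() heuristic would
return, here is a minimal user-space sketch. It is not kernel code. The two
SKETCH_* constants are hypothetical stand-ins for MAX_TCP_HEADER and for the
aligned size of struct skb_shared_info, both of which depend on architecture
and kernel configuration, and the sketch_* function names are invented for
this illustration. The subtraction mirrors what SKB_WITH_OVERHEAD() does in
the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values, chosen only for illustration; the real ones
 * vary with the architecture and the kernel configuration.
 */
#define SKETCH_MAX_TCP_HEADER 320 /* stand-in for MAX_TCP_HEADER */
#define SKETCH_SHINFO_SIZE    320 /* stand-in for the aligned size of
                                   * struct skb_shared_info */

/* Models SKB_WITH_OVERHEAD(): space left in a buffer of 'size' bytes
 * once the shared-info area at its end is accounted for.
 */
static int sketch_skb_with_overhead(int size)
{
	assert(size >= SKETCH_SHINFO_SIZE);
	return size - SKETCH_SHINFO_SIZE;
}

/* Mirrors linear_payload_sz() from the quoted patch: only the first
 * skb in the write queue gets a linear area, sized so that headers,
 * payload and shared info fit in a 2048-byte allocation.
 */
static int sketch_linear_payload_sz(bool first_skb)
{
	if (first_skb)
		return sketch_skb_with_overhead(2048 - SKETCH_MAX_TCP_HEADER);
	return 0;
}

/* Mirrors select_size() from the quoted patch: zerocopy sends never
 * get a linear payload area.
 */
static int sketch_select_size(bool first_skb, bool zc)
{
	if (zc)
		return 0;
	return sketch_linear_payload_sz(first_skb);
}
```

Under these assumed constants, sketch_select_size(true, false) yields
2048 - 320 - 320 = 1408 bytes of linear space, while any non-first skb or
any zerocopy send yields 0, so its payload goes entirely into page frags.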