Message-ID: <CANn89i+-v3Lzi851UKNWie8242y=760f-fiVELjPwSHduLyf5Q@mail.gmail.com>
Date: Tue, 30 Aug 2022 07:03:11 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Zhen Chen <chenzhen126@...wei.com>
Cc: David Miller <davem@...emloft.net>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
netdev <netdev@...r.kernel.org>, yanan@...wei.com,
Caowangbao <caowangbao@...wei.com>
Subject: Re: [PATCH] tcp: use linear buffer for small frames

On Tue, Aug 30, 2022 at 5:37 AM Zhen Chen <chenzhen126@...wei.com> wrote:
>
> 472c2e07eef0 ("tcp: add one skb cache for tx") and related patches added a
> mechanism to reduce pressure on the slab layer in the TCP stack by caching
> one skb per socket. The feature is disabled by default, and the patch also
> dropped the linear payload for small frames, which caused about a 5%
> performance regression for small packets, because NIC drivers then had to
> deal with frag lists where they previously saw linear buffers.

I do not think this is true. Which driver exhibits a 5% penalty, exactly?

I decided not to bring back this feature, and instead to make the TCP
stack less complex.

We want all TCP payload to sit in page frags instead; one part still
has to be rewritten for that (MTU probing), and maybe retransmit
aggregation as well.
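
For reference, this is the page-frag path that tcp_sendmsg_locked()
already uses in mainline, condensed from net/ipv4/tcp.c (circa v5.19).
The max_skb_frags check, memory accounting and most error paths are
trimmed, so read it as a sketch of the mechanism rather than a
buildable unit:

	if (!zc) {
		bool merge = true;
		int i = skb_shinfo(skb)->nr_frags;
		struct page_frag *pfrag = sk_page_frag(sk);

		/* Make sure the per-socket/per-task frag has room left. */
		if (!sk_page_frag_refill(sk, pfrag))
			goto wait_for_space;

		/* Extend the tail frag when the offsets are contiguous. */
		if (!skb_can_coalesce(skb, i, pfrag->page, pfrag->offset))
			merge = false;

		copy = min_t(int, copy, pfrag->size - pfrag->offset);

		/* Copy user payload straight into the page frag. */
		err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb,
					       pfrag->page, pfrag->offset,
					       copy);
		if (err)
			goto do_error;

		if (merge) {
			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1],
					  copy);
		} else {
			skb_fill_page_desc(skb, i, pfrag->page,
					   pfrag->offset, copy);
			page_ref_inc(pfrag->page);
		}
		pfrag->offset += copy;
	}
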
>
> As d8b81175e412 ("tcp: remove sk_{tr}x_skb_cache") reverted the whole
> mechanism but skipped the linear part, make the revert complete.
>
> Signed-off-by: Zhen Chen <chenzhen126@...wei.com>
> ---
> net/ipv4/tcp.c | 28 +++++++++++++++++++++++++++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e5011c136fdb..0b6010051598 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1154,6 +1154,30 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
> }
> EXPORT_SYMBOL(tcp_sendpage);
>
> +/* Do not bother using a page frag for very small frames.
> + * But use this heuristic only for the first skb in write queue.
> + *
> + * Having no payload in skb->head allows better SACK shifting
> + * in tcp_shift_skb_data(), reducing sack/rack overhead, because
> + * write queue has less skbs.
> + * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
> + * This also speeds up tso_fragment(), since it won't fall back
> + * to tcp_fragment().
> + */
> +static int linear_payload_sz(bool first_skb)
> +{
> + if (first_skb)
> + return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
> + return 0;
> +}
> +
> +static int select_size(bool first_skb, bool zc)
> +{
> + if (zc)
> + return 0;
> + return linear_payload_sz(first_skb);
> +}
> +
> void tcp_free_fastopen_req(struct tcp_sock *tp)
> {
> if (tp->fastopen_req) {
> @@ -1311,6 +1335,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>
> if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
> bool first_skb;
> + int linear;
>
> new_segment:
> if (!sk_stream_memory_free(sk))
> @@ -1322,7 +1347,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> goto restart;
> }
> first_skb = tcp_rtx_and_write_queues_empty(sk);
> - skb = tcp_stream_alloc_skb(sk, 0, sk->sk_allocation,
> + linear = select_size(first_skb, zc);
> + skb = tcp_stream_alloc_skb(sk, linear, sk->sk_allocation,
> first_skb);
> if (!skb)
> goto wait_for_space;
> --
> 2.23.0
>
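
For what it's worth, the sizes the restored comment talks about can be
checked with a quick user-space computation. The constants below (cache
line size, sizeof(struct skb_shared_info), MAX_HEADER, MAX_SKB_FRAGS)
are typical x86_64 defaults and vary with kernel config, so treat them
as assumptions, not authoritative values:

/* Rough arithmetic only; kernel constants are assumed defaults. */
#include <stdio.h>

#define SMP_CACHE_BYTES      64		/* assumed cache line size */
#define SKB_SHINFO_SIZE      320	/* assumed sizeof(struct skb_shared_info) */
#define SKB_DATA_ALIGN(x)    (((x) + SMP_CACHE_BYTES - 1) & \
			      ~(SMP_CACHE_BYTES - 1))
#define SKB_WITH_OVERHEAD(x) ((x) - SKB_DATA_ALIGN(SKB_SHINFO_SIZE))
#define MAX_TCP_HEADER       (128 + 128)	/* 128 + assumed MAX_HEADER */
#define MAX_SKB_FRAGS        17		/* assumed 65536 / PAGE_SIZE + 1 */

int main(void)
{
	/* Mirrors linear_payload_sz(true) from the patch above:
	 * 2048 - 256 - 320 = 1472 bytes of linear room.
	 */
	printf("first-skb linear payload: %d bytes\n",
	       SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER));

	/* The "~0.5 MB" figure from the comment: MAX_SKB_FRAGS frags
	 * of up to 32KB (order-3 page frag chunks) each.
	 */
	printf("max frag payload per skb: %d KB\n", MAX_SKB_FRAGS * 32);
	return 0;
}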