Message-ID: <CANn89iLP15xQjmPHxvQBQ=bWbbVk4_41yLC8o5E97TQWFmRioQ@mail.gmail.com>
Date: Wed, 7 Sep 2022 09:06:36 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Alexandra Winter <wintera@...ux.ibm.com>
Cc: David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Yuchung Cheng <ycheng@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>,
Willem de Bruijn <willemb@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Mat Martineau <mathew.j.martineau@...ux.intel.com>,
Saeed Mahameed <saeedm@...dia.com>,
Niklas Schnelle <schnelle@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
netdev <netdev@...r.kernel.org>, linux-s390@...r.kernel.org,
Heiko Carstens <hca@...ux.ibm.com>
Subject: Re: [RFC net] tcp: Fix performance regression for request-response workloads
On Wed, Sep 7, 2022 at 5:26 AM Alexandra Winter <wintera@...ux.ibm.com> wrote:
>
> Since linear payload was removed even for single small messages,
> an additional page is required per skb, and we are measuring a
> performance impact.
>
> 3613b3dbd1ad ("tcp: prepare skbs for better sack shifting")
> explicitly allowed "payload in skb->head for first skb put in the queue,
> to not impact RPC workloads."
> 472c2e07eef0 ("tcp: add one skb cache for tx")
> made that obsolete and removed it.
> When d8b81175e412 ("tcp: remove sk_{tr}x_skb_cache") reverted it,
> this piece was not added back in.
>
> When running uperf with a request-response pattern with 1k payload
> and 250 parallel connections, we measure a 13% difference in throughput
> for our PCI-based network interfaces since 472c2e07eef0.
> (Our IOMMU is sensitive to the number of mapped pages.)
>
> Could you please consider allowing linear payload for the first
> skb in queue again? A patch proposal is appended below.
No.
Please add a workaround in your driver.
You can increase throughput by 20% by premapping a coherent piece of
memory into which you can copy small skbs (skb->head included),
something like 256 bytes per slot in the TX ring.
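
For illustration only, a rough sketch of the kind of driver-side
copybreak described above: one coherent allocation is pre-mapped at
ring setup and carved into fixed-size per-slot buffers, and small skbs
are copied into their slot instead of being DMA-mapped at xmit time.
All names (my_tx_ring, my_tx_slot, MY_TX_COPYBREAK, ...) are
hypothetical and do not come from this thread or any real driver:

/* Hypothetical driver-side copybreak sketch: pre-map one coherent
 * buffer per TX descriptor and copy small skbs (headers and payload)
 * into it, so small frames need no per-packet IOMMU mapping.
 */
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

#define MY_TX_COPYBREAK 256	/* bytes of coherent space per TX slot */

struct my_tx_slot {
	void		*copy_buf;	/* points into the coherent area */
	dma_addr_t	copy_dma;	/* bus address of copy_buf */
	struct sk_buff	*skb;
};

struct my_tx_ring {
	struct my_tx_slot	*slots;
	void			*copy_area;	/* one coherent allocation */
	dma_addr_t		copy_area_dma;
	unsigned int		size;		/* number of descriptors */
};

static int my_tx_ring_alloc_copy_area(struct device *dev,
				      struct my_tx_ring *ring)
{
	unsigned int i;

	ring->copy_area = dma_alloc_coherent(dev,
					     ring->size * MY_TX_COPYBREAK,
					     &ring->copy_area_dma,
					     GFP_KERNEL);
	if (!ring->copy_area)
		return -ENOMEM;

	for (i = 0; i < ring->size; i++) {
		ring->slots[i].copy_buf = ring->copy_area +
					  i * MY_TX_COPYBREAK;
		ring->slots[i].copy_dma = ring->copy_area_dma +
					  i * MY_TX_COPYBREAK;
	}
	return 0;
}

/* Called from the xmit path: small skbs are gathered into the
 * pre-mapped slot buffer instead of being mapped frag by frag.
 */
static bool my_tx_try_copybreak(struct my_tx_slot *slot,
				struct sk_buff *skb)
{
	if (skb->len > MY_TX_COPYBREAK)
		return false;

	/* Copies skb->head data and all page frags into copy_buf. */
	if (skb_copy_bits(skb, 0, slot->copy_buf, skb->len))
		return false;

	/* The TX descriptor can now point at slot->copy_dma / skb->len;
	 * no dma_map_single()/dma_map_page() call is needed here.
	 */
	return true;
}

Since the slot buffer is coherent and permanently mapped, completion
handling for copied frames only has to free the skb and reuse the slot;
there is nothing to unmap.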
>
> Kind regards
> Alexandra
>
> ---------------------------------------------------------------
>
> tcp: allow linear skb payload for first in queue
>
> Allow payload in skb->head for the first skb in the queue;
> RPC workloads will benefit.
>
> Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
> Signed-off-by: Alexandra Winter <wintera@...ux.ibm.com>
> ---
> net/ipv4/tcp.c | 39 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e5011c136fdb..f7cbccd41d85 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1154,6 +1154,30 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
> }
> EXPORT_SYMBOL(tcp_sendpage);
>
> +/* Do not bother using a page frag for very small frames.
> + * But use this heuristic only for the first skb in write queue.
> + *
> + * Having no payload in skb->head allows better SACK shifting
> + * in tcp_shift_skb_data(), reducing sack/rack overhead, because
> + * write queue has less skbs.
> + * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
> + * This also speeds up tso_fragment(), since it won't fallback
> + * to tcp_fragment().
> + */
> +static int linear_payload_sz(bool first_skb)
> +{
> + if (first_skb)
> + return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
> + return 0;
> +}
> +
> +static int select_size(bool first_skb, bool zc)
> +{
> + if (zc)
> + return 0;
> + return linear_payload_sz(first_skb);
> +}
> +
> void tcp_free_fastopen_req(struct tcp_sock *tp)
> {
> if (tp->fastopen_req) {
> @@ -1311,6 +1335,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>
> if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
> bool first_skb;
> + int linear;
>
> new_segment:
> if (!sk_stream_memory_free(sk))
> @@ -1322,7 +1347,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> goto restart;
> }
> first_skb = tcp_rtx_and_write_queues_empty(sk);
> - skb = tcp_stream_alloc_skb(sk, 0, sk->sk_allocation,
> + linear = select_size(first_skb, zc);
> + skb = tcp_stream_alloc_skb(sk, linear,
> + sk->sk_allocation,
> first_skb);
> if (!skb)
> goto wait_for_space;
> @@ -1344,7 +1371,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> if (copy > msg_data_left(msg))
> copy = msg_data_left(msg);
>
> - if (!zc) {
> + /* Where to copy to? */
> + if (skb_availroom(skb) > 0 && !zc) {
> + /* We have some space in skb head. Superb! */
> + copy = min_t(int, copy, skb_availroom(skb));
> + err = skb_add_data_nocache(sk, skb, &msg->msg_iter,
> + copy);
> + if (err)
> + goto do_error;
> + } else if (!zc) {
> bool merge = true;
> int i = skb_shinfo(skb)->nr_frags;
> struct page_frag *pfrag = sk_page_frag(sk);
> --
> 2.24.3 (Apple Git-128)
>