Message-ID: <CANn89iJRHuFgPrsezhm2DuAw7JmLL2ZkPhZaf2Ymuq+STUm-8w@mail.gmail.com>
Date: Thu, 14 Dec 2023 09:40:09 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Salvatore Dipietro <dipiets@...zon.com>
Cc: alisaidi@...zon.com, benh@...zon.com, blakgeof@...zon.com,
davem@...emloft.net, dipietro.salvatore@...il.com, dsahern@...nel.org,
kuba@...nel.org, netdev@...r.kernel.org, pabeni@...hat.com
Subject: Re: [PATCH] tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set

On Wed, Dec 13, 2023 at 10:30 PM Salvatore Dipietro <dipiets@...zon.com> wrote:
>
> > It looks like the above disables autocorking even after the userspace
> > sets TCP_CORK. Am I reading it correctly? Is that expected?
>
> I have tested a new version of the patch which can target only TCP_NODELAY.
> Results using previous benchmark are identical. I will submit it in a new
> patch version.

Well, I do not think we will accept a patch there, because you are
basically working around the root cause for a certain variety of
workloads.

The issue would still be there for applications not using TCP_NODELAY.

>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -716,7 +716,8 @@
>
> tcp_mark_urg(tp, flags);
>
> - if (tcp_should_autocork(sk, skb, size_goal)) {
> + if (!(nonagle & TCP_NAGLE_OFF) &&
> + tcp_should_autocork(sk, skb, size_goal)) {
>
> /* avoid atomic op if TSQ_THROTTLED bit is already set */
> if (!test_bit(TSQ_THROTTLED, &sk->sk_tsq_flags)) {
>
>
>
> > Also I wonder about these 40ms delays, TCP small queue handler should
> > kick when the prior skb is TX completed.
> >
> > It seems the issue is on the driver side ?
> >
> > Salvatore, which driver are you using ?
>
> I am using ENA driver.
>
> Eric can you please clarify where do you think the problem is?

The problem is that the TSQ logic is not working properly, probably
because the driver holds a packet that has been sent.

TX completion seems to be delayed until the next transmit happens on
the transmit queue.

I suspect some kind of missed interrupt or a race.

virtio_net is known to have a similar issue (not sure if this has been
fixed lately).

The ena_io_poll() and ena_intr_msix_io() logic, playing with
ena_napi->interrupts_masked, seems convoluted/risky to me.

ena_start_xmit() also seems to have bugs vs xmit_more logic, but this
is orthogonal.

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index c44c44e26ddfe74a93b7f1fb3c3ca90f978909e2..5282e718699ba9e64765bea2435e1c5a55aaa89b 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3235,6 +3235,8 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
 error_drop_packet:
 	dev_kfree_skb(skb);
+	/* Make sure to ring the doorbell. */
+	ena_ring_tx_doorbell(tx_ring);
 	return NETDEV_TX_OK;
 }
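
For readers unfamiliar with the xmit_more concern, here is a minimal
userspace sketch of the failure mode the hunk above guards against. It
is a toy model only: struct toy_tx_ring, toy_xmit() and the other toy_*
names are hypothetical and do not correspond to the real ENA data
structures. The assumption is that the doorbell may be deferred while
the stack signals that more packets follow, so an error path that
returns without ringing it can leave earlier descriptors unseen by the
device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a TX ring; not the ENA driver's real layout. */
struct toy_tx_ring {
	unsigned int queued;   /* descriptors written into the ring */
	unsigned int visible;  /* descriptors the device has been told about */
};

/* Tell the (imaginary) device about everything queued so far. */
static void toy_ring_doorbell(struct toy_tx_ring *ring)
{
	ring->visible = ring->queued;
}

/* Descriptors queued but not yet announced to the device. */
static unsigned int toy_pending(const struct toy_tx_ring *ring)
{
	return ring->queued - ring->visible;
}

/*
 * Transmit one packet.
 *   more:          the stack promises another packet right after this one,
 *                  so the doorbell may be deferred (xmit_more behaviour).
 *   drop:          simulate the error path that frees the packet.
 *   flush_on_drop: whether the error path rings the doorbell, as the
 *                  proposed hunk does.
 */
static void toy_xmit(struct toy_tx_ring *ring, bool more, bool drop,
		     bool flush_on_drop)
{
	assert(ring != NULL);

	if (drop) {
		/* Earlier packets may still be waiting for a doorbell. */
		if (flush_on_drop)
			toy_ring_doorbell(ring);
		return;
	}

	ring->queued++;
	if (!more)
		toy_ring_doorbell(ring);
}
```

In this model, queuing one packet with more set and then dropping the
next one leaves toy_pending() at 1 when flush_on_drop is false and at 0
when it is true. The stranded descriptor in the first case is only
announced when a later transmit happens on the same queue.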