netdev - Re: [PATCH] tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set

Message-ID: <CANn89iJRHuFgPrsezhm2DuAw7JmLL2ZkPhZaf2Ymuq+STUm-8w@mail.gmail.com>
Date: Thu, 14 Dec 2023 09:40:09 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Salvatore Dipietro <dipiets@...zon.com>
Cc: alisaidi@...zon.com, benh@...zon.com, blakgeof@...zon.com, 
	davem@...emloft.net, dipietro.salvatore@...il.com, dsahern@...nel.org, 
	kuba@...nel.org, netdev@...r.kernel.org, pabeni@...hat.com
Subject: Re: [PATCH] tcp: disable tcp_autocorking for socket when TCP_NODELAY
 flag is set

On Wed, Dec 13, 2023 at 10:30 PM Salvatore Dipietro <dipiets@...zon.com> wrote:
>
> > It looks like the above disables autocorking even after the userspace
> > sets TCP_CORK. Am I reading it correctly? Is that expected?
>
> I have tested a new version of the patch which can target only TCP_NODELAY.
> Results using previous benchmark are identical. I will submit it in a new
> patch version.

Well, I do not think we will accept a patch there, because you are
basically working around the root cause for a certain variety of
workloads.

The issue would still be there for applications not using TCP_NODELAY.
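
For context, the workloads the proposed patch would cover are those where
userspace sets TCP_NODELAY on the socket. A minimal sketch of that, using
the standard setsockopt()/getsockopt() calls; the helper names
set_tcp_nodelay() and get_tcp_nodelay() are made up for illustration and
do not come from the thread:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable (1) or disable (0) TCP_NODELAY on a TCP
 * socket. Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_tcp_nodelay(int fd, int enable)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &enable, sizeof(enable));
}

/* Hypothetical helper: read the flag back. Returns 1 if set, 0 if clear,
 * -1 on error. */
static int get_tcp_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

Any application that leaves this option at its default would not hit the
new check in the quoted diff, which is why the underlying delay would
remain visible to it.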

>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -716,7 +716,8 @@
>
>         tcp_mark_urg(tp, flags);
>
> -       if (tcp_should_autocork(sk, skb, size_goal)) {
> +       if (!(nonagle & TCP_NAGLE_OFF) &&
> +           tcp_should_autocork(sk, skb, size_goal)) {
>
>                 /* avoid atomic op if TSQ_THROTTLED bit is already set */
>                 if (!test_bit(TSQ_THROTTLED, &sk->sk_tsq_flags)) {
>
>
>
> > Also I wonder about these 40ms delays, TCP small queue handler should
> > kick when the prior skb is TX completed.
> >
> > It seems the issue is on the driver side ?
> >
> > Salvatore, which driver are you using ?
>
> I am using ENA driver.
>
> Eric can you please clarify where do you think the problem is?

The problem is that the TSQ logic is not working properly, probably
because the driver holds on to a packet that has already been sent.

TX completion seems to be delayed until the next transmit happens on
the transmit queue.

I suspect some kind of missed interrupt or a race.
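
To illustrate why a late TX completion matters here, below is a small
userspace model of the autocork decision. It only loosely follows the
shape of tcp_should_autocork(); struct sock_model, its fields and
tx_complete() are hypothetical names for this sketch, not kernel code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the socket state that matters. */
struct sock_model {
	bool   autocorking_enabled; /* models the tcp_autocorking sysctl */
	bool   rtx_queue_empty;     /* true when no data is in flight */
	size_t wmem_alloc;          /* bytes still held by qdisc/driver,
				     * i.e. sent but not yet TX completed */
};

/* Hold a small skb back while earlier data has not been TX completed. */
static bool should_autocork(const struct sock_model *sk, size_t skb_len,
			    size_t skb_truesize, size_t size_goal)
{
	return skb_len < size_goal &&
	       sk->autocorking_enabled &&
	       !sk->rtx_queue_empty &&
	       sk->wmem_alloc > skb_truesize;
}

/* Models TX completion: the driver frees the skb, its bytes are
 * uncharged from the socket. */
static void tx_complete(struct sock_model *sk, size_t truesize)
{
	sk->wmem_alloc -= truesize;
}
```

In this model a small write stays corked for as long as wmem_alloc is
not reduced. If the driver sits on an already-sent packet, nothing calls
tx_complete(), and the corked data waits until something else triggers a
push.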

virtio_net is known to have a similar issue (not sure if this has been
fixed lately).

The ena_io_poll() and ena_intr_msix_io() logic, playing with
ena_napi->interrupts_masked, seems convoluted/risky to me.

ena_start_xmit() also seems to have bugs vs xmit_more logic, but this
is orthogonal.

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index c44c44e26ddfe74a93b7f1fb3c3ca90f978909e2..5282e718699ba9e64765bea2435e1c5a55aaa89b 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3235,6 +3235,8 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)

 error_drop_packet:
        dev_kfree_skb(skb);
+       /* Make sure to ring the doorbell. */
+       ena_ring_tx_doorbell(tx_ring);
        return NETDEV_TX_OK;
 }
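
One possible reading of the xmit_more concern, sketched as a toy model
rather than ENA code: if earlier packets were posted with the doorbell
deferred and the next packet takes the drop path, the device is never
told about the posted descriptors unless the drop path also rings the
doorbell. All names below (tx_ring_model, start_xmit_model, the
flush_on_drop switch) are hypothetical and exist only for this sketch:

```c
#include <stdbool.h>

/* Hypothetical TX ring: descriptors are posted first, and the device
 * only learns about them when the doorbell is rung. */
struct tx_ring_model {
	unsigned int posted;   /* descriptors written by the driver */
	unsigned int notified; /* descriptors the device was told about */
};

static void ring_doorbell(struct tx_ring_model *r)
{
	r->notified = r->posted;
}

/* Models one transmit call.
 * xmit_more:     the stack expects to hand over another packet soon,
 *                so the doorbell may be deferred.
 * drop:          the packet takes the error path and is freed.
 * flush_on_drop: ring the doorbell on the error path as well. */
static void start_xmit_model(struct tx_ring_model *r, bool xmit_more,
			     bool drop, bool flush_on_drop)
{
	if (drop) {
		if (flush_on_drop)
			ring_doorbell(r);
		return;
	}
	r->posted++;
	if (!xmit_more)
		ring_doorbell(r);
}

/* Descriptors posted but not yet announced to the device. */
static unsigned int stuck_descriptors(const struct tx_ring_model *r)
{
	return r->posted - r->notified;
}
```

With flush_on_drop false, a deferred packet followed by a dropped one
leaves one descriptor stuck; with it true, the drop path flushes the
ring, which is the behaviour the added ena_ring_tx_doorbell() call
appears to aim for.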