netdev - RE: [PATCH] tcp: disable tcp_autocorking for socket when TCP

Message-ID: <3baf5407-34b1-d616-9552-19696933e0c2@amazon.com>
Date: Thu, 14 Dec 2023 09:52:21 -0600
From: Geoff Blake <blakgeof@...zon.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: Salvatore Dipietro <dipiets@...zon.com>, <alisaidi@...zon.com>,
	<benh@...zon.com>, <davem@...emloft.net>, <dipietro.salvatore@...il.com>,
	<dsahern@...nel.org>, <kuba@...nel.org>, <netdev@...r.kernel.org>,
	<pabeni@...hat.com>
Subject: RE: [PATCH] tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is
 set

Thanks for helping dig in here, Eric, but what is supposed to happen on TX
completion? We're unfamiliar with TCP small queues besides finding your old
LKML post that states a tasklet is supposed to run if there is pending
data.  So we need a bit more guidance, if you could.

I think it's supposed to call tcp_free() when the skb is destructed, and
that invokes the tasklet?  There is also sock_wfree(), but it does not
appear to have the linkage to the tasklet, by design.

We did attach probes at one point to check whether an interrupt might have
gone missing (we don't have them on hand anymore), but we always saw the
TX completion happen. When the 40ms latency occurred, we'd see that the
completion had happened just after the other packet was corked.  But it
certainly doesn't hurt to double-check.
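
As a rough illustration of the interaction discussed here, the following
is a toy userspace model, not kernel code. It sketches a push path that
corks a small write while earlier data is still owned by the NIC, and a TX
completion that flushes the corked data if the socket was marked
throttled. Every name in it (toy_sock, toy_push, toy_tx_complete, and so
on) is hypothetical, and the mapping onto sk_wmem_alloc, TSQ_THROTTLED and
the TSQ tasklet is a simplifying assumption. In particular, the model
flushes synchronously inside the completion, whereas the kernel defers
that work.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names are hypothetical, not kernel symbols. */
struct toy_sock {
    int  wmem_alloc;     /* bytes still owned by qdisc/NIC (cf. sk_wmem_alloc) */
    bool tsq_throttled;  /* cf. the TSQ_THROTTLED bit in sk_tsq_flags */
    bool nodelay;        /* cf. TCP_NAGLE_OFF being set in nonagle */
    int  corked_bytes;   /* data held back by autocorking */
    int  sent_bytes;     /* total data handed to the driver */
};

/* Simplified autocork test: small write while prior data is in flight.
 * Here wmem_alloc excludes the write being pushed, unlike the kernel,
 * which compares sk_wmem_alloc against the skb's own truesize. */
static bool toy_should_autocork(const struct toy_sock *sk, int len,
                                int size_goal)
{
    return len < size_goal && sk->wmem_alloc > 0;
}

/* Hand len bytes to the (imaginary) driver. */
static void toy_xmit(struct toy_sock *sk, int len)
{
    sk->wmem_alloc += len;
    sk->sent_bytes += len;
}

/* Push path with the NODELAY check from the patch placed in front of the
 * autocork test. Returns true if the data was sent immediately. */
static bool toy_push(struct toy_sock *sk, int len, int size_goal)
{
    if (!sk->nodelay && toy_should_autocork(sk, len, size_goal)) {
        sk->tsq_throttled = true;   /* ask TX completion to kick us */
        sk->corked_bytes += len;
        return false;
    }
    toy_xmit(sk, len);
    return true;
}

/* TX completion for len bytes: release them and, if the socket was
 * throttled, send whatever was corked. */
static void toy_tx_complete(struct toy_sock *sk, int len)
{
    sk->wmem_alloc -= len;
    if (sk->tsq_throttled) {
        int pending = sk->corked_bytes;

        sk->tsq_throttled = false;
        sk->corked_bytes = 0;
        if (pending > 0)
            toy_xmit(sk, pending);
    }
}
```

In this model a corked write only leaves the socket when a completion
observes the throttled flag. If a completion ran before the flag was set
and nothing else flushed the queue, the data would sit until some other
event pushed it, which is the kind of stall the 40ms observation suggests.
With nodelay set, toy_push() never corks, so the model cannot stall that
way.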

- Geoff Blake

On Thu, 14 Dec 2023, Eric Dumazet wrote:

> On Wed, Dec 13, 2023 at 10:30 PM Salvatore Dipietro <dipiets@...zon.com> wrote:
> >
> > > It looks like the above disables autocorking even after the userspace
> > > sets TCP_CORK. Am I reading it correctly? Is that expected?
> >
> > I have tested a new version of the patch which can target only TCP_NODELAY.
> > Results using previous benchmark are identical. I will submit it in a new
> > patch version.
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -716,7 +716,8 @@
> >
> >         tcp_mark_urg(tp, flags);
> >
> > -       if (tcp_should_autocork(sk, skb, size_goal)) {
> > +       if (!(nonagle & TCP_NAGLE_OFF) &&
> > +           tcp_should_autocork(sk, skb, size_goal)) {
> >
> >                 /* avoid atomic op if TSQ_THROTTLED bit is already set */
> >                 if (!test_bit(TSQ_THROTTLED, &sk->sk_tsq_flags)) {
> >
> >
> >
> > > Also I wonder about these 40ms delays, TCP small queue handler should
> > > kick when the prior skb is TX completed.
> > >
> > > It seems the issue is on the driver side ?
> > >
> > > Salvatore, which driver are you using ?
> >
> > I am using ENA driver.
> >
> > Eric, can you please clarify where you think the problem is?
> >
> 
> The following bpftrace program could double-check whether the ena
> driver is possibly holding TCP skbs too long:
> 
> bpftrace -e 'k:dev_hard_start_xmit {
>   // Timestamp skbs entering the driver; fclone == 2 is SKB_FCLONE_CLONE.
>   $skb = (struct sk_buff *)arg0;
>   if ($skb->fclone == 2) {
>     @start[$skb] = nsecs;
>   }
> }
> k:__kfree_skb {
>   // On free, add the elapsed time since xmit (in usecs) to a histogram.
>   $skb = (struct sk_buff *)arg0;
>   if ($skb->fclone == 2 && @start[$skb]) {
>     @tx_compl_usecs = hist((nsecs - @start[$skb])/1000);
>     delete(@start[$skb]);
>   }
> }
> END { clear(@start); }'
> 
> iroa21:/home/edumazet# ./trace-tx-completion.sh
> Attaching 3 probes...
> ^C
> 
> 
> @tx_compl_usecs:
> [2, 4)                13 |                                                    |
> [4, 8)               182 |                                                    |
> [8, 16)          2379007 |@@@@@@@@@@@@@@@                                     |
> [16, 32)         7865369 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32, 64)         6040939 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
> [64, 128)         199255 |@                                                   |
> [128, 256)          9235 |                                                    |
> [256, 512)            89 |                                                    |
> [512, 1K)             37 |                                                    |
> [1K, 2K)              19 |                                                    |
> [2K, 4K)              56 |                                                    |
>