Message-ID: <3baf5407-34b1-d616-9552-19696933e0c2@amazon.com>
Date: Thu, 14 Dec 2023 09:52:21 -0600
From: Geoff Blake <blakgeof@...zon.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: Salvatore Dipietro <dipiets@...zon.com>, <alisaidi@...zon.com>,
	<benh@...zon.com>, <davem@...emloft.net>, <dipietro.salvatore@...il.com>,
	<dsahern@...nel.org>, <kuba@...nel.org>, <netdev@...r.kernel.org>,
	<pabeni@...hat.com>
Subject: RE: [PATCH] tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is
 set

Thanks for helping dig in here, Eric, but what is supposed to happen on TX
completion? We're unfamiliar with TCP small queues beyond finding your old
LKML posting that says a tasklet is supposed to run if there is pending
data, so we need a bit more guidance if you can give it.

I think it's supposed to call tcp_wfree() when the skb is destructed, and
that invokes the tasklet?  There is also sock_wfree(), but it does not appear
to have the linkage to the tasklet, by design.
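
For anyone following along, here is a toy userspace model of the behaviour
being asked about: a small write is corked while an earlier packet is still
with the driver, and the TX completion of that packet pushes the corked data.
It is only a sketch of the expected ordering, not the kernel code. The class
ToySocket and all of its fields are hypothetical, and `throttled` merely
stands in for the TSQ_THROTTLED bit.

```python
class ToySocket:
    """Toy model of autocorking plus a TX-completion kick (not kernel code)."""

    def __init__(self):
        self.in_flight = 0      # packets handed to the driver, not yet completed
        self.throttled = False  # stands in for the TSQ_THROTTLED bit
        self.corked = []        # small writes held back by autocorking
        self.sent = []          # payloads actually handed to the driver

    def _xmit(self, payload):
        self.in_flight += 1
        self.sent.append(payload)

    def send(self, payload):
        if self.in_flight > 0:
            # An earlier packet is still with the driver: cork this write
            # and rely on the TX completion to push it later.
            self.corked.append(payload)
            self.throttled = True
            return
        self._xmit(payload)

    def tx_complete(self):
        # Models the skb destructor running on TX completion.
        self.in_flight -= 1
        if self.throttled:
            self.throttled = False
            if self.corked:
                # Coalesce everything that was corked into one packet.
                self._xmit(b"".join(self.corked))
                self.corked.clear()


sock = ToySocket()
sock.send(b"a")      # nothing in flight: goes out immediately
sock.send(b"b")      # corked behind "a"
sock.send(b"c")      # corked behind "a"
sock.tx_complete()   # completion of "a" pushes "bc" as one packet
print(sock.sent)
```

In this model the corked data leaves only when tx_complete() runs, so a late
or missing completion directly delays it.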

We did attach probes at one point to check whether an interrupt might have
gone missing (we don't have them on hand anymore), but we always saw the TX
completion happen. When the 40ms latency occurred, we'd see that the
completion had happened just after the other packet was corked. But it
certainly doesn't hurt to double-check.

- Geoff Blake

On Thu, 14 Dec 2023, Eric Dumazet wrote:

> On Wed, Dec 13, 2023 at 10:30 PM Salvatore Dipietro <dipiets@...zon.com> wrote:
> >
> > > It looks like the above disables autocorking even after the userspace
> > > sets TCP_CORK. Am I reading it correctly? Is that expected?
> >
> > I have tested a new version of the patch which can target only TCP_NODELAY.
> > Results using previous benchmark are identical. I will submit it in a new
> > patch version.
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -716,7 +716,8 @@
> >
> >         tcp_mark_urg(tp, flags);
> >
> > -       if (tcp_should_autocork(sk, skb, size_goal)) {
> > +       if (!(nonagle & TCP_NAGLE_OFF) &&
> > +           tcp_should_autocork(sk, skb, size_goal)) {
> >
> >                 /* avoid atomic op if TSQ_THROTTLED bit is already set */
> >                 if (!test_bit(TSQ_THROTTLED, &sk->sk_tsq_flags)) {
> >
> >
> >
> > > Also I wonder about these 40ms delays, TCP small queue handler should
> > > kick when the prior skb is TX completed.
> > >
> > > It seems the issue is on the driver side ?
> > >
> > > Salvatore, which driver are you using ?
> >
> > I am using ENA driver.
> >
> > Eric, can you please clarify where you think the problem is?
> >
> 
> The following bpftrace program could double-check whether the ena driver
> is possibly holding TCP skbs too long:
> 
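> bpftrace -e 'k:dev_hard_start_xmit {
>  // arg0 is the skb being handed to the driver
>  $skb = (struct sk_buff *)arg0;
>  // fclone == 2 is SKB_FCLONE_CLONE, the fast clone that TCP transmits
>  if ($skb->fclone == 2) {
>   @start[$skb] = nsecs;
>  }
> }
> k:__kfree_skb {
>  $skb = (struct sk_buff *)arg0;
>  if ($skb->fclone == 2 && @start[$skb]) {
>   // time from driver handoff to skb free, in microseconds
>   @tx_compl_usecs = hist((nsecs - @start[$skb])/1000);
>   delete(@start[$skb]);
>  }
> }
> END { clear(@start); }'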
> 
> iroa21:/home/edumazet# ./trace-tx-completion.sh
> Attaching 3 probes...
> ^C
> 
> 
> @tx_compl_usecs:
> [2, 4)                13 |                                                    |
> [4, 8)               182 |                                                    |
> [8, 16)          2379007 |@@@@@@@@@@@@@@@                                     |
> [16, 32)         7865369 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32, 64)         6040939 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
> [64, 128)         199255 |@                                                   |
> [128, 256)          9235 |                                                    |
> [256, 512)            89 |                                                    |
> [512, 1K)             37 |                                                    |
> [1K, 2K)              19 |                                                    |
> [2K, 4K)              56 |                                                    |
> 
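
The histogram can be summarised with a little arithmetic. The sketch below
copies the bucket counts from the sample output quoted above (which appears
to come from the poster's own host rather than an ENA instance, so it is a
reference point, not a measurement of the reported problem) and counts how
many completions took 128 usecs or longer. The names buckets and
count_at_or_above are illustrative only.

```python
# Bucket counts copied from the @tx_compl_usecs histogram quoted above.
# Keys are bucket lower bounds in microseconds (bpftrace's 1K means 1024).
buckets = {
    2: 13, 4: 182, 8: 2379007, 16: 7865369, 32: 6040939, 64: 199255,
    128: 9235, 256: 89, 512: 37, 1024: 19, 2048: 56,
}

total = sum(buckets.values())


def count_at_or_above(threshold_us):
    """Number of samples in buckets whose lower bound is >= threshold_us."""
    return sum(n for low, n in buckets.items() if low >= threshold_us)


slow = count_at_or_above(128)
# hist() buckets are power-of-two ranges, so the last one ends at twice
# its lower bound.
worst_case_us = 2 * max(buckets)

print(f"total samples: {total}")
print(f">= 128 usecs:  {slow} ({100 * slow / total:.4f}%)")
print(f"upper bound of slowest bucket: {worst_case_us} usecs")
```

On this sample more than 99.9% of skbs were freed within 128 usecs of being
handed to the driver, and the slowest bucket tops out at 4096 usecs, about a
tenth of the 40ms delay under discussion.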
