Message-ID: <3baf5407-34b1-d616-9552-19696933e0c2@amazon.com>
Date: Thu, 14 Dec 2023 09:52:21 -0600
From: Geoff Blake <blakgeof@...zon.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: Salvatore Dipietro <dipiets@...zon.com>, <alisaidi@...zon.com>,
<benh@...zon.com>, <davem@...emloft.net>, <dipietro.salvatore@...il.com>,
<dsahern@...nel.org>, <kuba@...nel.org>, <netdev@...r.kernel.org>,
<pabeni@...hat.com>
Subject: RE: [PATCH] tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is
set
Thanks for helping dig in here, Eric, but what is supposed to happen on TX
completion? We're unfamiliar with TCP small queues beyond finding your old
LKML post stating that a tasklet is supposed to run if there is pending
data, so we need a bit more guidance if you could.
I think it's supposed to call tcp_wfree() when the skb is destructed, and
that invokes the tasklet? There is also sock_wfree(), which by design does
not appear to have the linkage to the tasklet.
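To make the question concrete, here is a toy userspace model of the handoff
as we understand it: the sender corks a small skb while earlier data is still
in flight and leaves a flag behind, and the skb destructor at TX completion
sees the flag and schedules the transmit. Everything here (toy_sock,
toy_push, toy_wfree, the field names) is hypothetical and only stands in for
the kernel state; it is a sketch of our reading, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the socket state involved; not the kernel's struct sock. */
struct toy_sock {
    int  wmem_alloc;    /* bytes charged to skbs still in the qdisc/NIC */
    bool tsq_throttled; /* models the TSQ_THROTTLED bit */
    int  kicks;         /* how often the deferred transmit was scheduled */
};

/*
 * Models the autocork decision: hold back a small skb while earlier data
 * is still in flight, and leave a flag so TX completion pushes it later.
 * Returns true when the skb is corked.
 */
static bool toy_push(struct toy_sock *sk, int skb_len, int skb_truesize,
                     int size_goal)
{
    if (skb_len < size_goal && sk->wmem_alloc > skb_truesize) {
        sk->tsq_throttled = true;
        /*
         * Re-check after setting the flag. In this single-threaded model
         * it always passes; in a concurrent setting it would catch a
         * completion that ran just before the flag became visible.
         */
        if (sk->wmem_alloc > skb_truesize)
            return true;
    }
    return false; /* caller transmits immediately */
}

/*
 * Models the skb destructor running at TX completion: release the bytes
 * and, if a sender was throttled, clear the flag and schedule a transmit.
 */
static void toy_wfree(struct toy_sock *sk, int truesize)
{
    assert(sk->wmem_alloc >= truesize);
    sk->wmem_alloc -= truesize;
    if (sk->tsq_throttled) {
        sk->tsq_throttled = false;
        sk->kicks++; /* stands in for scheduling the tasklet */
    }
}
```

If the real path behaves like this model, a corked skb should wait no longer
than the completion of the skb already in flight, so a 40ms stall would mean
the kick was either never armed or got lost.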
We did attach probes at one point to check whether an interrupt might have
gone missing (we no longer have them on hand), but we always saw the TX
completion happen. When the 40ms latency occurred, we'd see that the
completion had happened just after the other packet was corked. But it
certainly doesn't hurt to double-check.
- Geoff Blake
On Thu, 14 Dec 2023, Eric Dumazet wrote:
>
> On Wed, Dec 13, 2023 at 10:30 PM Salvatore Dipietro <dipiets@...zon.com> wrote:
> >
> > > It looks like the above disables autocorking even after the userspace
> > > sets TCP_CORK. Am I reading it correctly? Is that expected?
> >
> > I have tested a new version of the patch which can target only TCP_NODELAY.
> > Results using previous benchmark are identical. I will submit it in a new
> > patch version.
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -716,7 +716,8 @@
> >
> >         tcp_mark_urg(tp, flags);
> >
> > -       if (tcp_should_autocork(sk, skb, size_goal)) {
> > +       if (!(nonagle & TCP_NAGLE_OFF) &&
> > +           tcp_should_autocork(sk, skb, size_goal)) {
> >
> >                 /* avoid atomic op if TSQ_THROTTLED bit is already set */
> >                 if (!test_bit(TSQ_THROTTLED, &sk->sk_tsq_flags)) {
> >
> >
> >
> > > Also I wonder about these 40ms delays, TCP small queue handler should
> > > kick when the prior skb is TX completed.
> > >
> > > It seems the issue is on the driver side ?
> > >
> > > Salvatore, which driver are you using ?
> >
> > I am using ENA driver.
> >
> > Eric, can you please clarify where you think the problem is?
> >
>
> The following bpftrace program could double-check whether the ena driver
> is possibly holding TCP skbs too long:
>
> bpftrace -e 'k:dev_hard_start_xmit {
>   $skb = (struct sk_buff *)arg0;
>   if ($skb->fclone == 2) {
>     @start[$skb] = nsecs;
>   }
> }
> k:__kfree_skb {
>   $skb = (struct sk_buff *)arg0;
>   if ($skb->fclone == 2 && @start[$skb]) {
>     @tx_compl_usecs = hist((nsecs - @start[$skb])/1000);
>     delete(@start[$skb]);
>   }
> }
> END { clear(@start); }'
>
> iroa21:/home/edumazet# ./trace-tx-completion.sh
> Attaching 3 probes...
> ^C
>
>
> @tx_compl_usecs:
> [2, 4)            13 |                                                    |
> [4, 8)           182 |                                                    |
> [8, 16)      2379007 |@@@@@@@@@@@@@@@                                     |
> [16, 32)     7865369 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32, 64)     6040939 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
> [64, 128)     199255 |@                                                   |
> [128, 256)      9235 |                                                    |
> [256, 512)        89 |                                                    |
> [512, 1K)         37 |                                                    |
> [1K, 2K)          19 |                                                    |
> [2K, 4K)          56 |                                                    |
>