Message-ID: <CAN9Uquc9Ji2o4WA-Bo6JCY-4X4G54KaLPS1c5VOcCbhWMkR0KQ@mail.gmail.com>
Date: Wed, 10 Jul 2024 21:19:57 +0800
From: Niigee Mashook <mashookniigee@...il.com>
To: Potnuri Bharat Teja <bharat@...lsio.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org
Subject: Questions about the chelsio/cxgb3 Driver - TX Stall
Hello everyone in the networking field!

As a learner of kernel networking, I came across the following comment
in t3_eth_xmit() while exploring the chelsio/cxgb3 driver
(drivers/net/ethernet/chelsio/cxgb3/sge.c):
/*
* We do not use Tx completion interrupts to free DMAd Tx packets.
* This is good for performance but means that we rely on new Tx
* packets arriving to run the destructors of completed packets,
* which open up space in their sockets' send queues. Sometimes
* we do not get such new packets causing Tx to stall. A single
* UDP transmitter is a good example of this situation. We have
* a clean up timer that periodically reclaims completed packets
* but it doesn't run often enough (nor do we want it to) to prevent
* lengthy stalls. A solution to this problem is to run the
* destructor early, after the packet is queued but before it's DMAd.
* A cons is that we lie to socket memory accounting, but the amount
* of extra memory is reasonable (limited by the number of Tx
* descriptors), the packets do actually get freed quickly by new
* packets almost always, and for protocols like TCP that wait for
* acks to really free up the data the extra memory is even less.
* On the positive side we run the destructors on the sending CPU
* rather than on a potentially different completing CPU, usually a
* good thing. We also run them without holding our Tx queue lock,
* unlike what reclaim_completed_tx() would otherwise do.
*
* Run the destructor before telling the DMA engine about the packet
* to make sure it doesn't complete and get freed prematurely.
*/
	if (likely(!skb_shared(skb)))
		skb_orphan(skb);
I tried to understand this insightful comment but found myself unsure
about a few points. Here are my main questions:
1. Why is not using Tx completion interrupts considered better?

One reason I can think of is that avoiding TX completion interrupts
reduces CPU overhead, letting the CPU spend its cycles processing
packets rather than servicing interrupts. However, I am concerned
that calling skb_orphan() this early detaches the skb from its
socket, which would defeat autocorking and TSQ (TCP Small Queues)
and could lead to bufferbloat. Would this not cause a performance
regression?
2. The comment says that without running the destructor early (via
skb_orphan), Tx can stall. Why is that?

My understanding is that when sk_sndbuf is small, it might allow only
the first packet to be sent. Without skb_orphan, once that packet is
queued, sk_wmem_alloc reaches sk_sndbuf, so subsequent sends block.
Since no new packets then arrive at the driver, no destructors run,
sk_wmem_alloc never decreases, and Tx stalls until the cleanup timer
fires. Is this correct?
Looking forward to your insights!