Message-ID: <CAN9Uquc9Ji2o4WA-Bo6JCY-4X4G54KaLPS1c5VOcCbhWMkR0KQ@mail.gmail.com>
Date: Wed, 10 Jul 2024 21:19:57 +0800
From: Niigee Mashook <mashookniigee@...il.com>
To: Potnuri Bharat Teja <bharat@...lsio.com>, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	netdev@...r.kernel.org
Subject: Questions about the chelsio/cxgb3 Driver - TX Stall

Hello everyone in the networking field!

As a learner of kernel networking, I came across the following comment
in t3_eth_xmit() while exploring the chelsio/cxgb3 driver code (in
drivers/net/ethernet/chelsio/cxgb3/sge.c):

       /*
         * We do not use Tx completion interrupts to free DMAd Tx packets.
         * This is good for performance but means that we rely on new Tx
         * packets arriving to run the destructors of completed packets,
         * which open up space in their sockets' send queues.  Sometimes
         * we do not get such new packets causing Tx to stall.  A single
         * UDP transmitter is a good example of this situation.  We have
         * a clean up timer that periodically reclaims completed packets
         * but it doesn't run often enough (nor do we want it to) to prevent
         * lengthy stalls.  A solution to this problem is to run the
         * destructor early, after the packet is queued but before it's DMAd.
         * A cons is that we lie to socket memory accounting, but the amount
         * of extra memory is reasonable (limited by the number of Tx
         * descriptors), the packets do actually get freed quickly by new
         * packets almost always, and for protocols like TCP that wait for
         * acks to really free up the data the extra memory is even less.
         * On the positive side we run the destructors on the sending CPU
         * rather than on a potentially different completing CPU, usually a
         * good thing.  We also run them without holding our Tx queue lock,
         * unlike what reclaim_completed_tx() would otherwise do.
         *
         * Run the destructor before telling the DMA engine about the packet
         * to make sure it doesn't complete and get freed prematurely.
         */
        if (likely(!skb_shared(skb)))
                skb_orphan(skb);

I tried to understand this insightful comment but found myself unsure
of certain points. Here are my main questions:

1. Why is not using Tx completion interrupts considered better?
One reason I can think of is that taking fewer interrupts leaves the
CPU more time to actually process packets, which helps overall
performance. However, I am concerned that calling skb_orphan() this
early detaches the skb from its socket (see my simplified sketch of
skb_orphan() below), which defeats mechanisms such as TCP auto-corking
and TCP Small Queues (TSQ) that rely on tracking in-flight transmit
memory, and can therefore lead to bufferbloat. Would this not cause a
performance regression?
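
For context, here is my simplified reading of skb_orphan(), paraphrased
from include/linux/skbuff.h (simplified, not the verbatim kernel code):
it runs the destructor right away and detaches the skb from its socket,
so any later completion-time accounting for that socket no longer sees
the skb.

/*
 * Paraphrased sketch of skb_orphan() -- simplified from my reading of
 * include/linux/skbuff.h, not the exact kernel code.
 */
static inline void skb_orphan(struct sk_buff *skb)
{
        if (skb->destructor) {
                skb->destructor(skb);   /* e.g. sock_wfree(): releases the
                                         * skb->truesize charge on sk_wmem_alloc */
                skb->destructor = NULL;
                skb->sk = NULL;         /* skb no longer tied to the socket */
        }
}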

2. The comment suggests that without skb_orphan() a Tx stall can
occur. Why is that?
My understanding is that when sk->sk_sndbuf is small, it may only be
large enough for the first packet to be queued. Without skb_orphan(),
that packet keeps its skb->truesize charged to sk_wmem_alloc after it
is handed to the hardware, so sk_wmem_alloc reaches sk->sk_sndbuf and
subsequent packets cannot be queued. Because the driver only runs the
destructors of completed packets when new Tx packets arrive (or when
the infrequent clean-up timer fires), sk_wmem_alloc never decreases,
leading to a Tx stall. Is this correct?
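
To make my reasoning concrete, this is the mental model I have of the
UDP send side (a hand-written sketch only; have_data_to_send(),
wait_for_write_space() and build_udp_skb() are made-up placeholders,
not real kernel functions):

/*
 * Hand-written sketch of my mental model, not real kernel code.
 * Each queued skb charges skb->truesize to sk->sk_wmem_alloc; the
 * charge is only released when the skb's destructor runs.
 */
while (have_data_to_send()) {
        if (sk_wmem_alloc_get(sk) >= READ_ONCE(sk->sk_sndbuf))
                wait_for_write_space(sk);  /* sleeps until a destructor runs */

        skb = build_udp_skb(sk);           /* charges skb->truesize */
        dev_queue_xmit(skb);               /* cxgb3 takes no Tx completion IRQ */

        /*
         * Without skb_orphan() in t3_eth_xmit(), the charge is only
         * dropped when a *later* xmit call (or the infrequent clean-up
         * timer) reclaims completed descriptors and runs the destructor.
         * If the send buffer is already full, that later call never
         * comes, so the sender can sleep indefinitely: the Tx stall.
         */
}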

Looking forward to your insights!
