Message-ID:
<PH8PR11MB796577DE9F07CFE627B1DE4C958BA@PH8PR11MB7965.namprd11.prod.outlook.com>
Date: Fri, 9 May 2025 01:56:47 +0000
From: <Ronnie.Kunin@...rochip.com>
To: <kuba@...nel.org>, <Thangaraj.S@...rochip.com>
CC: <Bryan.Whitehead@...rochip.com>, <andrew+netdev@...n.ch>,
<davem@...emloft.net>, <andrew@...n.ch>, <linux-kernel@...r.kernel.org>,
<netdev@...r.kernel.org>, <pabeni@...hat.com>, <edumazet@...gle.com>,
<UNGLinuxDriver@...rochip.com>
Subject: RE: [PATCH v1 net-next] net: lan743x: configure interrupt moderation
timers based on speed
> On Thu, 8 May 2025 03:36:17 +0000 Thangaraj.S@...rochip.com wrote:
> > I agree with your comments and will implement the ethtool option to
> > provide flexibility, while keeping the default behavior as defined in
> > this patch based on speed.
>
> Why the speed based defaults? Do other vendors do that?
> 330usec is a very high latency, and if the link is running at 10M we probably don't need IRQ moderation
> at all?
>
> For Tx deferring the completion based on link speed makes sense.
> We want an IRQ for a fixed amount of data, doesn't matter how fast it's going out.
An IRQ for a fixed amount of data is not one of our criteria (at least not directly, though it may be indirectly). From the use case perspective, I enumerate our requirements later in this mail. From the functional perspective of the device, we need to free up DMA descriptors soon enough after each DMA operation completes that we never delay initiating new Tx DMA operations for lack of descriptors, since that could cause undesired transmission gaps on the wire.
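The Tx constraint above can be put in rough numbers: completions must be processed before the hardware drains every queued descriptor. This is a back-of-the-envelope model, not driver code. It assumes one frame per descriptor, 20 bytes of per-frame wire overhead (8 bytes preamble/SFD plus a 12-byte inter-frame gap), and a ring size supplied by the caller; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-frame wire overhead: 8 bytes preamble/SFD + 12-byte IFG. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Nanoseconds to put one frame of frame_bytes (incl. FCS) on the wire. */
static uint64_t frame_time_ns(uint32_t frame_bytes, uint32_t link_mbps)
{
	uint64_t bits = ((uint64_t)frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;

	assert(link_mbps != 0);
	/* bits / (Mbit/s) yields microseconds; x1000 for nanoseconds. */
	return bits * 1000u / link_mbps;
}

/*
 * Nanoseconds for the MAC to drain ring_size descriptors at line rate,
 * assuming one frame per descriptor. If Tx completions are deferred
 * longer than this, the ring can run dry and leave gaps on the wire.
 */
static uint64_t ring_drain_time_ns(uint32_t ring_size, uint32_t frame_bytes,
				   uint32_t link_mbps)
{
	return (uint64_t)ring_size * frame_time_ns(frame_bytes, link_mbps);
}
```

Under these assumptions, a hypothetical 128-descriptor ring of 64-byte frames drains in about 86 µs at 1 Gbps, whereas full-size 1518-byte frames take about 1.57 ms, so the small-frame case is the one that limits how long Tx completions can be deferred.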
> But for Rx the more aggressive the moderation the higher the latency. In my experience the Rx moderation should not
> depend on link speed.
In my experience you cannot make such absolute statements. It really depends on the use case and what is most important to prioritize for it. Use cases drive requirements, and different requirements can call for conflicting moderation values. Different products, or even different use cases within the same product, may therefore need different moderation values. So how do you pick what the default behavior should be?
What our customers have told us FOR THIS PRODUCT (aggregate requirements) is that the order of priority (1 being the highest) is:
1. Line-rate throughput (or as close to it as possible) using the iPerf3 TCP test with default settings as a benchmark
2. Lowest possible CPU utilization (but not to the detriment of throughput)
3. Lowest possible power consumption (but not to the detriment of throughput or CPU utilization)
4. Lowest possible latency (but not to the detriment of any of the above)
Therefore this is what we use to guide our moderation criteria.
When this Linux driver was initially submitted, requirement #1 referred to unidirectional line-rate throughput, and the 400us moderation value still in use today met those requirements for our 1G LAN743x devices. Even if that latency might seem high, it has not been changed at all since its introduction in 2018, so it is clear that it has not been a concern for anybody using this device with this driver.
During the last year or two, requirement #1 has become more stringent: our customers need concurrent bidirectional line-rate throughput with either iPerf3 TCP or UDP defaults, while still meeting #2-#4 as secondary goals. We have also released new devices (PCI11x1x), serviced by this driver, that support 2.5G. 400us is not suitable for these higher or bidirectional data rates. Even if we reduced the moderation value to meet the new throughput demands, a fixed value cannot achieve these goals at all speeds (i.e. if we set a fixed low moderation value for the highest possible throughput at 2.5G, CPU utilization blows up at lower speeds). This is what led to our decision to adjust moderation based on the current link speed, and we arrived at those values by tuning them empirically.
Eventually, rather than only using the current link speed, we will likely evolve toward measuring throughput periodically and adjusting moderation dynamically to the measured value, which will be much better. That will take some time to design properly (it may even require some hardware modifications to implement well), so this is an initial low-cost step in the right direction that satisfies our current product requirements.
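A minimal sketch of a speed-based default, assuming a simple lookup keyed on the negotiated link speed. The table values are placeholders chosen only to follow the trend described above (shorter timers at higher speeds so that lower speeds do not pay the CPU cost of a 2.5G-tuned timer); they are not the empirically tuned values from the patch. Falling back to the old 400us value for unlisted speeds is likewise an assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fallback for unlisted speeds: the long-standing 400us value. */
#define MOD_FALLBACK_USECS 400u

struct speed_mod {
	uint32_t speed_mbps; /* negotiated link speed */
	uint32_t usecs;      /* interrupt moderation timer */
};

/*
 * PLACEHOLDER values for illustration only; NOT the values in the patch.
 * They only encode "faster link -> shorter timer".
 */
static const struct speed_mod speed_mod_table[] = {
	{   10, 400 },
	{  100, 300 },
	{ 1000, 200 },
	{ 2500, 100 },
};

/* Return the moderation timer for speed_mbps, or the fallback value. */
static uint32_t moderation_usecs_for_speed(uint32_t speed_mbps)
{
	size_t n = sizeof(speed_mod_table) / sizeof(speed_mod_table[0]);

	for (size_t i = 0; i < n; i++)
		if (speed_mod_table[i].speed_mbps == speed_mbps)
			return speed_mod_table[i].usecs;
	return MOD_FALLBACK_USECS;
}
```

Keeping the per-speed tuning in a table means retuning after new measurements only touches data, not logic.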
We understand that a smaller portion of this device's users may have use cases that drive a different order of priorities and would therefore need a different moderation value (including those who see a regression after this change because 400us is no longer used). That's where your and Andrew's recommendation to implement ethtool -C (thanks!) becomes very useful: it lets those users set whatever static value they want (speed-adaptive mode is turned off when they do). But going back to the beginning, for the default configuration we do need moderation to be link-speed adaptive. This is what we have already been using very successfully for some time in our drivers for this device on other OSs, and it is what satisfies the requirements of the lion's share of our customers.
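The interaction between the speed-adaptive default and an ethtool -C override can be modelled as a small state machine: the speed-based value is applied on each link change until the user pins a static value, after which link changes leave it alone. This is a hypothetical sketch; the struct and function names are illustrative and do not correspond to the lan743x driver or the kernel ethtool API, and speed_usecs simply stands for whatever the speed-based default would choose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device moderation state (names are illustrative). */
struct irq_mod_state {
	bool speed_adaptive; /* true until the user pins a static value */
	uint32_t rx_usecs;   /* Rx timer value currently in effect */
};

/* Default configuration: follow the link speed. */
static void irq_mod_init(struct irq_mod_state *s, uint32_t initial_usecs)
{
	assert(s);
	s->speed_adaptive = true;
	s->rx_usecs = initial_usecs;
}

/* Models a user "ethtool -C" request: pin a value, stop adapting. */
static void irq_mod_set_static(struct irq_mod_state *s, uint32_t usecs)
{
	s->speed_adaptive = false;
	s->rx_usecs = usecs;
}

/* Link-up hook: apply the speed-based value only in adaptive mode. */
static void irq_mod_link_change(struct irq_mod_state *s, uint32_t speed_usecs)
{
	if (s->speed_adaptive)
		s->rx_usecs = speed_usecs;
}
```

Keeping the override as an explicit flag makes the precedence obvious: a user-supplied value always wins over the speed-based default.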