Message-ID: <ba97ee38-a9d6-4d59-811b-055534ffbe8a@intel.com>
Date: Wed, 5 Nov 2025 11:14:09 +0100
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>
CC: Paul Menzel <pmenzel@...gen.mpg.de>, <intel-wired-lan@...ts.osuosl.org>,
<netdev@...r.kernel.org>, <aleksander.lobakin@...el.com>,
<jacob.e.keller@...el.com>, Aleksandr Loktionov
<aleksandr.loktionov@...el.com>
Subject: Re: [PATCH iwl-next v3] ice: use netif_get_num_default_rss_queues()
On 10/31/25 14:17, Michal Swiatkowski wrote:
> On Thu, Oct 30, 2025 at 11:39:30AM +0100, Przemek Kitszel wrote:
>> On 10/30/25 10:37, Michal Swiatkowski wrote:
>>> On Thu, Oct 30, 2025 at 10:10:32AM +0100, Paul Menzel wrote:
>>>> Dear Michal,
>>>>
>>>>
>>>> Thank you for your patch. For the summary, I’d add:
>>>>
>>>> ice: Use netif_get_num_default_rss_queues() to decrease queue number
>>
>> I would instead just say:
>> ice: cap the default number of queues to 64
>>
>> as this is exactly what happens. Then the next paragraph could be:
>> Use netif_get_num_default_rss_queues() as a better base (instead of
>> the number of CPU cores), but still cap it to 64 to avoid excess IRQs
>> being assigned to the PF (which would leave, in some cases, nothing
>> for VFs).
>>
>> sorry for such late nitpicks
>> and, see below too
>
> I moved away from capping to 64; now it is just a call to
> netif_get_num_default_rss_queues(). Following Olek's comment, dividing
> by 2 is just fine now, and it looks like there is no good reason to cap
> it further in the driver, but let's discuss it here if you have a
> different opinion.
I see, sorry for the confusion.
With that, I'm fine with the change being -next material, and the commit
message is good (not sure if perfect, but it does not need to be).

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@...el.com>
>
>>
>>>>
>>>> Am 30.10.25 um 09:30 schrieb Michal Swiatkowski:
>>>>> On some high-core systems (like AMD EPYC Bergamo, Intel Clearwater
>>>>> Forest) loading the ice driver with default values can lead to
>>>>> queue/IRQ exhaustion, leaving no additional resources for SR-IOV.
>>>>
>>>> Could you please elaborate on how to make the queue/IRQ exhaustion
>>>> visible?
>>>>
>>>
>>> What do you mean? On a high-core system, let's say num_online_cpus()
>>> returns 288; on an 8-port card that means 256 IRQs per PF (2k in
>>> total). The driver will load with 256 queues (and IRQs) on each PF.
>>> Any VF creation command will then fail because no free IRQs are
>>> available.
>>
>> this clearly means this is -net material,
>> even if this commit will be rather unpleasant to backport to stable
>>
>
> In my opinion it isn't. It is just about default values. Still, in the
> described case the user can call ethtool -L to lower the queue count and
> then create VFs without a problem.
>
>>> (echo X > /sys/class/net/ethX/device/sriov_numvfs)
>>>
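(Back-of-envelope math, for anyone following along, using the numbers
above; treating 256 as the per-PF vector cap is my assumption here:

    288 online CPUs -> 256 MSI-X vectors actually taken per PF
    8 PFs x 256 vectors = 2048 vectors in total

which exhausts the pool, so a later write to sriov_numvfs has nothing
left to hand out.)
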
>>>>> In most cases there is no performance reason to use more than half
>>>>> of num_cpus(). Limit the default value to that using the generic
>>>>> netif_get_num_default_rss_queues().
>>>>>
>>>>> Still, the number of queues can be changed with ethtool, up to
>>>>> num_online_cpus(). It can be done by calling:
>>>>> $ ethtool -L ethX combined $(nproc)
>>>>>
>>>>> This change affects only the default queue amount.
>>>>
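(To illustrate the intent, a minimal C sketch, not the actual ice patch;
the helper name and the clamp are hypothetical:

	#include <linux/netdevice.h>	/* netif_get_num_default_rss_queues() */
	#include <linux/minmax.h>	/* min_t() */

	/* Default queue count: the generic RSS default, but never more
	 * than what the PF actually has available.  ethtool -L can still
	 * raise it up to num_online_cpus().
	 */
	static unsigned int default_q_count(unsigned int pf_avail)
	{
		return min_t(unsigned int,
			     netif_get_num_default_rss_queues(), pf_avail);
	}

i.e. only the default changes, the ethtool-settable maximum does not.)
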
>>>> How would you judge the regression potential, that is, for people for
>>>> whom the defaults work well enough and whose queue number is now
>>>> reduced?
>>>>
>>>
>>> You can take a look at the commit that introduced the /2 change in
>>> netif_get_num_default_rss_queues() [1]. There is a good justification
>>> for it there. In short, defaulting to the full physical core count is
>>> just a waste of CPU resources.
>>>
>>> [1] https://lore.kernel.org/netdev/20220315091832.13873-1-ihuguet@redhat.com/
>>>
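(Roughly, if I recall the referenced commit correctly, the helper now
counts physical cores by skipping SMT siblings and halves the result on
bigger machines; a paraphrased sketch, not a verbatim copy:

	int netif_get_num_default_rss_queues(void)
	{
		cpumask_var_t cpus;
		int cpu, count = 0;

		if (unlikely(is_kdump_kernel() ||
			     !zalloc_cpumask_var(&cpus, GFP_KERNEL)))
			return 1;

		cpumask_copy(cpus, cpu_online_mask);
		for_each_cpu(cpu, cpus) {
			++count;
			/* drop the SMT siblings of this CPU */
			cpumask_andnot(cpus, cpus,
				       topology_sibling_cpumask(cpu));
		}
		free_cpumask_var(cpus);

		/* half of the physical cores, unless there are very few */
		return count > 2 ? DIV_ROUND_UP(count, 2) : count;
	}

so on a big SMT system the default ends up around a quarter of the
logical CPU count.)
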
>> [...]