linux-kernel - Re: [PATCH] net: openvswitch: Use for_each_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2dc70249-7de2-4178-9184-2d50cc0dffe9@ovn.org>
Date: Thu, 14 Aug 2025 23:21:02 +0200
From: Ilya Maximets <i.maximets@....org>
To: Yury Norov <yury.norov@...il.com>
Cc: i.maximets@....org, Aaron Conole <aconole@...hat.com>,
 Eelco Chaudron <echaudro@...hat.com>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, netdev@...r.kernel.org,
 dev@...nvswitch.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: openvswitch: Use for_each_cpu_from() where
 appropriate

On 8/14/25 11:05 PM, Yury Norov wrote:
> On Thu, Aug 14, 2025 at 10:49:30PM +0200, Ilya Maximets wrote:
>> On 8/14/25 9:58 PM, Yury Norov wrote:
>>> From: Yury Norov (NVIDIA) <yury.norov@...il.com>
>>>
>>> Openvswitch opencodes for_each_cpu_from(). Fix it and drop some
>>> housekeeping code.
>>>
>>> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@...il.com>
>>> ---
>>>  net/openvswitch/flow.c       | 14 ++++++--------
>>>  net/openvswitch/flow_table.c |  8 ++++----
>>>  2 files changed, 10 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>>> index b80bd3a90773..b464ab120731 100644
>>> --- a/net/openvswitch/flow.c
>>> +++ b/net/openvswitch/flow.c
>>> @@ -129,15 +129,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
>>>  			struct ovs_flow_stats *ovs_stats,
>>>  			unsigned long *used, __be16 *tcp_flags)
>>>  {
>>> -	int cpu;
>>> +	/* CPU 0 is always considered */
>>> +	unsigned int cpu = 1;
>>
>> Hmm.  I'm a bit confused here.  Where is CPU 0 considered if we start
>> iteration from 1?
> 
> I didn't touch this part of the original comment, as you see, and I'm
> not a domain expert, so don't know what does this wording mean.
> 
> Most likely 'always considered' means that CPU0 is not accounted in this
> statistics.
>   
>>>  	*used = 0;
>>>  	*tcp_flags = 0;
>>>  	memset(ovs_stats, 0, sizeof(*ovs_stats));
>>>  
>>> -	/* We open code this to make sure cpu 0 is always considered */
>>> -	for (cpu = 0; cpu < nr_cpu_ids;
>>> -	     cpu = cpumask_next(cpu, flow->cpu_used_mask)) {
>>> +	for_each_cpu_from(cpu, flow->cpu_used_mask) {
>>
>> And why it needs to be a for_each_cpu_from() and not just for_each_cpu() ?
> 
> The original code explicitly ignores CPU0.

No, it's not.  The loop explicitly starts from zero.  And the comments
are saying that the loop is open-coded specifically to always have zero
in the iteration.

> If we use for_each_cpu(),
> it would ignore initial value in 'cpu'. Contrary, for_each_cpu_from()
> does respect it.
> 
>> Note: the original logic here came from using for_each_node() back when
>> stats were collected per numa, and it was important to check node 0 when
>> the system didn't have it, so the loop was open-coded, see commit:
>>   40773966ccf1 ("openvswitch: fix flow stats accounting when node 0 is not possible")
>>
>> Later the stats collection was changed to be per-CPU instead of per-NUMA,
>> th eloop was adjusted to CPUs, but remained open-coded, even though it
>> was probbaly safe to use for_each_cpu() macro here, as it accepts the
>> mask and doesn't limit it to available CPUs, unlike the for_each_node()
>> macro that only iterates over possible NUMA node numbers and will skip
>> the zero.  The zero is importnat, because it is used as long as only one
>> core updates the stats, regardless of the number of that core, AFAIU.
>>
>> So, the comments in the code do not really make a lot of sense, especially
>> in this patch.
> 
> I can include CPU0 and iterate over it, but it would break the existing
> logic. The intention of my work is to minimize direct cpumask_next()
> usage over the kernel, and as I said I'm not a domain expert here.
> 
> Let's wait for more comments. If it's indeed a bug in current logic,
> I'll happily send v2.
> 
> Thanks,
> Yury