Date:   Fri, 17 Dec 2021 11:48:05 +0900
From:   Rei Yamamoto <yamamoto.rei@...fujitsu.com>
To:     ming.lei@...hat.com
Cc:     hch@....de, kbusch@...nel.org, linux-kernel@...r.kernel.org,
        maz@...nel.org, tglx@...utronix.de, yamamoto.rei@...fujitsu.com
Subject: Re: [PATCH] irq: consider cpus on nodes are unbalanced

On Wed, Dec 15, 2021 at 12:33, Ming Lei wrote:
>> >> If cpus on a node are offline at boot time, there is a
>> >> difference in the number of nodes between when building affinity
>> >> masks for present cpus and when building affinity masks for possible
>> >> cpus.
>
> There is always a difference between the two node counts: the 1st is
> the number of nodes covering present cpus, and the 2nd is the number of
> nodes covering the other possible cpus not yet spread.

In this case, building affinity masks for possible cpus can change even
the affinity mask bits of present cpus in the "if (numvecs <= nodes)" path.
This is the second problem I mentioned.
I will explain the actual case later.
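
To make that path concrete, here is a simplified sketch of the
"numvecs <= nodes" branch in __irq_build_affinity_masks()
(kernel/irq/affinity.c); the variable names follow the upstream code,
but this is an approximation for illustration, not the exact source:
-----
	if (numvecs <= nodes) {
		for_each_node_mask(n, nodemsk) {
			/*
			 * ORs the whole node mask into the current vector,
			 * including present cpus that the earlier pass
			 * already assigned to a different vector.
			 */
			cpumask_or(&masks[curvec].mask, &masks[curvec].mask,
				   node_to_cpumask[n]);
			if (++curvec == last_affv)
				curvec = firstvec;
		}
		return numvecs;
	}
-----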

>
>>> This patch fixes 2 problems caused by the difference in the
>
> Is there any user-visible problem?

The panic occurred in the lpfc driver.

>
>> >> number of nodes:
>> >>
>> >>  - If some unused vectors remain after building masks for present cpus,
>
> We just select a new vector for starting the spread if un-allocated
> vectors remain, but the number for allocation is still numvecs. We hope both
> present cpus and non-present cpus can be balanced across the vectors, so that
> each vector may get a present cpu allocated.

Understood.
I withdraw the first problem I mentioned.

>
>> >>    the remaining vectors are assigned when building masks for possible cpus.
>> >>    Therefore the "numvecs <= nodes" condition must be
>> >>    "vecs_to_assign <= nodes_to_assign". Fix this problem by making this
>> >>    condition appropriate.
>> >>
>> >>  - The "numvecs <= nodes" path can overwrite bits of the masks built for
>> >>    present cpus while building masks for possible cpus. Fix this
>> >>    problem by leaving the bits of non-target CPUs unchanged.
>
> 'numvecs' is always the total number of vectors for assigning CPUs. If
> that number is <= nodes, we just assign the interested cpus of a whole
> node to each vector until all interested cpus are allocated.
>
>
>> Do you have any comments?
>
> I don't see issues with the current way; can you explain the real
> user-visible problem in a bit more detail?

I experienced a panic in the lpfc driver caused by broken affinity masks.

The system had the following configuration:
-----
node: present cpus (possible-but-not-present cpus in parentheses)
Node #0: #0 #1 (#4 #8 #12)
Node #1: #2 #3 (#5 #9 #13)
Node #2: (#6 #10 #14)
Node #3: (#7 #11 #15)

Number of CPUs: 16
Present CPUs: cpu0, cpu1, cpu2, cpu3
Number of nodes covering present cpus: 2
Number of nodes covering possible cpus: 4
Number of vectors: 4
-----

Due to the configuration above, cpumask_var_t *node_to_cpumask was as follows:
-----
node_to_cpumask[0] = 0x1113
node_to_cpumask[1] = 0x222c
node_to_cpumask[2] = 0x4440
node_to_cpumask[3] = 0x8880
-----

As a result of assigning vectors for present cpus, masks[].mask was as follows:
-----
masks[vec1].mask = 0x0004
masks[vec2].mask = 0x0008
masks[vec3].mask = 0x0001
masks[vec4].mask = 0x0002
-----

As a result of assigning vectors for possible cpus, masks[].mask was as follows:
-----
masks[vec1].mask = 0x1117
masks[vec2].mask = 0x222c
masks[vec3].mask = 0x4441
masks[vec4].mask = 0x8882
-----

The problem I encountered was that multiple vectors were unexpectedly
assigned to a single present cpu.
For example, both vec1 and vec3 were assigned to cpu0: the second pass
ORed the whole node mask into each vector, e.g.
masks[vec1].mask = 0x0004 | node_to_cpumask[0] (0x1113) = 0x1117, which
re-added cpu0 and cpu1 even though they already belonged to vec3 and vec4.
Due to these masks, the panic occurred in the lpfc driver.

>> >>  - The "numvecs <= nodes" path can overwrite bits of the masks built for
>> >>    present cpus while building masks for possible cpus. Fix this
>> >>    problem by leaving the bits of non-target CPUs unchanged.

Therefore, when node_to_cpumask is used, ANDing it with the cpus being spread
is necessary so that the bits of non-target CPUs are not changed.
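
As a sketch of what I mean (not a final patch; it reuses the existing nmsk
scratch mask and the cpu_mask argument of __irq_build_affinity_masks() and
is only meant to illustrate the idea):
-----
	if (numvecs <= nodes) {
		for_each_node_mask(n, nodemsk) {
			/*
			 * Restrict the node mask to the cpus actually being
			 * spread in this pass, so bits of non-target CPUs
			 * (e.g. present cpus handled by the earlier pass)
			 * are left untouched.
			 */
			cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
			cpumask_or(&masks[curvec].mask, &masks[curvec].mask,
				   nmsk);
			if (++curvec == last_affv)
				curvec = firstvec;
		}
		return numvecs;
	}
-----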

Thanks,
Rei
