Message-ID: <87h68qttjn.ffs@tglx>
Date: Sat, 02 Nov 2024 00:37:16 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: mapicccy <guanjun@...ux.alibaba.com>
Cc: corbet@....net, axboe@...nel.dk, mst@...hat.com, jasowang@...hat.com,
 xuanzhuo@...ux.alibaba.com, eperezma@...hat.com, vgoyal@...hat.com,
 stefanha@...hat.com, miklos@...redi.hu, peterz@...radead.org,
 akpm@...ux-foundation.org, paulmck@...nel.org, thuth@...hat.com,
 rostedt@...dmis.org, bp@...en8.de, xiongwei.song@...driver.com,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-block@...r.kernel.org, virtualization@...ts.linux.dev,
 linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH RFC v1 1/2] genirq/affinity: add support for limiting
 managed interrupts

On Fri, Nov 01 2024 at 11:03, mapicccy wrote:
>> On Oct 31, 2024, at 18:35, Thomas Gleixner <tglx@...utronix.de> wrote:
>>> +	get_nodes_in_cpumask(node_to_cpumask, premask, &nodemsk);
>>> +
>>> +	for_each_node_mask(n, nodemsk) {
>>> +		cpumask_and(&managed_irqs_cpumsk[n], &managed_irqs_cpumsk[n], premask);
>>> +		cpumask_and(&managed_irqs_cpumsk[n], &managed_irqs_cpumsk[n], node_to_cpumask[n]);
>> 
>> How is this managed_irqs_cpumsk array protected against concurrency?
>
> My intention was to allocate up to `managed_irq_per_node` CPU bits from `managed_irqs_cpumask[n]`,
> even if another task modifies some bits in `managed_irqs_cpumask[n]` at the same time.

That may have been your intention, but how is this even remotely
correct?

Aside from that: if it's intentional and you think it's correct, then you
should have documented that in the code and annotated it so it does not
trigger sanitizers.
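
For illustration, a minimal sketch (not from the patch under review) of
what explicit serialization of that shared per-node array could look like,
assuming a dedicated raw spinlock; an intentionally racy update would at
least need a data_race() annotation plus a comment explaining why the race
is benign:

	/* Hypothetical sketch: serialize updates to the shared array */
	static DEFINE_RAW_SPINLOCK(managed_irqs_lock);
	static struct cpumask managed_irqs_cpumsk[MAX_NUMNODES];

	static void managed_irqs_restrict_node(unsigned int n,
					       const struct cpumask *premask,
					       const struct cpumask *node_mask)
	{
		/* Protect the read-modify-write done by the cpumask ops */
		raw_spin_lock(&managed_irqs_lock);
		cpumask_and(&managed_irqs_cpumsk[n], &managed_irqs_cpumsk[n], premask);
		cpumask_and(&managed_irqs_cpumsk[n], &managed_irqs_cpumsk[n], node_mask);
		raw_spin_unlock(&managed_irqs_lock);
	}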

>> Given the limitations of the x86 vector space, which is not going away
>> anytime soon, there are only two options IMO to handle such a scenario.
>> 
>>   1) Tell the nvme/block layer to disable queue affinity management
>> 
>>   2) Restrict the devices and queues to the nodes they sit on
>
> I tried fixing this issue in the nvme driver, but later discovered
> that the same issue exists with virtio-net.  Therefore, I want to
> address it with a more general solution.

I understand, but a general solution for this problem won't exist
ever.

It's very reasonable to restrict this for one particular device type or
subsystem while maintaining the strict managed property for others, no?
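
As a sketch of that direction (hypothetical driver code, not from this
thread): a driver which knows the vector space is tight can cap the number
of managed vectors it requests, e.g. to the CPUs of its home node, when it
calls pci_alloc_irq_vectors_affinity(), and the generic spreading code
stays untouched:

	#include <linux/pci.h>
	#include <linux/interrupt.h>
	#include <linux/cpumask.h>
	#include <linux/topology.h>

	/* Hypothetical driver setup, names made up for illustration */
	static int example_setup_irqs(struct pci_dev *pdev)
	{
		struct irq_affinity affd = { .pre_vectors = 1 };	/* e.g. admin queue */
		int node = dev_to_node(&pdev->dev);
		unsigned int node_cpus, max_vecs;

		/* Fall back to all online CPUs if the device has no home node */
		node_cpus = node == NUMA_NO_NODE ? num_online_cpus() :
						   cpumask_weight(cpumask_of_node(node));
		max_vecs = 1 + node_cpus;	/* pre_vectors + per-CPU queue vectors */

		/* Returns the number of vectors allocated or a negative errno */
		return pci_alloc_irq_vectors_affinity(pdev, 2, max_vecs,
						      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
						      &affd);
	}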

General solutions are definitely preferred, but not at the price of
breaking existing, completely correct and working setups, which is what
your 2/2 patch certainly does.

Thanks,

        tglx
