linux-kernel - Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline

Message-ID: <2c5fed84-bd3f-4f57-893d-59434e64b9c3@oracle.com>
Date: Wed, 22 May 2024 14:44:36 -0700
From: Dongli Zhang <dongli.zhang@...cle.com>
To: Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org
Cc: mingo@...hat.com, dave.hansen@...ux.intel.com, hpa@...or.com,
        joe.jin@...cle.com, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux.dev, Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline



On 5/21/24 5:00 AM, Thomas Gleixner wrote:
> On Wed, May 15 2024 at 12:51, Dongli Zhang wrote:
>> On 5/13/24 3:46 PM, Thomas Gleixner wrote:
>>> So yes, moving the invocation of irq_force_complete_move() before the
>>> irq_needs_fixup() call makes sense, but it wants this to actually work
>>> correctly:
>>> @@ -1097,10 +1098,11 @@ void irq_force_complete_move(struct irq_
>>>  		goto unlock;
>>>  
>>>  	/*
>>> -	 * If prev_vector is empty, no action required.
>>> +	 * If prev_vector is empty or the descriptor was previously
>>> +	 * not on the outgoing CPU no action required.
>>>  	 */
>>>  	vector = apicd->prev_vector;
>>> -	if (!vector)
>>> +	if (!vector || apicd->prev_cpu != smp_processor_id())
>>>  		goto unlock;
>>>  
>>
>> The above may not work. migrate_one_irq() relies on irq_force_complete_move() to
>> always reclaim the apicd->prev_vector. Otherwise, the call of
>> irq_do_set_affinity() later may return -EBUSY.
> 
> You're right. But that still can be handled in irq_force_complete_move()
> with a single unconditional invocation in migrate_one_irq():
> 
> 	cpu = smp_processor_id();
> 	if (!vector || (apicd->cur_cpu != cpu && apicd->prev_cpu != cpu))
> 		goto unlock;

The current target CPU is tracked in apicd->cpu, not apicd->cur_cpu :)

Thank you very much for the suggestion!

> 
> because there are only two cases when a cleanup is required:
> 
>    1) The outgoing CPU is the current target
> 
>    2) The outgoing CPU was the previous target
> 
> No?

I agree with this statement.
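
To make the two cases concrete, here is a minimal user-space sketch of the
check with the field name corrected to apicd->cpu. The struct and the function
name needs_forced_cleanup() are hypothetical stand-ins for illustration only;
this is not the kernel code, and it models only the early-exit condition, not
the cleanup itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct apic_chip_data. */
struct apic_chip_data_model {
	unsigned int cpu;         /* current target CPU */
	unsigned int prev_cpu;    /* previous target CPU */
	unsigned int vector;      /* current vector */
	unsigned int prev_vector; /* 0 means no move is pending cleanup */
};

/*
 * Returns true when the outgoing CPU is involved in a pending move,
 * i.e. when the check under discussion would NOT "goto unlock".
 */
static bool needs_forced_cleanup(const struct apic_chip_data_model *apicd,
				 unsigned int outgoing_cpu)
{
	assert(apicd != NULL);

	/* No previous vector: nothing to reclaim. */
	if (!apicd->prev_vector)
		return false;

	/* Outgoing CPU is neither the current nor the previous target. */
	if (apicd->cpu != outgoing_cpu && apicd->prev_cpu != outgoing_cpu)
		return false;

	/* Case 1 (current target) or case 2 (previous target). */
	return true;
}
```

With cpu = 1 and prev_cpu = 2, the sketch reports a cleanup for outgoing CPUs
1 and 2 only, and for no CPU once prev_vector is 0.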

My only concern is that while this check uses "apicd->cpu", irq_needs_fixup()
takes a different approach: it uses d->common->effective_affinity or
d->common->affinity to decide whether to go ahead and migrate the interrupt.
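
For comparison, a rough user-space model of the mask-based decision described
above. CPU masks are reduced to plain 64-bit bitmasks, and the struct and
function names are hypothetical. The fallback from an empty effective mask to
the general affinity mask is my assumption about how the two masks relate; any
sanity checks the real irq_needs_fixup() performs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the masks kept in d->common. */
struct irq_common_model {
	uint64_t affinity;           /* requested affinity, one bit per CPU */
	uint64_t effective_affinity; /* effective affinity, may be empty */
};

/*
 * Returns true when the outgoing CPU is set in the mask that is consulted:
 * the effective mask if it is non-empty, otherwise the general one.
 */
static bool irq_needs_fixup_model(const struct irq_common_model *common,
				  unsigned int outgoing_cpu)
{
	uint64_t mask;

	assert(common != NULL);
	assert(outgoing_cpu < 64);

	mask = common->effective_affinity;
	if (mask == 0)
		mask = common->affinity; /* assumed fallback */

	return (mask >> outgoing_cpu) & 1;
}
```

The point of the comparison is that this decision depends on the masks, while
the check in irq_force_complete_move() depends on apicd->cpu and
apicd->prev_cpu, so the two could disagree if the masks and the apic data ever
went out of sync.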

I have spent some time reading the 2017 discussion linked below. As far as I
understand, CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK always depends on CONFIG_SMP,
so we should not encounter the issue on x86.

https://lore.kernel.org/all/alpine.DEB.2.20.1710042208400.2406@nanos/T/#u

I have tested the new patch for a while and have not encountered any issues.

Therefore, I will send v2.

Thank you very much for all the suggestions!

Dongli Zhang