Message-ID: <2c5fed84-bd3f-4f57-893d-59434e64b9c3@oracle.com>
Date: Wed, 22 May 2024 14:44:36 -0700
From: Dongli Zhang <dongli.zhang@...cle.com>
To: Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org
Cc: mingo@...hat.com, dave.hansen@...ux.intel.com, hpa@...or.com,
joe.jin@...cle.com, linux-kernel@...r.kernel.org,
virtualization@...ts.linux.dev, Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline
On 5/21/24 5:00 AM, Thomas Gleixner wrote:
> On Wed, May 15 2024 at 12:51, Dongli Zhang wrote:
>> On 5/13/24 3:46 PM, Thomas Gleixner wrote:
>>> So yes, moving the invocation of irq_force_complete_move() before the
>>> irq_needs_fixup() call makes sense, but it wants this to actually work
>>> correctly:
>>> @@ -1097,10 +1098,11 @@ void irq_force_complete_move(struct irq_
>>> goto unlock;
>>>
>>> /*
>>> - * If prev_vector is empty, no action required.
>>> + * If prev_vector is empty or the descriptor was previously
>>> + * not on the outgoing CPU no action required.
>>> */
>>> vector = apicd->prev_vector;
>>> - if (!vector)
>>> + if (!vector || apicd->prev_cpu != smp_processor_id())
>>> goto unlock;
>>>
>>
>> The above may not work. migrate_one_irq() relies on irq_force_complete_move() to
>> always reclaim the apicd->prev_vector. Otherwise, the call of
>> irq_do_set_affinity() later may return -EBUSY.
>
> You're right. But that still can be handled in irq_force_complete_move()
> with a single unconditional invocation in migrate_one_irq():
>
> cpu = smp_processor_id();
> if (!vector || (apicd->cur_cpu != cpu && apicd->prev_cpu != cpu))
> goto unlock;
The current target is apicd->cpu, not apicd->cur_cpu :)
Thank you very much for the suggestion!
>
> because there are only two cases when a cleanup is required:
>
> 1) The outgoing CPU is the current target
>
> 2) The outgoing CPU was the previous target
>
> No?
I agree with this statement.
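To make sure we mean the same condition, here is a minimal user-space sketch of the check with apicd->cpu in place of cur_cpu. The struct and helper names are hypothetical stand-ins for illustration, not the kernel's definitions, and only the fields discussed in this thread are modeled.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's per-interrupt APIC data.
 * Only the fields discussed in this thread are modeled.
 */
struct apic_chip_data_sketch {
	unsigned int cpu;		/* current target CPU */
	unsigned int prev_cpu;		/* previous target CPU */
	unsigned int prev_vector;	/* 0 means no move is pending */
};

/*
 * Returns true when a forced cleanup is needed on the outgoing CPU:
 * a previous vector is still pending and the outgoing CPU is either
 * the current target or the previous target.
 */
static bool needs_forced_cleanup(const struct apic_chip_data_sketch *apicd,
				 unsigned int outgoing_cpu)
{
	if (!apicd->prev_vector)
		return false;

	return apicd->cpu == outgoing_cpu || apicd->prev_cpu == outgoing_cpu;
}
```

In the real function the outgoing CPU would come from smp_processor_id(), and a false result corresponds to the "goto unlock" path.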
My only concern is that, while we use "apicd->cpu", irq_needs_fixup() takes a
different approach: it uses d->common->effective_affinity or d->common->affinity
to decide whether to go ahead and migrate the interrupt.
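For comparison, the mask-based decision can be sketched roughly as below. This is a simplified model, not the kernel code: the struct and function names are hypothetical, a 64-bit word stands in for a cpumask, and I am assuming the fallback to the general affinity mask is taken when the effective mask is empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU instead of a struct cpumask. */
struct irq_masks_sketch {
	uint64_t effective_affinity;
	uint64_t affinity;
};

/*
 * Returns true when the outgoing CPU (assumed < 64) is set in the mask
 * that is consulted: the effective affinity if it is non-empty,
 * otherwise the general affinity (assumed fallback rule).
 */
static bool needs_fixup_sketch(const struct irq_masks_sketch *m,
			       unsigned int cpu)
{
	uint64_t mask = m->effective_affinity;

	if (mask == 0)
		mask = m->affinity;

	return (mask >> cpu) & 1u;
}
```

The point of the comparison is that this decision looks only at the affinity masks, while the cleanup check above looks at apicd->cpu and apicd->prev_cpu.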
I spent some time reading the 2017 discussion linked below. As far as I
understand, CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK always depends on CONFIG_SMP,
so we should not run into that issue on x86.
https://lore.kernel.org/all/alpine.DEB.2.20.1710042208400.2406@nanos/T/#u
I have tested the new patch for a while and never encountered any issue.
Therefore, I will send v2.
Thank you very much for all the suggestions!
Dongli Zhang