Message-ID: <1651b43b-3230-4404-b65c-50de6d971d74@amd.com>
Date: Tue, 12 Aug 2025 15:36:07 -0400
From: Jason Andryuk <jason.andryuk@....com>
To: Andrew Cooper <andrew.cooper3@...rix.com>, Juergen Gross
<jgross@...e.com>, Stefano Stabellini <sstabellini@...nel.org>, Oleksandr
Tyshchenko <oleksandr_tyshchenko@...m.com>, Chris Wright
<chrisw@...s-sol.org>, Jeremy Fitzhardinge <jeremy@...source.com>,
Christopher Clark <christopher.w.clark@...il.com>, Daniel Smith
<dpsmith@...rtussolutions.com>
CC: <stable@...r.kernel.org>, <xen-devel@...ts.xenproject.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] xen/events: Fix Global and Domain VIRQ tracking
On 2025-08-12 15:10, Andrew Cooper wrote:
> On 12/08/2025 8:00 pm, Jason Andryuk wrote:
>> VIRQs come in 3 flavors: per-vCPU, per-domain, and global. The existing
>> tracking of VIRQs is handled by per-cpu variables virq_to_irq.
>>
>> The issue is that bind_virq_to_irq() sets the per_cpu virq_to_irq at
>> registration time - typically CPU 0. Later, the interrupt can migrate,
>> and info->cpu is updated. When calling unbind_from_irq(), the per-cpu
>> virq_to_irq is cleared for a different cpu. If bind_virq_to_irq() is
>> called again with CPU 0, the stale irq is returned.
>>
>> Change the virq_to_irq tracking to use CPU 0 for per-domain and global
>> VIRQs. As there can be at most one of each, there is no need for
>> per-vcpu tracking. Also, per-domain and global VIRQs need to be
>> registered on CPU 0 and can later move, so this matches the expectation.
>>
>> Fixes: e46cdb66c8fc ("xen: event channels")
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Jason Andryuk <jason.andryuk@....com>
>> ---
>> Fixes is the introduction of the virq_to_irq per-cpu array.
>>
>> This was found with the out-of-tree argo driver during suspend/resume.
>> On suspend, the per-domain VIRQ_ARGO is unbound. On resume, the driver
>> attempts to bind VIRQ_ARGO. The stale irq is returned, but the
>> WARN_ON(info == NULL || info->type != IRQT_VIRQ) in bind_virq_to_irq()
>> triggers for NULL info. The bind fails and execution continues with the
>> driver trying to clean up by unbinding. This eventually faults over the
>> NULL info.
>
> I don't think the Fixes: tag is entirely appropriate.
>
> per-domain VIRQs were created (unexpectedly) by the merge of ARGO into
> Xen. It was during some unrelated cleanup that this was noticed and
> bugfixed into working. i.e. the ARGO VIRQ is the singular weird one here.
>
> In Xen we did accept that per-domain VIRQs now exist; they had for
> several releases before we realised.
AFAICT, global VIRQs have the same issue - I think they just aren't
unbound and rebound. I just happened to trigger this with the
per-domain ARGO VIRQ.
I double checked, and the cpu is updated like so:

set_affinity_irq()
  rebind_irq_to_cpu()
    bind_evtchn_to_cpu()
      cpu_evtchn[chn] = cpu;

unbind_from_irq()
  case IRQT_VIRQ:
    per_cpu(virq_to_irq, cpu_from_evtchn(evtchn))[index_from_irq(irq)] = -1;

cpu_from_evtchn()
  return cpu_evtchn[evtchn];
So global VIRQs were mis-tracked even in the Fixes commit.
Regards,
Jason
>> ---
>> drivers/xen/events/events_base.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
>> index 41309d38f78c..a27e4d7f061e 100644
>> --- a/drivers/xen/events/events_base.c
>> +++ b/drivers/xen/events/events_base.c
>> @@ -159,7 +159,19 @@ static DEFINE_MUTEX(irq_mapping_update_lock);
>>
>> static LIST_HEAD(xen_irq_list_head);
>>
>> -/* IRQ <-> VIRQ mapping. */
>> +static bool is_per_vcpu_virq(int virq) {
>> + switch (virq) {
>> + case VIRQ_TIMER:
>> + case VIRQ_DEBUG:
>> + case VIRQ_XENOPROF:
>> + case VIRQ_XENPMU:
>> + return true;
>> + default:
>> + return false;
>> + }
>> +}
>> +
>> +/* IRQ <-> VIRQ mapping. Global/domain VIRQs are tracked on CPU 0. */
>> static DEFINE_PER_CPU(int [NR_VIRQS], virq_to_irq) = {[0 ... NR_VIRQS-1] = -1};
>>
>> /* IRQ <-> IPI mapping */
>> @@ -974,6 +986,9 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
>>
>> switch (info->type) {
>> case IRQT_VIRQ:
>> + if (!is_per_vcpu_virq(virq_from_irq(info)))
>> + cpu = 0;
>> +
>> per_cpu(virq_to_irq, cpu)[virq_from_irq(info)] = -1;
>> break;
>> case IRQT_IPI:
>