[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <7AECA9A6-1F7D-4F82-A4C1-83616D4D6566@oracle.com>
Date: Wed, 31 May 2023 18:52:39 +0000
From: Chuck Lever III <chuck.lever@...cle.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: Eli Cohen <elic@...dia.com>, Leon Romanovsky <leon@...nel.org>,
Saeed
Mahameed <saeedm@...dia.com>,
linux-rdma <linux-rdma@...r.kernel.org>,
"open
list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>,
Peter Zijlstra
<peterz@...radead.org>
Subject: Re: system hang on start-up (mlx5?)
> On May 31, 2023, at 1:11 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Wed, May 31 2023 at 15:06, Chuck Lever III wrote:
>>> On May 31, 2023, at 10:43 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>>>
>>> mlx5_irq_alloc(af_desc)
>>> pci_msix_alloc_irq_at(af_desc)
>>> msi_domain_alloc_irq_at(af_desc)
>>> __msi_domain_alloc_irqs(af_desc)
>>> 1) msidesc->affinity = kmemdup(af_desc);
>>> __irq_domain_alloc_irqs()
>>> __irq_domain_alloc_irqs(aff=msidesc->affinity)
>>> irq_domain_alloc_irqs_locked(aff)
>>> irq_domain_alloc_irqs_locked(aff)
>>> irq_domain_alloc_descs(aff)
>>> alloc_desc(mask=&aff->mask)
>>> desc_smp_init(mask)
>>> 2) cpumask_copy(desc->irq_common_data.affinity, mask);
>>> irq_domain_alloc_irqs_hierarchy()
>>> msi_domain_alloc()
>>> intel_irq_remapping_alloc()
>>> x86_vector_alloc_irqs()
>>
>> It is x86_vector_alloc_irqs() where the struct irq_data is
>> fabricated that ends up in irq_matrix_reserve_managed().
>
> Kinda fabricated :)
>
> irqd = irq_domain_get_irq_data(domain, virq + i);
>
> Thats finding the irqdata which is associated to the vector domain. That
> has been allocated earlier. The affinity mask is retrieved via:
>
> const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd);
>
> which does:
>
> return irqd->common->affinity;
>
> irqd->common points to desc->irq_common_data. The affinity there was
> copied in #2 above.
>
>>> This also ends up in the wrong place. That mlx code does:
>>>
>>> af_desc.is_managed = false;
>>>
>>> but the allocation ends up allocating a managed vector.
>>
>> That line was changed in 6.4-rc4 to address another bug,
>> and it avoids the crash by not calling into the misbehaving
>> code. It doesn't address the mlx5_core initialization issue
>> though, because as I said before, execution continues and
>> crashes in a similar scenario later on.
>
> Ok.
>
>> On my system, I've reverted that fix:
>>
>> - af_desc.is_managed = false;
>> + af_desc.is_managed = 1;
>>
>> so that we can continue debugging this crash.
>
> Ah.
>
>>> Can you please instrument this along the call chain so we can see where
>>> or at least when this gets corrupted? Please print the relevant pointer
>>> addresses too so we can see whether that's consistent or not.
>>
>> I will continue working on this today.
>>> But that's just the symptom, not the root cause. That code is perfectly
>>> fine when all callers use the proper cpumask functions.
>>
>> Agreed: we're crashing here because of the extra bits
>> in the affinity mask, but those bits should not be set
>> in the first place.
>
> Correct.
>
>> I wasn't sure if for_each_cpu() was supposed to iterate
>> into non-present CPUs -- and I guess the answer
>> is yes, it will iterate the full length of the mask.
>> The caller is responsible for ensuring the mask is valid.
>
> Yes, that's the assumption of this constant optimization for the small
> number of CPUs case. All other cases use nr_cpu_ids as limit and won't
> go into non-possible CPUs. I didn't spot it yesterday night either.
This addresses the problem for me with both is_managed = 1
and is_managed = false:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index db5687d9fec9..bcf5df316c8f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -570,11 +570,11 @@ int mlx5_irqs_request_vectors(struct mlx5_core_dev *dev, u16 *cpus, int nirqs,
af_desc.is_managed = false;
for (i = 0; i < nirqs; i++) {
+ cpumask_clear(&af_desc.mask);
cpumask_set_cpu(cpus[i], &af_desc.mask);
irq = mlx5_irq_request(dev, i + 1, &af_desc, rmap);
if (IS_ERR(irq))
break;
- cpumask_clear(&af_desc.mask);
irqs[i] = irq;
}
If you agree this looks reasonable, I can package it with a
proper patch description and send it to Eli and Saeed.
--
Chuck Lever
Powered by blists - more mailing lists