lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ttvsftoc.ffs@tglx>
Date: Wed, 31 May 2023 16:43:15 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Chuck Lever III <chuck.lever@...cle.com>
Cc: Eli Cohen <elic@...dia.com>, Leon Romanovsky <leon@...nel.org>, Saeed
 Mahameed <saeedm@...dia.com>, linux-rdma <linux-rdma@...r.kernel.org>,
 "open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>, Peter Zijlstra
 <peterz@...radead.org>
Subject: Re: system hang on start-up (mlx5?)

On Tue, May 30 2023 at 21:48, Chuck Lever III wrote:
>> On May 30, 2023, at 3:46 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> Can you please add after the cpumask_copy() in that mlx5 code:
>> 
>>    pr_info("ONLINEBITS: %016lx\n", cpu_online_mask->bits[0]);
>>    pr_info("MASKBITS:   %016lx\n", af_desc.mask.bits[0]);
>
> Both are 0000 0000 0000 0fff, as expected on a system
> where 12 CPUs are present.

So the non-initialized mask on stack has the online bits correctly
copied and bits 12-63 are cleared, which is what cpumask_copy()
achieves by copying longs and not bits.

> [   71.273798][ T1185] irq_matrix_reserve_managed: MASKBITS: ffffb1a74686bcd8

How can that end up with a completely different content here?

As I said before that's clearly a direct map address.

So the call chain is:

mlx5_irq_alloc(af_desc)
  pci_msix_alloc_irq_at(af_desc)
    msi_domain_alloc_irq_at(af_desc)
      __msi_domain_alloc_irqs(af_desc)
1)      msidesc->affinity = kmemdup(af_desc);
        __irq_domain_alloc_irqs()
          __irq_domain_alloc_irqs(aff=msidesc->affinity)
            irq_domain_alloc_irqs_locked(aff)
              irq_domain_alloc_irqs_locked(aff)
                irq_domain_alloc_descs(aff)
                  alloc_desc(mask=&aff->mask)
                    desc_smp_init(mask)
2)                    cpumask_copy(desc->irq_common_data.affinity, mask);
                irq_domain_alloc_irqs_hierarchy()
                  msi_domain_alloc()
                    intel_irq_remapping_alloc()
                      x86_vector_alloc_irqs()
                        reserve_managed_vector()
                          mask = desc->irq_common_data.affinity;
                          irq_matrix_reserve_managed(mask)

So af_desc is kmemdup'ed at #1 and then the result is copied in #2.

Anything else just hands pointers around. So where gets either af_desc
or msidesc->affinity or desc->irq_common_data.affinity overwritten? Or
one of the pointers mangled. I doubt that it's the latter as this is 99%
generic code which would end up in random fails all over the place.

This also ends up in the wrong place. That mlx code does:

   af_desc.is_managed = false;

but the allocation ends up allocating a managed vector.

This screams memory corruption ....

Can you please instrument this along the call chain so we can see where
or at least when this gets corrupted? Please print the relevant pointer
addresses too so we can see whether that's consistent or not.

> The lowest 16 bits of MASKBITS are bcd8, or in binary:
>
> ... 1011 1100 1101 1000
>
> Starting from the low-order side: bits 3, 4, 6, 7, 10, 11, and
> 12, matching the CPU IDs from the loop. At bit 12, we fault,
> since there is no CPU 12 on the system.

Thats due to a dubious optimization from Linus:

#if NR_CPUS <= BITS_PER_LONG
  #define small_cpumask_bits ((unsigned int)NR_CPUS)
  #define large_cpumask_bits ((unsigned int)NR_CPUS)
#elif NR_CPUS <= 4*BITS_PER_LONG
  #define small_cpumask_bits nr_cpu_ids

small_cpumask_bits is not nr_cpu_ids(12), it's NR_CPUS(32) which is why
the loop does not terminate. Bah!

But that's just the symptom, not the root cause. That code is perfectly
fine when all callers use the proper cpumask functions.

Thanks,

        tglx




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ