lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20170824074233.GB20225@lamia.panaceas.james.local>
Date:   Thu, 24 Aug 2017 08:42:33 +0100
From:   James <lkml@...ingley.org>
To:     linux-kernel@...r.kernel.org
Subject: regmap irqs cause oops when hotpluging a cpu


I've been running linux on Intel cherry-trail devices and discovered that I
couldn't get into S4. Debugging I found that the system would hang if I
attempted to hotplug a CPU (as is done in the final UP to SMP transition in
S4).

The simplest repro was:

echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online

looking closer it was exploding in 

static void __setup_vector_irq(int cpu)
{
         struct apic_chip_data *data;
         struct irq_desc *desc;
         int irq, vector;

         /* Mark the inuse vectors */
         for_each_irq_desc(irq, desc) {
                 struct irq_data *idata = irq_desc_get_irq_data(desc);

                 data = apic_chip_data(idata);
>>>>             if (!data || !cpumask_test_cpu(cpu, data->domain))
                         continue; 

because data->domain wasn't a valid pointer. (setup_vector_irq is called to
setup the vectors of the LAPIC on the new cpu.)

digging, it appears that  the (l)apic code assumes that all interrupt domain
hierarchies end up at a lapic (since that's the only way an interrupt gets
delivered on x86)

to that end it does, for every interrupt on the platform:

static struct apic_chip_data *apic_chip_data(struct irq_data *irq_data)
{
        if (!irq_data)
                return NULL;

        while (irq_data->parent_data)
                irq_data = irq_data->parent_data;

        return irq_data->chip_data;
}

Irq_data->chip_data is private void pointer for the use of the irq chip, but
this code doesn't bother checking that irq_data->chip is &lapic_controller
(it's own chip) because it must be true by plumbing [it assumes walking the
parents from any interrupt must end up at a lapic].

parent_data is meant to get set from domain->parent when the interrupt is
allocated in irq_domain_alloc_irq_data.

but for lots of the interrupts on cherry trail either domain->parent is unset
or the allocation of the irq doesn't follow this path. As a result the code
above explodes. (For example there are lots of regmap irq interrupts, and
regmap_add_irq_chip calls irq_domain_add_linear rather than
irq_domain_add_hierarchy which doesn't set parent)

I'm not sure what the right way(tm) to fix this is - as the place in irqdomain
where the interrupt is mapped likely doesn't know anything about the parent
irq. I expect all calls to add_linear need replacing, and the logic in
irqdomain.c needs to write ->parent_data. 

[For a quick hack, perhaps the winning plan is to make setup_vector_irq just
put in a generic cpu bitmap for anything that isn't a lapic interrupt.] 

James.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ