lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150908104048.GB12118@lahna.fi.intel.com>
Date:	Tue, 8 Sep 2015 13:40:48 +0300
From:	Mika Westerberg <mika.westerberg@...ux.intel.com>
To:	Jiang Liu <jiang.liu@...ux.intel.com>,
	Thomas Gleixner <tglx@...utronix.de>
Cc:	jarkko.nikula@...ux.intel.com, linux-kernel@...r.kernel.org
Subject: Question related to patch: x86/irq: Remove
 x86_io_apic_ops.set_affinity and related interfaces

Hi Jiang and Thomas,

With recent kernels I'm getting crash when a GPIO interrupt triggers. For
unknown reasons I'm not able to capture the crash on the serial console so I
did following change to irq_move_masked_irq():

	assert_raw_spin_locked(&desc->lock);

to

	WARN_ON(!raw_spin_is_locked(&desc->lock));

The backtrace looks like:

[    6.733640] WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq+0xb8/0xc0() 
[    6.768765] Modules linked in: x86_pkg_temp_thermal i2c_hid(+)
[    6.775425] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 4.2.0+ #124
[    6.782639]  ffffffff81747e59 ffff8801c3c03e18 ffffffff815d6165 0000000000000000
[    6.791110]  ffff8801c3c03e50 ffffffff81050c9d ffff8801bf00a600 ffff8801beb594c0
[    6.799540]  ffff8801beb594c0 0000000000000002 0000000042000000 ffff8801c3c03e60
[    6.807989] Call Trace:
[    6.810755]  <IRQ>  [<ffffffff815d6165>] dump_stack+0x4e/0x79
[    6.817307]  [<ffffffff81050c9d>] warn_slowpath_common+0x7d/0xb0
[    6.824121]  [<ffffffff81050d85>] warn_slowpath_null+0x15/0x20
[    6.830736]  [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
[    6.837450]  [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
[    6.844079]  [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
[    6.851306]  [<ffffffff81006bfb>] handle_irq+0xab/0x180
[    6.857235]  [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
[    6.864440]  [<ffffffff810063c2>] do_IRQ+0x52/0xe0
[    6.869877]  [<ffffffff815dcfbf>] common_interrupt+0x7f/0x7f
[    6.876292]  <EOI>  [<ffffffff815dbf6d>] ? _raw_write_unlock_irq+0xd/0x30
[    6.884008]  [<ffffffff815dbf99>] _raw_spin_unlock_irq+0x9/0x10
[    6.890721]  [<ffffffff810745bc>] finish_task_switch+0x7c/0x1a0
[    6.897438]  [<ffffffff815d7b9b>] __schedule+0x33b/0xb20
[    6.903461]  [<ffffffff8107cc77>] ? __enqueue_entity+0x67/0x70
[    6.910079]  [<ffffffff815d8408>] schedule+0x38/0x90
[    6.915711]  [<ffffffff815db3a8>] schedule_timeout+0x158/0x250
[    6.922328]  [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
[    6.929534]  [<ffffffff815d933f>] wait_for_completion_killable+0xaf/0x220
[    6.937231]  [<ffffffff81076d80>] ? wake_up_q+0x60/0x60
[    6.943158]  [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
[    6.949970]  [<ffffffff8106d5c2>] kthread_create_on_node+0xd2/0x170
[    6.957079]  [<ffffffff8129ac39>] ? snprintf+0x39/0x40
[    6.962907]  [<ffffffff810665b9>] create_worker+0xb9/0x190
[    6.969130]  [<ffffffff810685d3>] worker_thread+0x303/0x4b0
[    6.975452]  [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
[    6.982262]  [<ffffffff8106d8bf>] kthread+0xcf/0xf0
[    6.987797]  [<ffffffff815dbf99>] ? _raw_spin_unlock_irq+0x9/0x10
[    6.994722]  [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
[    7.001636]  [<ffffffff815dc86f>] ret_from_fork+0x3f/0x70
[    7.007765]  [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190

The driver in question is drivers/pinctrl/intel/pinctrl-intel.c and the
corresponding code which triggers this is below:

static void intel_gpio_irq_handler(unsigned irq, struct irq_desc *desc)
{
        struct gpio_chip *gc = irq_desc_get_handler_data(desc);
        struct intel_pinctrl *pctrl = gpiochip_to_pinctrl(gc);
        struct irq_chip *chip = irq_desc_get_chip(desc);
        int i;

        chained_irq_enter(chip, desc);

        /* Need to check all communities for pending interrupts */
        for (i = 0; i < pctrl->ncommunities; i++)
                intel_gpio_community_irq_handler(gc, &pctrl->communities[i]);

        chained_irq_exit(chip, desc);
}

So once chained_irq_exit() is called, it in turn calls chip->irq_eoi()
which in this case is ioapic_ack_level().

I'm sure such crash did not happen when pinctrl-intel.c was developed so I
started to investigate what might have changed. Note that it may be the
driver does something wrong and the crash is expected. However, many other
drivers do pretty much the same (with the exception that the parent IRQ
chip is not IO-APIC in most cases).

To me assert_raw_spin_locked(&desc->lock) looks valid as the function goes
and modifies desc right after the check. Also in intel_gpio_irq_handler()
desc->lock is not locked (not sure if it needs to be).

After "manual" bisection I found the commit that causes the crash to be
triggered:

	commit aa5cb97f14a2dd5aefabed6538c35ebc087d7c24
	Author: Jiang Liu <jiang.liu@...ux.intel.com>
	Date:   Tue Apr 14 10:29:42 2015 +0800

	    x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
	    
	    Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove
	    it.

It says that there are no users for x86_io_apic_ops.set_affinity but then
it does this:

-               x86_io_apic_ops.set_affinity(idata, mask, false);
+               irq_set_affinity(irq, mask);

The difference is that x86_io_apic_ops.set_affinity() programs affinity
directly to the hardware (if I understand it right) but irq_set_affinity()
calls irqd_set_move_pending() which defers programming the hardware later.

Now when an interrupt triggers we end up calling irq_move_masked_irq() with
unlocked descriptor.

Without the above change we never do that and the crash does not happen.

Since I don't know much about genirq and IO-APIC code in particular, I
would like to get your input on how to solve the problem. Do I need to
change the pinctrl driver somehow to lock the descriptor or maybe the
commit above misses something?

It is really easy to trigger so please let me know if further debugging is
needed.

Thanks in advance.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ