linux-kernel - Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55EED805.7060902@linux.intel.com>
Date:	Tue, 8 Sep 2015 20:43:49 +0800
From:	Jiang Liu <jiang.liu@...ux.intel.com>
To:	Mika Westerberg <mika.westerberg@...ux.intel.com>,
	Thomas Gleixner <tglx@...utronix.de>
Cc:	jarkko.nikula@...ux.intel.com, linux-kernel@...r.kernel.org
Subject: Re: Question related to patch: x86/irq: Remove
 x86_io_apic_ops.set_affinity and related interfaces

On 2015/9/8 18:40, Mika Westerberg wrote:
> Hi Jiang and Thomas,
> 
> With recent kernels I'm getting crash when a GPIO interrupt triggers. For
> unknown reasons I'm not able to capture the crash on the serial console so I
> did following change to irq_move_masked_irq():
> 
> 	assert_raw_spin_locked(&desc->lock);
> 
> to
> 
> 	WARN_ON(!raw_spin_is_locked(&desc->lock));
> 
> The backtrace looks like:
> 
> [    6.733640] WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq+0xb8/0xc0() 
> [    6.768765] Modules linked in: x86_pkg_temp_thermal i2c_hid(+)
> [    6.775425] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 4.2.0+ #124
> [    6.782639]  ffffffff81747e59 ffff8801c3c03e18 ffffffff815d6165 0000000000000000
> [    6.791110]  ffff8801c3c03e50 ffffffff81050c9d ffff8801bf00a600 ffff8801beb594c0
> [    6.799540]  ffff8801beb594c0 0000000000000002 0000000042000000 ffff8801c3c03e60
> [    6.807989] Call Trace:
> [    6.810755]  <IRQ>  [<ffffffff815d6165>] dump_stack+0x4e/0x79
> [    6.817307]  [<ffffffff81050c9d>] warn_slowpath_common+0x7d/0xb0
> [    6.824121]  [<ffffffff81050d85>] warn_slowpath_null+0x15/0x20
> [    6.830736]  [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
> [    6.837450]  [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
> [    6.844079]  [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
> [    6.851306]  [<ffffffff81006bfb>] handle_irq+0xab/0x180
> [    6.857235]  [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
> [    6.864440]  [<ffffffff810063c2>] do_IRQ+0x52/0xe0
> [    6.869877]  [<ffffffff815dcfbf>] common_interrupt+0x7f/0x7f
> [    6.876292]  <EOI>  [<ffffffff815dbf6d>] ? _raw_write_unlock_irq+0xd/0x30
> [    6.884008]  [<ffffffff815dbf99>] _raw_spin_unlock_irq+0x9/0x10
> [    6.890721]  [<ffffffff810745bc>] finish_task_switch+0x7c/0x1a0
> [    6.897438]  [<ffffffff815d7b9b>] __schedule+0x33b/0xb20
> [    6.903461]  [<ffffffff8107cc77>] ? __enqueue_entity+0x67/0x70
> [    6.910079]  [<ffffffff815d8408>] schedule+0x38/0x90
> [    6.915711]  [<ffffffff815db3a8>] schedule_timeout+0x158/0x250
> [    6.922328]  [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
> [    6.929534]  [<ffffffff815d933f>] wait_for_completion_killable+0xaf/0x220
> [    6.937231]  [<ffffffff81076d80>] ? wake_up_q+0x60/0x60
> [    6.943158]  [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
> [    6.949970]  [<ffffffff8106d5c2>] kthread_create_on_node+0xd2/0x170
> [    6.957079]  [<ffffffff8129ac39>] ? snprintf+0x39/0x40
> [    6.962907]  [<ffffffff810665b9>] create_worker+0xb9/0x190
> [    6.969130]  [<ffffffff810685d3>] worker_thread+0x303/0x4b0
> [    6.975452]  [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
> [    6.982262]  [<ffffffff8106d8bf>] kthread+0xcf/0xf0
> [    6.987797]  [<ffffffff815dbf99>] ? _raw_spin_unlock_irq+0x9/0x10
> [    6.994722]  [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
> [    7.001636]  [<ffffffff815dc86f>] ret_from_fork+0x3f/0x70
> [    7.007765]  [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
> 
> The driver in question is drivers/pinctrl/intel/pinctrl-intel.c and the
> corresponding code which triggers this is below:
> 
> static void intel_gpio_irq_handler(unsigned irq, struct irq_desc *desc)
> {
>         struct gpio_chip *gc = irq_desc_get_handler_data(desc);
>         struct intel_pinctrl *pctrl = gpiochip_to_pinctrl(gc);
>         struct irq_chip *chip = irq_desc_get_chip(desc);
>         int i;
> 
>         chained_irq_enter(chip, desc);
> 
>         /* Need to check all communities for pending interrupts */
>         for (i = 0; i < pctrl->ncommunities; i++)
>                 intel_gpio_community_irq_handler(gc, &pctrl->communities[i]);
> 
>         chained_irq_exit(chip, desc);
> }
> 
> So once chained_irq_exit() is called, it in turn calls chip->irq_eoi()
> which in this case is ioapic_ack_level().
> 
> I'm sure such crash did not happen when pinctrl-intel.c was developed so I
> started to investigate what might have changed. Note that it may be the
> driver does something wrong and the crash is expected. However, many other
> drivers do pretty much the same (with the exception that the parent IRQ
> chip is not IO-APIC in most cases).
> 
> To me assert_raw_spin_locked(&desc->lock) looks valid as the function goes
> and modifies desc right after the check. Also in intel_gpio_irq_handler()
> desc->lock is not locked (not sure if it needs to be).
> 
> After "manual" bisection I found the commit that causes the crash to be
> triggered:
> 
> 	commit aa5cb97f14a2dd5aefabed6538c35ebc087d7c24
> 	Author: Jiang Liu <jiang.liu@...ux.intel.com>
> 	Date:   Tue Apr 14 10:29:42 2015 +0800
> 
> 	    x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
> 	    
> 	    Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove
> 	    it.
> 
> It says that there are no users for x86_io_apic_ops.set_affinity but then
> it does this:
> 
> -               x86_io_apic_ops.set_affinity(idata, mask, false);
> +               irq_set_affinity(irq, mask);
> 
> The difference is that x86_io_apic_ops.set_affinity() programs affinity
> directly to the hardware (if I understand it right) but irq_set_affinity()
> calls irqd_set_move_pending() which defers programming the hardware later.

Hi Mika,
	I feel this is the root cause and will investigate it tomorrow.
Thanks!
Gerry

> 
> Now when an interrupt triggers we end up calling irq_move_masked_irq() with
> unlocked descriptor.
> 
> Without the above change we never do that and the crash does not happen.
> 
> Since I don't know much about genirq and IO-APIC code in particular, I
> would like to get your input on how to solve the problem. Do I need to
> change the pinctrl driver somehow to lock the descriptor or maybe the
> commit above misses something?
> 
> It is really easy to trigger so please let me know if further debugging is
> needed.
> 
> Thanks in advance.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/