Message-ID: <1AE640813FDE7649BE1B193DEA596E88026A36FB@SHSMSX101.ccr.corp.intel.com>
Date: Wed, 19 Nov 2014 08:55:25 +0000
From: "Zheng, Lv" <lv.zheng@...el.com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>,
"Kirill A. Shutemov" <kirill@...temov.name>
CC: "Wysocki, Rafael J" <rafael.j.wysocki@...el.com>,
"Brown, Len" <len.brown@...el.com>, Lv Zheng <zetalog@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>
Subject: RE: [PATCH 1/6] ACPI/EC: Introduce STARTED/STOPPED flags to replace
BLOCKED flag.
Hi, Rafael
I think you are already aware of this issue.
[PATCH 1] can trigger this deadlock because it is actually based on another series that fixes a GPE deadlock.
I have fixed that deadlock in acpi_ev_gpe_detect()/acpi_ev_gpe_dispatch().
The problem is that those fixes haven't been merged into upstream ACPICA yet, so I cannot post them here.
I was thinking we could work around this by applying the acpi_os_wait_events_complete() enhancement before applying this series, because the deadlock could only happen during suspend.
But it seems it can also be triggered during boot.
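To make the inversion concrete, here is a minimal user-space toy model of the AB-BA ordering that lockdep reports below (pthread mutexes stand in for the ec->lock and acpi_gbl_gpe_lock spinlocks; this is only an illustration, not the real code paths):

/* Toy model of the lock inversion reported by lockdep below. */
#include <pthread.h>

static pthread_mutex_t ec_lock  = PTHREAD_MUTEX_INITIALIZER;  /* stands in for ec->lock */
static pthread_mutex_t gpe_lock = PTHREAD_MUTEX_INITIALIZER;  /* stands in for acpi_gbl_gpe_lock */

/* Path A: acpi_ec_start() holds ec->lock and then calls
 * acpi_enable_gpe(), which takes acpi_gbl_gpe_lock. */
static void ec_start_path(void)
{
	pthread_mutex_lock(&ec_lock);
	pthread_mutex_lock(&gpe_lock);		/* ec->lock -> gpe_lock */
	pthread_mutex_unlock(&gpe_lock);
	pthread_mutex_unlock(&ec_lock);
}

/* Path B: acpi_ev_gpe_detect()/acpi_ev_gpe_dispatch() hold
 * acpi_gbl_gpe_lock and then call acpi_ec_gpe_handler(), which
 * takes ec->lock. */
static void gpe_irq_path(void)
{
	pthread_mutex_lock(&gpe_lock);
	pthread_mutex_lock(&ec_lock);		/* gpe_lock -> ec->lock */
	pthread_mutex_unlock(&ec_lock);
	pthread_mutex_unlock(&gpe_lock);
}

int main(void)
{
	/* Run sequentially here, which is safe; if the two paths run
	 * concurrently (EC probe on one CPU, SCI interrupt on another),
	 * the opposite acquisition orders can deadlock. */
	ec_start_path();
	gpe_irq_path();
	return 0;
}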
So we have three choices for getting this series merged:
1. Merge the GPE deadlock fix before it is merged into upstream ACPICA.
2. Change [PATCH 1] so that it does not hold the EC lock there (this is racy, but the current code is already racy; a rough sketch follows below).
3. Revert [PATCH 1-4] and wait until the GPE deadlock fix lands in upstream ACPICA.
Which one do you prefer?
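For choice 2, here is a very rough, hypothetical sketch of what I mean (not the actual patch; the flag and field names are only placeholders based on this series): update the EC state under ec->lock, but call acpi_enable_gpe() only after dropping it, so the ec->lock -> acpi_gbl_gpe_lock dependency is never formed:

/* Hypothetical fragment, not the real driver code. */
static void acpi_ec_start_sketch(struct acpi_ec *ec)
{
	unsigned long flags;

	spin_lock_irqsave(&ec->lock, flags);
	set_bit(EC_FLAGS_STARTED, &ec->flags);	/* assumed flag name */
	spin_unlock_irqrestore(&ec->lock, flags);

	/* GPE enabling now happens outside ec->lock (racy, as noted
	 * above, but no worse than the current code). */
	acpi_enable_gpe(NULL, ec->gpe);
}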
IMO, we have several issues whose fixes form a dependency cycle:
1. GPE deadlock: it may depend on DISPATCH_METHOD flushing (we shouldn't bump the enabling status up in acpi_ev_asynch_enable_gpe()).
2. EC transaction flushing: it depends on the GPE deadlock fix.
3. EC event polling: it depends on EC transaction flushing; this is required to support EC event draining, as mentioned in bugzilla 44161.
4. DISPATCH_METHOD flushing: it depends on EC event polling; if we don't move the EC query out of the _Lxx/_Exx work queue, it may block DISPATCH_METHOD flushing.
So it seems we need to determine which one should be merged first.
IMO, the GPE deadlock fix is the most fundamental one.
Thanks and best regards
-Lv
> From: Rafael J. Wysocki [mailto:rjw@...ysocki.net]
> Sent: Wednesday, November 19, 2014 5:20 AM
> To: Kirill A. Shutemov
> Cc: Zheng, Lv; Wysocki, Rafael J; Brown, Len; Lv Zheng; linux-kernel@...r.kernel.org; linux-acpi@...r.kernel.org
> Subject: Re: [PATCH 1/6] ACPI/EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag.
>
> On Tuesday, November 18, 2014 03:23:28 PM Kirill A. Shutemov wrote:
> > On Wed, Nov 05, 2014 at 02:52:36AM +0000, Zheng, Lv wrote:
>
> [cut]
>
> >
> > Here's lockdep warning I see on -next:
>
> Is patch [1/6] sufficient to trigger this or do you need all [1-4/6]?
>
>
> > [ 0.510159] ======================================================
> > [ 0.510171] [ INFO: possible circular locking dependency detected ]
> > [ 0.510185] 3.18.0-rc4-next-20141117-07404-g9dad2ab6df8b #66 Not tainted
> > [ 0.510197] -------------------------------------------------------
> > [ 0.510209] swapper/3/0 is trying to acquire lock:
> > [ 0.510219] (&(&ec->lock)->rlock){-.....}, at: [<ffffffff814d533e>] acpi_ec_gpe_handler+0x21/0xfc
> > [ 0.510254]
> > [ 0.510254] but task is already holding lock:
> > [ 0.510266] (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}, at: [<ffffffff814cd67e>] acpi_os_acquire_lock+0xe/0x10
> > [ 0.510296]
> > [ 0.510296] which lock already depends on the new lock.
> > [ 0.510296]
> > [ 0.510312]
> > [ 0.510312] the existing dependency chain (in reverse order) is:
> > [ 0.510327]
> > [ 0.510327] -> #1 (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}:
> > [ 0.510344] [<ffffffff81158f4f>] lock_acquire+0xdf/0x2d0
> > [ 0.510364] [<ffffffff81b08010>] _raw_spin_lock_irqsave+0x50/0x70
> > [ 0.510381] [<ffffffff814cd67e>] acpi_os_acquire_lock+0xe/0x10
> > [ 0.510398] [<ffffffff814e31e8>] acpi_enable_gpe+0x22/0x68
> > [ 0.510416] [<ffffffff814d5b24>] acpi_ec_start+0x66/0x87
> > [ 0.510432] [<ffffffff81afc771>] ec_install_handlers+0x41/0xa4
> > [ 0.510449] [<ffffffff823e72b9>] acpi_ec_ecdt_probe+0x1a9/0x1ea
> > [ 0.510466] [<ffffffff823e6ae3>] acpi_init+0x8b/0x26e
> > [ 0.510480] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> > [ 0.510496] [<ffffffff8239f1dc>] kernel_init_freeable+0x1f5/0x282
> > [ 0.510513] [<ffffffff81af1a1e>] kernel_init+0xe/0xf0
> > [ 0.510527] [<ffffffff81b08cfc>] ret_from_fork+0x7c/0xb0
> > [ 0.510542]
> > [ 0.510542] -> #0 (&(&ec->lock)->rlock){-.....}:
> > [ 0.510558] [<ffffffff811585ef>] __lock_acquire+0x210f/0x2220
> > [ 0.510574] [<ffffffff81158f4f>] lock_acquire+0xdf/0x2d0
> > [ 0.510589] [<ffffffff81b08010>] _raw_spin_lock_irqsave+0x50/0x70
> > [ 0.510604] [<ffffffff814d533e>] acpi_ec_gpe_handler+0x21/0xfc
> > [ 0.510620] [<ffffffff814e02c2>] acpi_ev_gpe_dispatch+0xd2/0x143
> > [ 0.510636] [<ffffffff814e03fb>] acpi_ev_gpe_detect+0xc8/0x10f
> > [ 0.510652] [<ffffffff814e23b6>] acpi_ev_sci_xrupt_handler+0x22/0x38
> > [ 0.510669] [<ffffffff814cc8ee>] acpi_irq+0x16/0x31
> > [ 0.510684] [<ffffffff8116eccf>] handle_irq_event_percpu+0x6f/0x540
> > [ 0.510702] [<ffffffff8116f1e1>] handle_irq_event+0x41/0x70
> > [ 0.510718] [<ffffffff81171ef6>] handle_fasteoi_irq+0x86/0x140
> > [ 0.510733] [<ffffffff81075a22>] handle_irq+0x22/0x40
> > [ 0.510748] [<ffffffff81b0beaf>] do_IRQ+0x4f/0xf0
> > [ 0.510762] [<ffffffff81b09bb2>] ret_from_intr+0x0/0x1a
> > [ 0.510777] [<ffffffff8107e783>] default_idle+0x23/0x260
> > [ 0.510792] [<ffffffff8107f35f>] arch_cpu_idle+0xf/0x20
> > [ 0.510806] [<ffffffff8114a99b>] cpu_startup_entry+0x36b/0x5b0
> > [ 0.510821] [<ffffffff810a8d04>] start_secondary+0x1a4/0x1d0
> > [ 0.510840]
> > [ 0.510840] other info that might help us debug this:
> > [ 0.510840]
> > [ 0.510856] Possible unsafe locking scenario:
> > [ 0.510856]
> > [ 0.510868] CPU0 CPU1
> > [ 0.510877] ---- ----
> > [ 0.510886] lock(&(*(&acpi_gbl_gpe_lock))->rlock);
> > [ 0.510898] lock(&(&ec->lock)->rlock);
> > [ 0.510912] lock(&(*(&acpi_gbl_gpe_lock))->rlock);
> > [ 0.510927] lock(&(&ec->lock)->rlock);
> > [ 0.510938]
> > [ 0.510938] *** DEADLOCK ***
> > [ 0.510938]
> > [ 0.510953] 1 lock held by swapper/3/0:
> > [ 0.510961] #0: (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}, at: [<ffffffff814cd67e>] acpi_os_acquire_lock+0xe/0x10
> > [ 0.510990]
> > [ 0.510990] stack backtrace:
> > [ 0.511004] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc4-next-20141117-07404-g9dad2ab6df8b #66
> > [ 0.511021] Hardware name: LENOVO 3460CC6/3460CC6, BIOS G6ET93WW (2.53 ) 02/04/2013
> > [ 0.511035] ffffffff82cb2f70 ffff88011e2c3bb8 ffffffff81afc316 0000000000000011
> > [ 0.511055] ffffffff82cb2f70 ffff88011e2c3c08 ffffffff81afae11 0000000000000001
> > [ 0.511074] ffff88011e2c3c68 ffff88011e2c3c08 ffff8801193f92d0 ffff8801193f9b20
> > [ 0.511094] Call Trace:
> > [ 0.511101] <IRQ> [<ffffffff81afc316>] dump_stack+0x4c/0x6e
> > [ 0.511125] [<ffffffff81afae11>] print_circular_bug+0x2b2/0x2c3
> > [ 0.511142] [<ffffffff811585ef>] __lock_acquire+0x210f/0x2220
> > [ 0.511159] [<ffffffff81158f4f>] lock_acquire+0xdf/0x2d0
> > [ 0.511176] [<ffffffff814d533e>] ? acpi_ec_gpe_handler+0x21/0xfc
> > [ 0.511192] [<ffffffff81b08010>] _raw_spin_lock_irqsave+0x50/0x70
> > [ 0.511209] [<ffffffff814d533e>] ? acpi_ec_gpe_handler+0x21/0xfc
> > [ 0.511225] [<ffffffff814ea192>] ? acpi_hw_write+0x4b/0x52
> > [ 0.511241] [<ffffffff814d533e>] acpi_ec_gpe_handler+0x21/0xfc
> > [ 0.511258] [<ffffffff814e02c2>] acpi_ev_gpe_dispatch+0xd2/0x143
> > [ 0.511274] [<ffffffff814e03fb>] acpi_ev_gpe_detect+0xc8/0x10f
> > [ 0.511292] [<ffffffff814e23b6>] acpi_ev_sci_xrupt_handler+0x22/0x38
> > [ 0.511309] [<ffffffff814cc8ee>] acpi_irq+0x16/0x31
> > [ 0.511325] [<ffffffff8116eccf>] handle_irq_event_percpu+0x6f/0x540
> > [ 0.511342] [<ffffffff8116f1e1>] handle_irq_event+0x41/0x70
> > [ 0.511357] [<ffffffff81171e98>] ? handle_fasteoi_irq+0x28/0x140
> > [ 0.511372] [<ffffffff81171ef6>] handle_fasteoi_irq+0x86/0x140
> > [ 0.511388] [<ffffffff81075a22>] handle_irq+0x22/0x40
> > [ 0.511402] [<ffffffff81b0beaf>] do_IRQ+0x4f/0xf0
> > [ 0.511417] [<ffffffff81b09bb2>] common_interrupt+0x72/0x72
> > [ 0.511428] <EOI> [<ffffffff810b8986>] ? native_safe_halt+0x6/0x10
> > [ 0.511454] [<ffffffff81154f3d>] ? trace_hardirqs_on+0xd/0x10
> > [ 0.511468] [<ffffffff8107e783>] default_idle+0x23/0x260
> > [ 0.511482] [<ffffffff8107f35f>] arch_cpu_idle+0xf/0x20
> > [ 0.511496] [<ffffffff8114a99b>] cpu_startup_entry+0x36b/0x5b0
> > [ 0.511512] [<ffffffff810a8d04>] start_secondary+0x1a4/0x1d0
> >
> >
> >
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.