[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87tuzdrgm5.fsf@gmx.net>
Date: Sun, 14 Jun 2020 14:12:18 +0200
From: Stephen Berman <stephen.berman@....net>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)
On Fri, 12 Jun 2020 13:01:22 +0200 Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> + ACPI in case the ACPI folks see something obvious.
[...]
>> The "acpi_os_execute_deferred" messages were repeated many times in the
>> above line, then every 20-30 seconds again for several minutes. Then
>> suddenly a call trace appeared which was similar but not identical to
>> the one I posted upthread, and each line of the trace was followed by
>> the line ", acpi_os_execute_deferred". This went by quite quickly even
>> with the printk_delay I added, and I was unable to photograph the start
>> of it and couldn't get all of the subsequent output, but the screenshots
>> show some of it. After several minutes of this output I again did a
>> hard reboot.
>
> All good. I though that you will have one worker that is blocking but
> you have a lot of them. It appears that one is active and stuck and more
> are waiting.
>
> Could you look at acpi in /proc/interrupts 10 secs apart to see if it
> increments?
>
> grep -E 'acpi|smbus' /proc/interrupts
I tried this several times, 10 seconds apart and longer, but saw no
change, it was always this:
steve [ ~ ]$ grep -E 'acpi|smbus' /proc/interrupts
9: 0 5 0 0 0 0 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi
16: 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC 16-fasteoi i801_smbus
> You could also do "echo t > /proc/sysrq-trigger" which gives you a lot
> of task state information, but at the end you will also see "howing busy
> workqueues and worker pools:" regarding the workqueue state. I am
> curious to see if you already have worker stuck in kacpid_notify with
> acpi_os_execute_deferred.
What am I supposed to do after "echo t > /proc/sysrq-trigger"? Both
before and after doing that I get an error trying to open the file:
root [ ~ ]# cat /proc/sysrq-trigger
cat: /proc/sysrq-trigger: Input/output error
> Now that we know that know that acpi_os_execute_deferred() is stuck,
> lets shed some light in what it is trying to do. The patch at the end
> will dump this information into the console buffer (The `dmesg' command
> will print the whole kernel buffer). I have no idea if this starts
> printing while the system is running or during shutdown. I would expect
> to see the Start line in acpi_os_execute_deferred() but not the End one.
>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 41168c027a5a4..0e983c558bcb5 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -840,7 +840,9 @@ static void acpi_os_execute_deferred(struct work_struct *work)
> {
> struct acpi_os_dpc *dpc = container_of(work, struct acpi_os_dpc, work);
>
> + pr_err("%s(%d) Start %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context);
> dpc->function(dpc->context);
> + pr_err("%s(%d) End %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context);
> kfree(dpc);
> }
>
> @@ -1096,6 +1098,8 @@ acpi_status acpi_os_execute(acpi_execute_type type,
> */
> if (type == OSL_NOTIFY_HANDLER) {
> queue = kacpi_notify_wq;
> + pr_err("%s(%d) Adding %pS %px\n", __func__, __LINE__, function,
> + context);
> INIT_WORK(&dpc->work, acpi_os_execute_deferred);
> } else if (type == OSL_GPE_HANDLER) {
> queue = kacpid_wq;
I applied this patch to 5.6.4 and recompiled, and on the next boot with
that kernel, the kernel buffer (and kernel and system logs) began to get
flooded with these messages:
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.943987] acpi_os_execute_deferred(843) Start ffff8fb82c7b6500 000000003edf1e05(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944102] acpi_os_execute(1101) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944104] acpi_os_execute_deferred(845) End ffff8fb82c7b6500 000000003edf1e05(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944105] acpi_os_execute_deferred(843) Start ffff8fb82b844800 000000002ba560ea(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944124] acpi_os_execute_deferred(845) End ffff8fb82b844800 000000002ba560ea(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944288] acpi_os_execute_deferred(843) Start ffff8fb82c7b6540 000000003edf1e05(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944387] acpi_os_execute(1101) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
and so on without stopping. I could start X and avoid seeing the
messages, but was afraid the logs would fill up the root partition if I
let it keep going, so I rebooted with another kernel.
Was this message flood because I booted with "ignore_loglevel
initcall_debug"? In the logs there are also lots of messages like this:
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237628] initcall i915_init+0x0/0x66 returned 0 after 288657 usecs
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237644] calling cn_proc_init+0x0/0x35 @ 1
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237646] initcall cn_proc_init+0x0/0x35 returned 0 after 0 usecs
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237648] calling _nvm_misc_init+0x0/0xc @ 1
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237687] initcall _nvm_misc_init+0x0/0xc returned 0 after 36 usecs
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237690] calling topology_sysfs_init+0x0/0x30 @ 1
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237720] initcall topology_sysfs_init+0x0/0x30 returned 0 after 28 usecs
and so on.
Steve Berman
Powered by blists - more mailing lists