Date:	Sun, 21 Dec 2008 09:29:47 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Frans Pop <elendil@...net.nl>, Len Brown <lenb@...nel.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Yinghai Lu <yinghai@...nel.org>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	"Maciej W. Rozycki" <macro@...ux-mips.org>,
	"Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>,
	lenb@...nel.org, "Rafael J. Wysocki" <rjw@...k.pl>,
	Greg KH <greg@...ah.com>, jbarnes@...tuousgeek.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	tiwai@...e.de, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: "APIC error on CPU1: 00(40)" during resume


* Frans Pop <elendil@...net.nl> wrote:

> On Wednesday 10 December 2008, Ingo Molnar wrote:
> > regarding those APIC error messages:
> > >        ACPI: Waking up from system sleep state S3
> > >        APIC error on CPU1: 00(40)
> > >        ACPI: EC: non-query interrupt received, switching to interrupt
> >
> > that does suggest that the APIC was re-enabled (we don't get any APIC
> > error exceptions otherwise!), and its LVT was programmed as well, but
> > somehow we got an erroneous APIC message from an illegal vector.
> 
> I wonder if this may help tracing the cause. Today I got a KERN_ERR in the 
> middle of those messages:
> 
> ACPI: Waking up from system sleep state S3
> BUG: sleeping function called from invalid context at kernel/sched.c:5571
> in_atomic(): 0, irqs_disabled(): 1, pid: 70, name: kacpid
> Pid: 70, comm: kacpid Not tainted 2.6.28-rc7-rjw #77
> Call Trace:
>  [<ffffffff80345013>] ? acpi_os_release_object+0x9/0xd
>  [<ffffffff8022d588>] __might_sleep+0xcf/0xd1
>  [<ffffffff80234e32>] __cond_resched+0x15/0x4b
>  [<ffffffff8043b1d7>] _cond_resched+0x2d/0x38
>  [<ffffffff80357bc9>] acpi_ps_complete_op+0x235/0x24b
>  [<ffffffff803582de>] acpi_ps_parse_loop+0x6ff/0x859
>  [<ffffffff803574db>] acpi_ps_parse_aml+0x7c/0x2bb
>  [<ffffffff80358a35>] acpi_ps_execute_method+0x144/0x213
>  [<ffffffff80354e9e>] acpi_ns_evaluate+0x152/0x230
>  [<ffffffff803452d0>] ? acpi_os_execute_deferred+0x0/0x39
>  [<ffffffff8034c6a6>] acpi_ev_asynch_execute_gpe_method+0xc1/0x119
>  [<ffffffff803452fc>] acpi_os_execute_deferred+0x2c/0x39
>  [<ffffffff802480c7>] run_workqueue+0x95/0x12a
>  [<ffffffff80248251>] worker_thread+0xf5/0x109
>  [<ffffffff8024b9cb>] ? autoremove_wake_function+0x0/0x38
>  [<ffffffff8024815c>] ? worker_thread+0x0/0x109
>  [<ffffffff8024b678>] kthread+0x49/0x76
>  [<ffffffff8020d009>] child_rip+0xa/0x11
>  [<ffffffff8022da41>] ? pick_next_task_fair+0x8b/0x93
>  [<ffffffff8024b62f>] ? kthread+0x0/0x76
>  [<ffffffff8020cfff>] ? child_rip+0x0/0x11
> APIC error on CPU1: 00(40)
> ACPI: EC: non-query interrupt received, switching to interrupt mode
> 
> This is the first time I've seen this error. Kernel is based on commit 
> f6f7b52e2f61 (just after -rc7) and includes the final versions of the 
> patches Rafael posted in this thread [1].
> 
> More complete log available on request.

hm, that warning seems to show an ACPI bug (Len Cc:-ed): we hit a 
reschedule point in an atomic section (irqs_disabled() is 1 in your 
report) - right while executing an AML scriptlet. Executing ACPI AML is 
a rather fragile moment for the kernel: the BIOS uses it to indirectly 
instruct the kernel to tweak low-level chipset registers and other 
platform details.

The kernel executes AML 'blindly' - it tweaks details that Linux 
typically has no knowledge of via any driver - so these things must 
absolutely run atomically, and scheduling away at the wrong moment (which 
implicitly re-enables interrupts) can leave the system in an 
inconsistent state.
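
To make the rule behind that warning concrete, here is a small user-space 
model of a might-sleep style check. It is only a sketch: struct ctx_state, 
sleeping_is_invalid() and model_might_sleep() are hypothetical names made 
up for illustration, not the code in kernel/sched.c, and the real check 
looks at more state than these two flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical snapshot of the two context flags printed in the report. */
struct ctx_state {
    bool in_atomic;      /* preemption is disabled */
    bool irqs_disabled;  /* local interrupts are off */
};

/* Sleeping or rescheduling is invalid if either flag is set. */
static bool sleeping_is_invalid(const struct ctx_state *c)
{
    assert(c != NULL);
    return c->in_atomic || c->irqs_disabled;
}

/*
 * Model of the debug check: print a report in the style of the quoted
 * warning and return 1 on a violation, return 0 otherwise.
 */
static int model_might_sleep(const struct ctx_state *c, const char *where)
{
    if (!sleeping_is_invalid(c))
        return 0;

    fprintf(stderr,
            "BUG: sleeping function called from invalid context at %s\n",
            where);
    fprintf(stderr, "in_atomic(): %d, irqs_disabled(): %d\n",
            c->in_atomic, c->irqs_disabled);
    return 1;
}
```

With in_atomic = false and irqs_disabled = true, as in the quoted report, 
the model flags a violation: disabled interrupts alone are enough to trip 
the check.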

This 'blindness' and opaqueness of AML execution is perhaps the nastiest 
aspect of the whole ACPI engine (the opacity makes AML essentially 
undebuggable and unfixable). Nevertheless, this might still be unrelated 
to your APIC illegal vector errors.
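
For reference, the "illegal vector" reading of the message can be 
reproduced by decoding the value as a local APIC Error Status Register 
(ESR) bitmask. This is a sketch under assumptions: it takes the number in 
parentheses in "00(40)" to be the ESR contents, and esr_bit_name() and 
esr_describe() are hypothetical helpers, not kernel functions. Bit 4 is 
listed as reserved here; its meaning depends on the processor.

```c
#include <assert.h>
#include <stdio.h>

/* Names of the eight low ESR bits, bit 0 first. */
static const char *const esr_bit_names[8] = {
    "Send CS error",
    "Receive CS error",
    "Send accept error",
    "Receive accept error",
    "Reserved",                 /* processor dependent, see lead-in */
    "Send illegal vector",
    "Received illegal vector",
    "Illegal register address",
};

/* Hypothetical helper: name of a single ESR bit (0..7). */
static const char *esr_bit_name(unsigned int bit)
{
    assert(bit < 8);
    return esr_bit_names[bit];
}

/* Hypothetical helper: print every set bit, return how many were set. */
static int esr_describe(unsigned int esr, FILE *out)
{
    int count = 0;

    for (unsigned int bit = 0; bit < 8; bit++) {
        if (esr & (1u << bit)) {
            fprintf(out, "  bit %u: %s\n", bit, esr_bit_name(bit));
            count++;
        }
    }
    return count;
}
```

Calling esr_describe(0x40, stdout) reports a single set bit, bit 6, 
"Received illegal vector".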

	Ingo