Date:	Tue, 23 Jun 2009 18:56:23 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arjan van de Ven <arjan@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Venki Pallipadi <venkatesh.pallipadi@...el.com>,
	Len Brown <lenb@...nel.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	ACPI Devel Maling List <linux-acpi@...r.kernel.org>
Subject: Re: kerneloops.org report for the week of June 14 2009

On Tuesday 23 June 2009, Ingo Molnar wrote:
> 
> * Thomas Gleixner <tglx@...utronix.de> wrote:
> 
> > On Sun, 14 Jun 2009, Arjan van de Ven wrote:
> > > Rank 3: getnstimeofday (warning)
> > > 	Reported 309 times (2446 total reports)
> > > 	[suspend resume] getnstimeofday() is called before timekeeping is
> > >       resumed
> > 
> > > Rank 6: hres_timers_resume (warning)
> > > 	Reported 188 times (1024 total reports)
> > > 	[suspend resume] hres_timers_resume() is incorrectly called with
> > >       interrupts on
> > 
> > Both have the same root cause. Something enables interrupts in the 
> > early resume path. IIRC, there was a culprit identified recently. 
> > Rafael?

Apparently, smp_call_function_single() is being called from cpufreq_suspend()
via acpi_cpufreq somehow, but I have yet to figure out how this happens.

> This can be debugged automatically today, using lockdep, by using a 
> 'helper lock':
> 
>   static struct lock_class_key helper_lock_key;
>   static DEFINE_PER_CPU(struct lockdep_map, helper_lock);
>
> Then mark the lock irq-safe by doing something like:
>
> static void mark_lock_irqsafe(void)
> {
> 	unsigned long flags;
> 	int cpu;
>
> 	/* name the maps and give them one shared static class key */
> 	for_each_possible_cpu(cpu)
> 		lockdep_init_map(&per_cpu(helper_lock, cpu), "helper_lock",
> 				 &helper_lock_key, 0);
>
> 	local_irq_save(flags);
> 	irq_enter();	/* fake hardirq context so the lock is marked irq-safe */
>
> 	for_each_online_cpu(cpu) {
> 		/* check == 2 requests full validation, which records irq usage */
> 		lock_acquire(&per_cpu(helper_lock, cpu), 0, 0, 0, 2, NULL, _THIS_IP_);
> 		lock_release(&per_cpu(helper_lock, cpu), 1, _THIS_IP_);
> 	}
>
> 	irq_exit();
> 	local_irq_restore(flags);
> }
>
> Then, in the resume path, after it disables IRQs, you can forbid
> re-enabling them via:
>
> 	local_irq_disable();
> 	lock_acquire(&__get_cpu_var(helper_lock), 0, 0, 0, 2, NULL, _THIS_IP_);
> 	...
> 	<extensive suspend or resume codepaths, callbacks>
> 	...
> 	lock_release(&__get_cpu_var(helper_lock), 1, _THIS_IP_);
> 	local_irq_enable();
> 
> And lockdep will warn if any function in between enables IRQs, by
> emitting a splat about incorrectly enabled hardirqs. The warning points
> at the specific place and includes a relevant backtrace, not just the
> handler in general.
> 
> This should work just fine with current lockdep facilities.
> 
> Rafael?
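
For readers unfamiliar with the mechanism, here is a toy, single-CPU userspace model in C of the inversion the helper lock is meant to catch: a lock that has been taken in hardirq context is still held when something enables hardirqs. It is a sketch under stated assumptions, not lockdep, and shares none of its code; every name in it (toy_acquire(), toy_irq_enable(), toy_mark_irqsafe(), run_resume_path() and the callbacks) is hypothetical, and the caller string merely stands in for the backtrace a real splat would print.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: hypothetical stand-ins for lockdep's bookkeeping. */
struct toy_lock {
	const char *name;
	bool used_in_hardirq; /* acquired at least once in hardirq context */
	bool held;
};

static struct toy_lock helper_lock = { "helper_lock", false, false };
static bool irqs_on = true;     /* simulated hardirq-enable flag */
static bool in_hardirq = false; /* simulated irq_enter()/irq_exit() state */
static int reports;             /* number of "splats" emitted so far */

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	if (in_hardirq)
		l->used_in_hardirq = true;
	l->held = true;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

static void toy_irq_disable(void)
{
	irqs_on = false;
}

/* Enabling hardirqs while holding a lock that is also used from hardirq
 * context is the inversion being reported; the caller name plays the
 * role of the backtrace. */
static void toy_irq_enable(const char *caller)
{
	if (helper_lock.held && helper_lock.used_in_hardirq) {
		fprintf(stderr, "splat: %s enabled hardirqs while holding %s\n",
			caller, helper_lock.name);
		reports++;
	}
	irqs_on = true;
}

/* Counterpart of mark_lock_irqsafe(): take the lock once in "hardirq"
 * context so that it becomes irq-safe. */
static void toy_mark_irqsafe(void)
{
	toy_irq_disable();
	in_hardirq = true;  /* stands in for irq_enter() */
	toy_acquire(&helper_lock);
	toy_release(&helper_lock);
	in_hardirq = false; /* stands in for irq_exit() */
	toy_irq_enable("toy_mark_irqsafe");
}

static void well_behaved_callback(void)
{
	/* does its work with IRQs left disabled */
}

static void irq_enabling_callback(void)
{
	toy_irq_enable("irq_enabling_callback"); /* the bug being hunted */
}

/* One simulated suspend/resume pass; returns the reports it raised. */
static int run_resume_path(void (*cb)(void))
{
	int before = reports;

	toy_irq_disable();
	toy_acquire(&helper_lock);
	cb();
	toy_release(&helper_lock);
	toy_irq_enable("run_resume_path");
	return reports - before;
}
```

In this model a pass with irq_enabling_callback() reports nothing until toy_mark_irqsafe() has run, and exactly one report, naming the callback, afterwards; that is why the sketch performs the marking step before exercising the resume path.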

We already have some debug code in sysdev_suspend() and sysdev_resume() that
checks whether interrupts are disabled, and these reports are from 2.6.29,
where that code was not present.

The long-term solution for the issue at hand is to clean up the suspend-resume
support in cpufreq so that it doesn't do stupid things like calling
smp_call_function_single() with interrupts disabled, but that requires someone
to work out how to fix it (I can do it, but I need to dig through the cpufreq
code first).

I'm not quite sure whether there's an acceptable short-term solution, though.

In principle, we could do

	local_irq_save(flags);
	...
	local_irq_restore(flags);

around each sysdev's ->suspend() and ->resume() callback, in addition to
checking the status of interrupts.  Would that work?

Rafael