lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 26 Feb 2008 22:15:10 +0100
From:	Karsten Wiese <fzu@...gehoertderstaat.de>
To:	paulmck@...ux.vnet.ibm.com
Cc:	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Rafael Wysocki <rjw@...k.pl>,
	Steven Rostedt <rostedt@...dmis.org>, akpm@...l.org
Subject: Re: 2.6.25-rc2 rcupreempt WARN after suspend to ram

Am Dienstag, 26. Februar 2008 schrieb Karsten Wiese:
> Am Sonntag, 24. Februar 2008 schrieb Paul E. McKenney:
> > On Sun, Feb 24, 2008 at 04:38:15PM +0100, Karsten Wiese wrote:
> > > Am Samstag, 23. Februar 2008 schrieb Karsten Wiese:
> > > > Am Samstag, 23. Februar 2008 schrieb Paul E. McKenney:
> > > > > On Sat, Feb 23, 2008 at 01:41:02PM +0100, Karsten Wiese wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > This appeared in dmesg after
> > > > > > 	$ echo core > /sys/power/pm_test
> > > > > > followed by 3 cycles of
> > > > > > 	$ echo mem > /sys/power/state
> > > > > > . .config attached.
> > > > > > 
> > > > > > dmesg excerpt (, full ~1MByte available):
> > > > > 
> > > > > Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
> > > > 
> > > > Yes. This tree was linus' git head as of yesterday or the day before.
> > > 
> > > Updated to git-head of today, same test and .config, different symptoms
> > > like in this thread: http://lkml.org/lkml/2008/2/23/260
> > > Later in this thread, Alan Cox said it looked like irq problems.
> > > Maybe also the rcupreemt related WARN_ON I saw are caused by irq problems.
> > 
> > Might be, but am taking a closer look at the interaction between irq,
> > dynticks, and rcupreempt in any case.
> 
> [Added Rafael, Thomas and Steven to CC]
> 
> The "different symptoms" above are indeed unrelated and solved by reverting
> 	"commit 559bbe6cbd0d8c68d40076a5f7dc98e3bf5864b2
>     	 power_state: get rid of write-only variable in SATA"
> 
> The cpu_hotplug code used by suspend together with hr_timer and nohz looks
> suspicious:
> 	$ echo 0 > /sys/devices/system/cpu/cpu1/online
> 	$ echo 1 > /sys/devices/system/cpu/cpu1/online
> 	(repeat until dmesg|tail shows WARNs; here it took 2 iterations)
> causes symptoms like in 1st message of this thread again.

[Added Andrew]

This fixes it, not sure if its not just papering over, though it seams to be
likely for an offlined CPU to have ticks stopped.

Thanks,
      Karsten
---

Don't touch an offlined CPU's ts->tick_stopped in tick_cancel_sched_timer()

Silences WARN_ONs in rcu_enter_nohz() and rcu_exit_nohz(), which appeared
before caused by (repeated) calls to:
        $ echo 0 > /sys/devices/system/cpu/cpu1/online
        $ echo 1 > /sys/devices/system/cpu/cpu1/online

Signed-off-by: Karsten Wiese <fzu@...gehoertderstaat.de>
---
 kernel/time/tick-sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2968298..686da82 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -640,7 +640,7 @@ void tick_cancel_sched_timer(int cpu)
 
 	if (ts->sched_timer.base)
 		hrtimer_cancel(&ts->sched_timer);
-	ts->tick_stopped = 0;
+
 	ts->nohz_mode = NOHZ_MODE_INACTIVE;
 }
 #endif /* HIGH_RES_TIMERS */
-- 
1.5.3.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ