Message-ID: <8db1092f0906211313x73ac9340n9af5775b56cfd189@mail.gmail.com>
Date: Sun, 21 Jun 2009 22:13:21 +0200
From: Maciej Rutecki <maciej.rutecki@...il.com>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>, seto.hidetoshi@...fujitsu.com,
"Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: 2.6.30-git(16 and 17) system hangs after resume from suspend to
disk, mce related?
2009/6/21 Andi Kleen <ak@...ux.intel.com>:
> I assume it runs stable for hours without resume from disk?
I only tested for 40 minutes. The latest git hangs 4-5 minutes after resume
from s2disk.
> And you made sure you don't use stale data from
> a different kernel for resume from disk?
I'm sure
>
> It is strange that resume from disk affects machine check.
> How is your resume setup?
Are you asking about the "resume" kernel option?
maciek@...m:~$ cat /proc/cmdline
root=/dev/sda2 ro resume=/dev/sda3 selinux=0
> Do you have any init scripts that change machine check state
> before the resume from disk runs?
No. I use a default Debian installation. I use this script to do s2disk:
#!/bin/sh
umount /mnt/vista
umount /mnt/drugi
governor0=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
governor1=`cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor`
f_min_0=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq`
f_min_1=`cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq`
f_max_0=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq`
f_max_1=`cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq`
#rmmod snd_hda_intel
sync
hdparm -F /dev/sda
hdparm -F /dev/sdb
sleep 1
# hibernate
echo -n platform > /sys/power/disk
echo -n disk > /sys/power/state
echo $governor0 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo $governor1 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo $f_min_0 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo $f_min_1 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo $f_max_0 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo $f_max_1 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
#modprobe snd_hda_intel model=3stack-dig
sleep 1
/etc/init.d/hdparm restart
mount /mnt/vista
mount /mnt/drugi
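For anyone reproducing this on a machine with a different number of CPUs, the save and restore steps of the script above can be written as a loop. This is only a minimal sketch: the function names save_cpufreq and restore_cpufreq and the CPU_ROOT variable are hypothetical, and it assumes the same three cpufreq files that the script reads.

```shell
#!/bin/sh
# CPU_ROOT defaults to the real sysfs location; override it for a dry run.
CPU_ROOT=${CPU_ROOT:-/sys/devices/system/cpu}

# Save "path value" pairs for every CPU's cpufreq settings to the file $1.
save_cpufreq() {
    : > "$1"
    for dir in "$CPU_ROOT"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        for name in scaling_governor scaling_min_freq scaling_max_freq; do
            printf '%s %s\n' "$dir/$name" "$(cat "$dir/$name")" >> "$1"
        done
    done
}

# Write every saved value back to the path it was read from,
# in the same order as the script above (governor, min, max).
restore_cpufreq() {
    while read -r path value; do
        echo "$value" > "$path"
    done < "$1"
}
```

Usage would be save_cpufreq /tmp/cpufreq.saved before the hibernate step and restore_cpufreq /tmp/cpufreq.saved after it; the file name is arbitrary.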
>
> I assume you have CONFIG_X86_NEW_MCE enabled, correct?
maciek@...m:~$ cat /boot/config-2.6.30-git17 | grep MCE
CONFIG_X86_MCE=y
# CONFIG_X86_OLD_MCE is not set
CONFIG_X86_NEW_MCE=y
CONFIG_X86_MCE_INTEL=y
# CONFIG_X86_MCE_AMD is not set
# CONFIG_X86_ANCIENT_MCE is not set
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=m
> Does it still happen with CONFIG_X86_OLD_MCE instead?
I will check tomorrow.
>
> Also "a few minutes" suggests something might be going wrong
> with the poll handler. Does the problem still happen
> when you use CONFIG_X86_NEW_MCE again, but before
> resume do
>
> echo 0 > /sys/devices/system/machinecheck/machinecheck0/check_interval
>
> On the other hand you should get a crash very fast with
>
> echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
I haven't tried the instructions above yet, but I found something else. After
a normal boot I tried:
echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
I found this in dmesg:
[ 141.704025] ------------[ cut here ]------------
[ 141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102
mcheck_timer+0xf5/0x100()
[ 141.704044] Hardware name: G31M-S2L
[ 141.704047] Modules linked in: i915 drm i2c_algo_bit video
backlight output ppdev lp rfcomm l2cap xt_tcpudp xt_limit xt_state
iptable_filter nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables
x_tables fuse dm_crypt dm_mod coretemp it87 hwmon_vid loop usbhid hid
btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss
snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd uhci_hcd ehci_hcd soundcore parport_pc parport
psmouse r8169 usbcore 8139too 8139cp mii i2c_i801 button rtc_cmos
rtc_core rtc_lib snd_page_alloc intel_agp agpgart evdev
[ 141.704139] Pid: 0, comm: swapper Not tainted 2.6.30-git17 #1
[ 141.704143] Call Trace:
[ 141.704152] [<c039382c>] ? printk+0x18/0x1c
[ 141.704158] [<c010f715>] ? mcheck_timer+0xf5/0x100
[ 141.704165] [<c013212c>] warn_slowpath_common+0x6c/0xc0
[ 141.704170] [<c010f715>] ? mcheck_timer+0xf5/0x100
[ 141.704176] [<c0132195>] warn_slowpath_null+0x15/0x20
[ 141.704182] [<c010f715>] mcheck_timer+0xf5/0x100
[ 141.704188] [<c013b99d>] run_timer_softirq+0x12d/0x1f0
[ 141.704194] [<c010f620>] ? mcheck_timer+0x0/0x100
[ 141.704199] [<c010f620>] ? mcheck_timer+0x0/0x100
[ 141.704206] [<c01372da>] __do_softirq+0x9a/0x130
[ 141.704212] [<c014b0ce>] ? hrtimer_interrupt+0xde/0x230
[ 141.704217] [<c039642f>] ? _spin_unlock+0xf/0x30
[ 141.704224] [<c01373a5>] do_softirq+0x35/0x40
[ 141.704229] [<c01375ad>] irq_exit+0x6d/0x90
[ 141.704235] [<c01167e8>] smp_apic_timer_interrupt+0x58/0x90
[ 141.704241] [<c0103856>] apic_timer_interrupt+0x2a/0x30
[ 141.704248] [<c010a662>] ? mwait_idle+0x62/0x70
[ 141.704253] [<c0101ee5>] cpu_idle+0x55/0x90
[ 141.704259] [<c0390b0b>] start_secondary+0x184/0x1f9
[ 141.704264] ---[ end trace 54c5f0d77c70ea21 ]---
[ 142.701022] ------------[ cut here ]------------
[ 142.701036] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102
mcheck_timer+0xf5/0x100()
[ 142.701041] Hardware name: G31M-S2L
[ 142.701044] Modules linked in: i915 drm i2c_algo_bit video
backlight output ppdev lp rfcomm l2cap xt_tcpudp xt_limit xt_state
iptable_filter nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables
x_tables fuse dm_crypt dm_mod coretemp it87 hwmon_vid loop usbhid hid
btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss
snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd uhci_hcd ehci_hcd soundcore parport_pc parport
psmouse r8169 usbcore 8139too 8139cp mii i2c_i801 button rtc_cmos
rtc_core rtc_lib snd_page_alloc intel_agp agpgart evdev
[ 142.701138] Pid: 0, comm: swapper Tainted: G W 2.6.30-git17 #1
[ 142.701142] Call Trace:
[ 142.701151] [<c039382c>] ? printk+0x18/0x1c
[ 142.701156] [<c010f715>] ? mcheck_timer+0xf5/0x100
[ 142.701163] [<c013212c>] warn_slowpath_common+0x6c/0xc0
[ 142.701169] [<c010f715>] ? mcheck_timer+0xf5/0x100
[ 142.701174] [<c0132195>] warn_slowpath_null+0x15/0x20
[ 142.701180] [<c010f715>] mcheck_timer+0xf5/0x100
[ 142.701186] [<c013b99d>] run_timer_softirq+0x12d/0x1f0
[ 142.701192] [<c010f620>] ? mcheck_timer+0x0/0x100
[ 142.701197] [<c010f620>] ? mcheck_timer+0x0/0x100
[ 142.701204] [<c01372da>] __do_softirq+0x9a/0x130
[ 142.701210] [<c014b0ce>] ? hrtimer_interrupt+0xde/0x230
[ 142.701216] [<c039642f>] ? _spin_unlock+0xf/0x30
[ 142.701222] [<c01373a5>] do_softirq+0x35/0x40
[ 142.701228] [<c01375ad>] irq_exit+0x6d/0x90
[ 142.701234] [<c01167e8>] smp_apic_timer_interrupt+0x58/0x90
[ 142.701240] [<c0103856>] apic_timer_interrupt+0x2a/0x30
[ 142.701247] [<c010a662>] ? mwait_idle+0x62/0x70
[ 142.701252] [<c0101ee5>] cpu_idle+0x55/0x90
[ 142.701258] [<c0390b0b>] start_secondary+0x184/0x1f9
[ 142.701264] ---[ end trace 54c5f0d77c70ea22 ]---
It stops when I do echo 0...
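To see how often the warning fires with a given check_interval, the WARNING header lines can be counted in the kernel log. A minimal sketch, assuming the log text arrives on stdin; count_mce_warnings is a hypothetical helper name, and the pattern is simply the header string from the trace above:

```shell
#!/bin/sh
# Count mcheck WARNING header lines in kernel log text read from stdin.
# -F treats the pattern as a fixed string; -c prints the number of matching lines.
# grep exits non-zero when nothing matches, so "|| true" keeps the status clean
# while the printed count is still 0.
count_mce_warnings() {
    grep -c -F 'WARNING: at arch/x86/kernel/cpu/mcheck/mce.c' || true
}

# Typical use on a live system:
#   dmesg | count_mce_warnings
```

Running it once before and once after changing check_interval shows whether new warnings are still being logged.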
> Your dmesg also doesn't have anything related to resume from disk?
Dmesg after resume, but before the hang:
http://unixy.pl/maciek/download/kernel/2.6.30-git17/pc/dmesg-2.6.30-git17-after-resume.txt
Nothing weird.
>
> Thanks,
>
> -Andi
>
Thanks for the answer.
--
Maciej Rutecki
http://www.maciek.unixy.pl