[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081125150328.GA18784@elte.hu>
Date: Tue, 25 Nov 2008 16:03:28 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
linux-kernel <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
socketcan-users@...ts.berlios.de
Subject: Re: [RFC PATCH] hrtimer: removing all ur callback modes
* Ingo Molnar <mingo@...e.hu> wrote:
> > 9 files changed, 37 insertions(+), 293 deletions(-)
>
> very nice :-)
>
> Applied to tip/timers/hrtimers and started testing it, lets see what
> happens ;-)
-tip testing found a CPU hotplug related locking bug on 64-bit x86:
[ 61.728012] [ INFO: possible circular locking dependency detected ]
[ 61.728012] 2.6.28-rc6-tip #1851
[ 61.728012] -------------------------------------------------------
[ 61.728012] S99local/1576 is trying to acquire lock:
[ 61.728012] (&rq->lock){++..}, at: [<ffffffff80244379>] task_rq_lock+0x69/0xb0
[ 61.728012]
[ 61.728012] but task is already holding lock:
[ 61.728012] (&cpu_base->lock/1){....}, at: [<ffffffff80d59014>] hrtimer_cpu_notify+0xa4/0x1d0
Config attached. Find the full bootlog below. Reproducable via:
echo 0 > /sys/devices/system/cpu/cpu1/online
sleep 1
echo 1 > /sys/devices/system/cpu/cpu1/online
sleep 1
echo 0 > /sys/devices/system/cpu/cpu1/online
sleep 1
echo 1 > /sys/devices/system/cpu/cpu1/online
sleep 1
Ingo
------------->
[ 42.760373] eth0: link down
[ 50.976447] EXT3 FS on sda6, internal journal
[ 51.102308] kjournald starting. Commit interval 5 seconds
[ 51.102618] EXT3 FS on sda5, internal journal
[ 51.102634] EXT3-fs: mounted filesystem with ordered data mode.
[ 52.074153] rc.sysinit used greatest stack depth: 3384 bytes left
[ 53.420996] warning: `dbus-daemon' uses 32-bit capabilities (legacy support in use)
[ 57.703594] ssh used greatest stack depth: 3328 bytes left
[ 58.381277] CPU 1 is now offline
[ 58.384393] lockdep: fixing up alternatives.
[ 58.388643] SMP alternatives: switching to UP code
[ 58.680235] CPU0 attaching NULL sched-domain.
[ 58.684623] CPU1 attaching NULL sched-domain.
[ 58.790843] CPU0 attaching NULL sched-domain.
[ 59.911584] lockdep: fixing up alternatives.
[ 59.915778] SMP alternatives: switching to SMP code
[ 60.158681] Booting processor 1 APIC 0x1 ip 0x6000
[ 58.679283] Initializing CPU#1
[ 58.679283] masked ExtINT on CPU#1
[ 58.679283] Calibrating delay using timer specific routine.. 4020.52 BogoMIPS (lpj=8041048)
[ 58.679283] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 58.679283] CPU: L2 Cache: 512K (64 bytes/line)
[ 58.679283] CPU 1/0x1 -> Node 0
[ 58.679283] CPU: Physical Processor ID: 0
[ 58.679283] CPU: Processor Core ID: 1
[ 58.679283] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
[ 60.253877] CPU1: <7>Switched to high resolution mode on CPU 1
[ 60.303705] AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
[ 60.310237] CPU0 attaching NULL sched-domain.
[ 60.422058] CPU0 attaching sched-domain:
[ 60.425931] domain 0: span 0-1 level CPU
[ 60.429823] groups: 0 1
[ 60.432423] domain 1: span 0-1 level NODE
[ 60.436600] groups: 0-1
[ 60.439277] CPU1 attaching sched-domain:
[ 60.443192] domain 0: span 0-1 level CPU
[ 60.447155] groups: 1 0
[ 60.449757] domain 1: span 0-1 level NODE
[ 60.453919] groups: 0-1
[ 61.507492] CPU 1 is now offline
[ 61.510627] lockdep: fixing up alternatives.
[ 61.514877] SMP alternatives: switching to UP code
[ 61.727932]
[ 61.727934] =======================================================
[ 61.728012] [ INFO: possible circular locking dependency detected ]
[ 61.728012] 2.6.28-rc6-tip #1851
[ 61.728012] -------------------------------------------------------
[ 61.728012] S99local/1576 is trying to acquire lock:
[ 61.728012] (&rq->lock){++..}, at: [<ffffffff80244379>] task_rq_lock+0x69/0xb0
[ 61.728012]
[ 61.728012] but task is already holding lock:
[ 61.728012] (&cpu_base->lock/1){....}, at: [<ffffffff80d59014>] hrtimer_cpu_notify+0xa4/0x1d0
[ 61.728012]
[ 61.728012] which lock already depends on the new lock.
[ 61.728012]
[ 61.728012]
[ 61.728012] the existing dependency chain (in reverse order) is:
[ 61.728012]
[ 61.728012] -> #3 (&cpu_base->lock/1){....}:
[ 61.728012] [<ffffffff8027b8bd>] validate_chain+0xccd/0x12f0
[ 61.728012] [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[ 61.728012] [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[ 61.728012] [<ffffffff80d5efd4>] _spin_lock_nested+0x34/0x70
[ 61.728012] [<ffffffff80d59014>] hrtimer_cpu_notify+0xa4/0x1d0
[ 61.728012] [<ffffffff8026f3be>] notifier_call_chain+0x5e/0xa0
[ 61.728012] [<ffffffff8026f4be>] __raw_notifier_call_chain+0xe/0x10
[ 61.728012] [<ffffffff8026f4d6>] raw_notifier_call_chain+0x16/0x20
[ 61.728012] [<ffffffff80cec091>] _cpu_down+0x151/0x2a0
[ 61.728012] [<ffffffff80cec236>] cpu_down+0x56/0x70
[ 61.728012] [<ffffffff80cede2e>] store_online+0x4e/0xa0
[ 61.728012] [<ffffffff80687490>] sysdev_store+0x20/0x30
[ 61.728012] [<ffffffff80328e85>] sysfs_write_file+0xf5/0x150
[ 61.728012] [<ffffffff802d4718>] vfs_write+0xc8/0x140
[ 61.728012] [<ffffffff802d5075>] sys_write+0x55/0x90
[ 61.728012] [<ffffffff8021275a>] system_call_fastpath+0x16/0x1b
[ 61.728012] [<ffffffffffffffff>] 0xffffffffffffffff
[ 61.728012]
[ 61.728012] -> #2 (&cpu_base->lock){++..}:
[ 61.728012] [<ffffffff8027b8bd>] validate_chain+0xccd/0x12f0
[ 61.728012] [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[ 61.728012] [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[ 61.728012] [<ffffffff80d5f1c8>] _spin_lock_irqsave+0x58/0xa0
[ 61.728012] [<ffffffff8026d981>] lock_hrtimer_base+0x31/0x60
[ 61.728012] [<ffffffff8026e1d6>] hrtimer_start_range_ns+0x36/0x1a0
[ 61.728012] [<ffffffff8024985f>] __enqueue_rt_entity+0x13f/0x1a0
[ 61.728012] [<ffffffff8024990b>] enqueue_task_rt+0x4b/0x70
[ 61.728012] [<ffffffff80243ae5>] enqueue_task+0x55/0x70
[ 61.728012] [<ffffffff80243b28>] activate_task+0x28/0x40
[ 61.728012] [<ffffffff8024c253>] try_to_wake_up+0x273/0x2d0
[ 61.728012] [<ffffffff8024c305>] wake_up_process+0x15/0x20
[ 61.728012] [<ffffffff80d58047>] migration_call+0xd7/0x5b0
[ 61.728012] [<ffffffff814432c3>] migration_init+0x43/0x60
[ 61.728012] [<ffffffff8020a038>] do_one_initcall+0x38/0x1b0
[ 61.728012] [<ffffffff81427ec0>] kernel_init+0x70/0x1a0
[ 61.728012] [<ffffffff80213aa9>] child_rip+0xa/0x21
[ 61.728012] [<ffffffffffffffff>] 0xffffffffffffffff
[ 61.728012]
[ 61.728012] -> #1 (&rt_b->rt_runtime_lock){.+..}:
[ 61.728012] [<ffffffff8027b8bd>] validate_chain+0xccd/0x12f0
[ 61.728012] [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[ 61.728012] [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[ 61.728012] [<ffffffff80d5f046>] _spin_lock+0x36/0x70
[ 61.728012] [<ffffffff80249810>] __enqueue_rt_entity+0xf0/0x1a0
[ 61.728012] [<ffffffff8024990b>] enqueue_task_rt+0x4b/0x70
[ 61.728012] [<ffffffff80243ae5>] enqueue_task+0x55/0x70
[ 61.728012] [<ffffffff80243b28>] activate_task+0x28/0x40
[ 61.728012] [<ffffffff8024c253>] try_to_wake_up+0x273/0x2d0
[ 61.728012] [<ffffffff8024c305>] wake_up_process+0x15/0x20
[ 61.728012] [<ffffffff80d58047>] migration_call+0xd7/0x5b0
[ 61.728012] [<ffffffff814432c3>] migration_init+0x43/0x60
[ 61.728012] [<ffffffff8020a038>] do_one_initcall+0x38/0x1b0
[ 61.728012] [<ffffffff81427ec0>] kernel_init+0x70/0x1a0
[ 61.728012] [<ffffffff80213aa9>] child_rip+0xa/0x21
[ 61.728012] [<ffffffffffffffff>] 0xffffffffffffffff
[ 61.728012]
[ 61.728012] -> #0 (&rq->lock){++..}:
[ 61.728012] [<ffffffff8027b443>] validate_chain+0x853/0x12f0
[ 61.728012] [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[ 61.728012] [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[ 61.728012] [<ffffffff80d5f046>] _spin_lock+0x36/0x70
[ 61.728012] [<ffffffff80244379>] task_rq_lock+0x69/0xb0
[ 61.728012] [<ffffffff8024c05f>] try_to_wake_up+0x7f/0x2d0
[ 61.728012] [<ffffffff8024c305>] wake_up_process+0x15/0x20
[ 61.728012] [<ffffffff8026d742>] hrtimer_wakeup+0x22/0x30
[ 61.728012] [<ffffffff8026d50e>] __run_hrtimer+0x7e/0xe0
[ 61.728012] [<ffffffff8026d690>] enqueue_hrtimer+0x120/0x160
[ 61.728012] [<ffffffff80d59087>] hrtimer_cpu_notify+0x117/0x1d0
[ 61.728012] [<ffffffff8026f3be>] notifier_call_chain+0x5e/0xa0
[ 61.728012] [<ffffffff8026f4be>] __raw_notifier_call_chain+0xe/0x10
[ 61.728012] [<ffffffff8026f4d6>] raw_notifier_call_chain+0x16/0x20
[ 61.728012] [<ffffffff80cec091>] _cpu_down+0x151/0x2a0
[ 61.728012] [<ffffffff80cec236>] cpu_down+0x56/0x70
[ 61.728012] [<ffffffff80cede2e>] store_online+0x4e/0xa0
[ 61.728012] [<ffffffff80687490>] sysdev_store+0x20/0x30
[ 61.728012] [<ffffffff80328e85>] sysfs_write_file+0xf5/0x150
[ 61.728012] [<ffffffff802d4718>] vfs_write+0xc8/0x140
[ 61.728012] [<ffffffff802d5075>] sys_write+0x55/0x90
[ 61.728012] [<ffffffff8021275a>] system_call_fastpath+0x16/0x1b
[ 61.728012] [<ffffffffffffffff>] 0xffffffffffffffff
[ 61.728012]
[ 61.728012] other info that might help us debug this:
[ 61.728012]
[ 61.728012] 4 locks held by S99local/1576:
[ 61.728012] #0: (&buffer->mutex){--..}, at: [<ffffffff80328dd8>] sysfs_write_file+0x48/0x150
[ 61.728012] #1: (cpu_add_remove_lock){--..}, at: [<ffffffff80255227>] cpu_maps_update_begin+0x17/0x20
[ 61.728012] #2: (&cpu_hotplug.lock){--..}, at: [<ffffffff802552a7>] cpu_hotplug_begin+0x27/0x60
[ 61.728012] #3: (&cpu_base->lock/1){....}, at: [<ffffffff80d59014>] hrtimer_cpu_notify+0xa4/0x1d0
[ 61.728012]
[ 61.728012] stack backtrace:
[ 61.728012] Pid: 1576, comm: S99local Not tainted 2.6.28-rc6-tip #1851
[ 61.728012] Call Trace:
[ 61.728012] [<ffffffff8027a8d4>] print_circular_bug_tail+0x94/0xe0
[ 61.728012] [<ffffffff8027b443>] validate_chain+0x853/0x12f0
[ 61.728012] [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[ 61.728012] [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[ 61.728012] [<ffffffff80244379>] ? task_rq_lock+0x69/0xb0
[ 61.728012] [<ffffffff80d5f046>] _spin_lock+0x36/0x70
[ 61.728012] [<ffffffff80244379>] ? task_rq_lock+0x69/0xb0
[ 61.728012] [<ffffffff80244379>] task_rq_lock+0x69/0xb0
[ 61.728012] [<ffffffff8024c05f>] try_to_wake_up+0x7f/0x2d0
[ 61.728012] [<ffffffff80212456>] ? ftrace_call+0x5/0x2b
[ 61.728012] [<ffffffff8026d720>] ? hrtimer_wakeup+0x0/0x30
[ 61.728012] [<ffffffff8024c305>] wake_up_process+0x15/0x20
[ 61.728012] [<ffffffff8026d742>] hrtimer_wakeup+0x22/0x30
[ 61.728012] [<ffffffff8026d50e>] __run_hrtimer+0x7e/0xe0
[ 61.728012] [<ffffffff8026d690>] enqueue_hrtimer+0x120/0x160
[ 61.728012] [<ffffffff80d59087>] hrtimer_cpu_notify+0x117/0x1d0
[ 61.728012] [<ffffffff8026f3be>] notifier_call_chain+0x5e/0xa0
[ 61.728012] [<ffffffff8026f4be>] __raw_notifier_call_chain+0xe/0x10
[ 61.728012] [<ffffffff8026f4d6>] raw_notifier_call_chain+0x16/0x20
[ 61.728012] [<ffffffff80cec091>] _cpu_down+0x151/0x2a0
[ 61.728012] [<ffffffff80d5c8c0>] ? wait_for_common+0x120/0x180
[ 61.728012] [<ffffffff80cec236>] cpu_down+0x56/0x70
[ 61.728012] [<ffffffff80cede2e>] store_online+0x4e/0xa0
[ 61.728012] [<ffffffff80687490>] sysdev_store+0x20/0x30
[ 61.728012] [<ffffffff80328e85>] sysfs_write_file+0xf5/0x150
[ 61.728012] [<ffffffff802d4718>] vfs_write+0xc8/0x140
[ 61.728012] [<ffffffff802d5075>] sys_write+0x55/0x90
[ 61.728012] [<ffffffff8021275a>] system_call_fastpath+0x16/0x1b
[ 62.507029] CPU0 attaching NULL sched-domain.
[ 62.511307] CPU1 attaching NULL sched-domain.
[ 62.546392] CPU0 attaching NULL sched-domain.
[ 63.606365] lockdep: fixing up alternatives.
[ 63.610548] SMP alternatives: switching to SMP code
[ 63.831320] Booting processor 1 APIC 0x1 ip 0x6000
[ 61.727801] Initializing CPU#1
[ 61.727801] masked ExtINT on CPU#1
[ 61.727801] Calibrating delay using timer specific routine.. 4020.66 BogoMIPS (lpj=8041322)
[ 61.727801] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 61.727801] CPU: L2 Cache: 512K (64 bytes/line)
[ 61.727801] CPU 1/0x1 -> Node 0
[ 61.727801] CPU: Physical Processor ID: 0
[ 61.727801] CPU: Processor Core ID: 1
[ 61.727801] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
[ 63.925801] CPU1: <7>Switched to high resolution mode on CPU 1
[ 63.975613] AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
[ 63.982138] CPU0 attaching NULL sched-domain.
[ 64.016774] CPU0 attaching sched-domain:
[ 64.020609] domain 0: span 0-1 level CPU
[ 64.024556] groups: 0 1
[ 64.027140] domain 1: span 0-1 level NODE
[ 64.031301] groups: 0-1
[ 64.033992] CPU1 attaching sched-domain:
[ 64.037902] domain 0: span 0-1 level CPU
[ 64.041873] groups: 1 0
[ 64.044475] domain 1: span 0-1 level NODE
[ 64.048648] groups: 0-1
View attachment "config" of type "text/plain" (56575 bytes)
Powered by blists - more mailing lists