lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081125150328.GA18784@elte.hu>
Date:	Tue, 25 Nov 2008 16:03:28 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	socketcan-users@...ts.berlios.de
Subject: Re: [RFC PATCH] hrtimer: removing all ur callback modes


* Ingo Molnar <mingo@...e.hu> wrote:

> >  9 files changed, 37 insertions(+), 293 deletions(-)
> 
> very nice :-)
> 
> Applied to tip/timers/hrtimers and started testing it, lets see what 
> happens ;-)

-tip testing found a CPU hotplug related locking bug on 64-bit x86:

[   61.728012] [ INFO: possible circular locking dependency detected ]
[   61.728012] 2.6.28-rc6-tip #1851
[   61.728012] -------------------------------------------------------
[   61.728012] S99local/1576 is trying to acquire lock:
[   61.728012]  (&rq->lock){++..}, at: [<ffffffff80244379>] task_rq_lock+0x69/0xb0
[   61.728012] 
[   61.728012] but task is already holding lock:
[   61.728012]  (&cpu_base->lock/1){....}, at: [<ffffffff80d59014>] hrtimer_cpu_notify+0xa4/0x1d0

Config attached. Find the full bootlog below. Reproducable via:

 echo 0 > /sys/devices/system/cpu/cpu1/online
 sleep 1
 echo 1 > /sys/devices/system/cpu/cpu1/online
 sleep 1
 echo 0 > /sys/devices/system/cpu/cpu1/online
 sleep 1
 echo 1 > /sys/devices/system/cpu/cpu1/online
 sleep 1

	Ingo

------------->
[   42.760373] eth0: link down
[   50.976447] EXT3 FS on sda6, internal journal
[   51.102308] kjournald starting.  Commit interval 5 seconds
[   51.102618] EXT3 FS on sda5, internal journal
[   51.102634] EXT3-fs: mounted filesystem with ordered data mode.
[   52.074153] rc.sysinit used greatest stack depth: 3384 bytes left
[   53.420996] warning: `dbus-daemon' uses 32-bit capabilities (legacy support in use)
[   57.703594] ssh used greatest stack depth: 3328 bytes left
[   58.381277] CPU 1 is now offline
[   58.384393] lockdep: fixing up alternatives.
[   58.388643] SMP alternatives: switching to UP code
[   58.680235] CPU0 attaching NULL sched-domain.
[   58.684623] CPU1 attaching NULL sched-domain.
[   58.790843] CPU0 attaching NULL sched-domain.
[   59.911584] lockdep: fixing up alternatives.
[   59.915778] SMP alternatives: switching to SMP code
[   60.158681] Booting processor 1 APIC 0x1 ip 0x6000
[   58.679283] Initializing CPU#1
[   58.679283] masked ExtINT on CPU#1
[   58.679283] Calibrating delay using timer specific routine.. 4020.52 BogoMIPS (lpj=8041048)
[   58.679283] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[   58.679283] CPU: L2 Cache: 512K (64 bytes/line)
[   58.679283] CPU 1/0x1 -> Node 0
[   58.679283] CPU: Physical Processor ID: 0
[   58.679283] CPU: Processor Core ID: 1
[   58.679283] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
[   60.253877] CPU1: <7>Switched to high resolution mode on CPU 1
[   60.303705] AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
[   60.310237] CPU0 attaching NULL sched-domain.
[   60.422058] CPU0 attaching sched-domain:
[   60.425931]  domain 0: span 0-1 level CPU
[   60.429823]   groups: 0 1
[   60.432423]   domain 1: span 0-1 level NODE
[   60.436600]    groups: 0-1
[   60.439277] CPU1 attaching sched-domain:
[   60.443192]  domain 0: span 0-1 level CPU
[   60.447155]   groups: 1 0
[   60.449757]   domain 1: span 0-1 level NODE
[   60.453919]    groups: 0-1
[   61.507492] CPU 1 is now offline
[   61.510627] lockdep: fixing up alternatives.
[   61.514877] SMP alternatives: switching to UP code
[   61.727932] 
[   61.727934] =======================================================
[   61.728012] [ INFO: possible circular locking dependency detected ]
[   61.728012] 2.6.28-rc6-tip #1851
[   61.728012] -------------------------------------------------------
[   61.728012] S99local/1576 is trying to acquire lock:
[   61.728012]  (&rq->lock){++..}, at: [<ffffffff80244379>] task_rq_lock+0x69/0xb0
[   61.728012] 
[   61.728012] but task is already holding lock:
[   61.728012]  (&cpu_base->lock/1){....}, at: [<ffffffff80d59014>] hrtimer_cpu_notify+0xa4/0x1d0
[   61.728012] 
[   61.728012] which lock already depends on the new lock.
[   61.728012] 
[   61.728012] 
[   61.728012] the existing dependency chain (in reverse order) is:
[   61.728012] 
[   61.728012] -> #3 (&cpu_base->lock/1){....}:
[   61.728012]        [<ffffffff8027b8bd>] validate_chain+0xccd/0x12f0
[   61.728012]        [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[   61.728012]        [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[   61.728012]        [<ffffffff80d5efd4>] _spin_lock_nested+0x34/0x70
[   61.728012]        [<ffffffff80d59014>] hrtimer_cpu_notify+0xa4/0x1d0
[   61.728012]        [<ffffffff8026f3be>] notifier_call_chain+0x5e/0xa0
[   61.728012]        [<ffffffff8026f4be>] __raw_notifier_call_chain+0xe/0x10
[   61.728012]        [<ffffffff8026f4d6>] raw_notifier_call_chain+0x16/0x20
[   61.728012]        [<ffffffff80cec091>] _cpu_down+0x151/0x2a0
[   61.728012]        [<ffffffff80cec236>] cpu_down+0x56/0x70
[   61.728012]        [<ffffffff80cede2e>] store_online+0x4e/0xa0
[   61.728012]        [<ffffffff80687490>] sysdev_store+0x20/0x30
[   61.728012]        [<ffffffff80328e85>] sysfs_write_file+0xf5/0x150
[   61.728012]        [<ffffffff802d4718>] vfs_write+0xc8/0x140
[   61.728012]        [<ffffffff802d5075>] sys_write+0x55/0x90
[   61.728012]        [<ffffffff8021275a>] system_call_fastpath+0x16/0x1b
[   61.728012]        [<ffffffffffffffff>] 0xffffffffffffffff
[   61.728012] 
[   61.728012] -> #2 (&cpu_base->lock){++..}:
[   61.728012]        [<ffffffff8027b8bd>] validate_chain+0xccd/0x12f0
[   61.728012]        [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[   61.728012]        [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[   61.728012]        [<ffffffff80d5f1c8>] _spin_lock_irqsave+0x58/0xa0
[   61.728012]        [<ffffffff8026d981>] lock_hrtimer_base+0x31/0x60
[   61.728012]        [<ffffffff8026e1d6>] hrtimer_start_range_ns+0x36/0x1a0
[   61.728012]        [<ffffffff8024985f>] __enqueue_rt_entity+0x13f/0x1a0
[   61.728012]        [<ffffffff8024990b>] enqueue_task_rt+0x4b/0x70
[   61.728012]        [<ffffffff80243ae5>] enqueue_task+0x55/0x70
[   61.728012]        [<ffffffff80243b28>] activate_task+0x28/0x40
[   61.728012]        [<ffffffff8024c253>] try_to_wake_up+0x273/0x2d0
[   61.728012]        [<ffffffff8024c305>] wake_up_process+0x15/0x20
[   61.728012]        [<ffffffff80d58047>] migration_call+0xd7/0x5b0
[   61.728012]        [<ffffffff814432c3>] migration_init+0x43/0x60
[   61.728012]        [<ffffffff8020a038>] do_one_initcall+0x38/0x1b0
[   61.728012]        [<ffffffff81427ec0>] kernel_init+0x70/0x1a0
[   61.728012]        [<ffffffff80213aa9>] child_rip+0xa/0x21
[   61.728012]        [<ffffffffffffffff>] 0xffffffffffffffff
[   61.728012] 
[   61.728012] -> #1 (&rt_b->rt_runtime_lock){.+..}:
[   61.728012]        [<ffffffff8027b8bd>] validate_chain+0xccd/0x12f0
[   61.728012]        [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[   61.728012]        [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[   61.728012]        [<ffffffff80d5f046>] _spin_lock+0x36/0x70
[   61.728012]        [<ffffffff80249810>] __enqueue_rt_entity+0xf0/0x1a0
[   61.728012]        [<ffffffff8024990b>] enqueue_task_rt+0x4b/0x70
[   61.728012]        [<ffffffff80243ae5>] enqueue_task+0x55/0x70
[   61.728012]        [<ffffffff80243b28>] activate_task+0x28/0x40
[   61.728012]        [<ffffffff8024c253>] try_to_wake_up+0x273/0x2d0
[   61.728012]        [<ffffffff8024c305>] wake_up_process+0x15/0x20
[   61.728012]        [<ffffffff80d58047>] migration_call+0xd7/0x5b0
[   61.728012]        [<ffffffff814432c3>] migration_init+0x43/0x60
[   61.728012]        [<ffffffff8020a038>] do_one_initcall+0x38/0x1b0
[   61.728012]        [<ffffffff81427ec0>] kernel_init+0x70/0x1a0
[   61.728012]        [<ffffffff80213aa9>] child_rip+0xa/0x21
[   61.728012]        [<ffffffffffffffff>] 0xffffffffffffffff
[   61.728012] 
[   61.728012] -> #0 (&rq->lock){++..}:
[   61.728012]        [<ffffffff8027b443>] validate_chain+0x853/0x12f0
[   61.728012]        [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[   61.728012]        [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[   61.728012]        [<ffffffff80d5f046>] _spin_lock+0x36/0x70
[   61.728012]        [<ffffffff80244379>] task_rq_lock+0x69/0xb0
[   61.728012]        [<ffffffff8024c05f>] try_to_wake_up+0x7f/0x2d0
[   61.728012]        [<ffffffff8024c305>] wake_up_process+0x15/0x20
[   61.728012]        [<ffffffff8026d742>] hrtimer_wakeup+0x22/0x30
[   61.728012]        [<ffffffff8026d50e>] __run_hrtimer+0x7e/0xe0
[   61.728012]        [<ffffffff8026d690>] enqueue_hrtimer+0x120/0x160
[   61.728012]        [<ffffffff80d59087>] hrtimer_cpu_notify+0x117/0x1d0
[   61.728012]        [<ffffffff8026f3be>] notifier_call_chain+0x5e/0xa0
[   61.728012]        [<ffffffff8026f4be>] __raw_notifier_call_chain+0xe/0x10
[   61.728012]        [<ffffffff8026f4d6>] raw_notifier_call_chain+0x16/0x20
[   61.728012]        [<ffffffff80cec091>] _cpu_down+0x151/0x2a0
[   61.728012]        [<ffffffff80cec236>] cpu_down+0x56/0x70
[   61.728012]        [<ffffffff80cede2e>] store_online+0x4e/0xa0
[   61.728012]        [<ffffffff80687490>] sysdev_store+0x20/0x30
[   61.728012]        [<ffffffff80328e85>] sysfs_write_file+0xf5/0x150
[   61.728012]        [<ffffffff802d4718>] vfs_write+0xc8/0x140
[   61.728012]        [<ffffffff802d5075>] sys_write+0x55/0x90
[   61.728012]        [<ffffffff8021275a>] system_call_fastpath+0x16/0x1b
[   61.728012]        [<ffffffffffffffff>] 0xffffffffffffffff
[   61.728012] 
[   61.728012] other info that might help us debug this:
[   61.728012] 
[   61.728012] 4 locks held by S99local/1576:
[   61.728012]  #0:  (&buffer->mutex){--..}, at: [<ffffffff80328dd8>] sysfs_write_file+0x48/0x150
[   61.728012]  #1:  (cpu_add_remove_lock){--..}, at: [<ffffffff80255227>] cpu_maps_update_begin+0x17/0x20
[   61.728012]  #2:  (&cpu_hotplug.lock){--..}, at: [<ffffffff802552a7>] cpu_hotplug_begin+0x27/0x60
[   61.728012]  #3:  (&cpu_base->lock/1){....}, at: [<ffffffff80d59014>] hrtimer_cpu_notify+0xa4/0x1d0
[   61.728012] 
[   61.728012] stack backtrace:
[   61.728012] Pid: 1576, comm: S99local Not tainted 2.6.28-rc6-tip #1851
[   61.728012] Call Trace:
[   61.728012]  [<ffffffff8027a8d4>] print_circular_bug_tail+0x94/0xe0
[   61.728012]  [<ffffffff8027b443>] validate_chain+0x853/0x12f0
[   61.728012]  [<ffffffff8027d401>] __lock_acquire+0x381/0x620
[   61.728012]  [<ffffffff8027d741>] lock_acquire+0xa1/0xe0
[   61.728012]  [<ffffffff80244379>] ? task_rq_lock+0x69/0xb0
[   61.728012]  [<ffffffff80d5f046>] _spin_lock+0x36/0x70
[   61.728012]  [<ffffffff80244379>] ? task_rq_lock+0x69/0xb0
[   61.728012]  [<ffffffff80244379>] task_rq_lock+0x69/0xb0
[   61.728012]  [<ffffffff8024c05f>] try_to_wake_up+0x7f/0x2d0
[   61.728012]  [<ffffffff80212456>] ? ftrace_call+0x5/0x2b
[   61.728012]  [<ffffffff8026d720>] ? hrtimer_wakeup+0x0/0x30
[   61.728012]  [<ffffffff8024c305>] wake_up_process+0x15/0x20
[   61.728012]  [<ffffffff8026d742>] hrtimer_wakeup+0x22/0x30
[   61.728012]  [<ffffffff8026d50e>] __run_hrtimer+0x7e/0xe0
[   61.728012]  [<ffffffff8026d690>] enqueue_hrtimer+0x120/0x160
[   61.728012]  [<ffffffff80d59087>] hrtimer_cpu_notify+0x117/0x1d0
[   61.728012]  [<ffffffff8026f3be>] notifier_call_chain+0x5e/0xa0
[   61.728012]  [<ffffffff8026f4be>] __raw_notifier_call_chain+0xe/0x10
[   61.728012]  [<ffffffff8026f4d6>] raw_notifier_call_chain+0x16/0x20
[   61.728012]  [<ffffffff80cec091>] _cpu_down+0x151/0x2a0
[   61.728012]  [<ffffffff80d5c8c0>] ? wait_for_common+0x120/0x180
[   61.728012]  [<ffffffff80cec236>] cpu_down+0x56/0x70
[   61.728012]  [<ffffffff80cede2e>] store_online+0x4e/0xa0
[   61.728012]  [<ffffffff80687490>] sysdev_store+0x20/0x30
[   61.728012]  [<ffffffff80328e85>] sysfs_write_file+0xf5/0x150
[   61.728012]  [<ffffffff802d4718>] vfs_write+0xc8/0x140
[   61.728012]  [<ffffffff802d5075>] sys_write+0x55/0x90
[   61.728012]  [<ffffffff8021275a>] system_call_fastpath+0x16/0x1b
[   62.507029] CPU0 attaching NULL sched-domain.
[   62.511307] CPU1 attaching NULL sched-domain.
[   62.546392] CPU0 attaching NULL sched-domain.
[   63.606365] lockdep: fixing up alternatives.
[   63.610548] SMP alternatives: switching to SMP code
[   63.831320] Booting processor 1 APIC 0x1 ip 0x6000
[   61.727801] Initializing CPU#1
[   61.727801] masked ExtINT on CPU#1
[   61.727801] Calibrating delay using timer specific routine.. 4020.66 BogoMIPS (lpj=8041322)
[   61.727801] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[   61.727801] CPU: L2 Cache: 512K (64 bytes/line)
[   61.727801] CPU 1/0x1 -> Node 0
[   61.727801] CPU: Physical Processor ID: 0
[   61.727801] CPU: Processor Core ID: 1
[   61.727801] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
[   63.925801] CPU1: <7>Switched to high resolution mode on CPU 1
[   63.975613] AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
[   63.982138] CPU0 attaching NULL sched-domain.
[   64.016774] CPU0 attaching sched-domain:
[   64.020609]  domain 0: span 0-1 level CPU
[   64.024556]   groups: 0 1
[   64.027140]   domain 1: span 0-1 level NODE
[   64.031301]    groups: 0-1
[   64.033992] CPU1 attaching sched-domain:
[   64.037902]  domain 0: span 0-1 level CPU
[   64.041873]   groups: 1 0
[   64.044475]   domain 1: span 0-1 level NODE
[   64.048648]    groups: 0-1

View attachment "config" of type "text/plain" (56575 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ