linux-kernel - timer/hrtimer & cpu hotplug lockdep complaints

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070227153051.GD7911@osiris.boeblingen.de.ibm.com>
Date:	Tue, 27 Feb 2007 16:30:51 +0100
From:	Heiko Carstens <heiko.carstens@...ibm.com>
To:	linux-kernel@...r.kernel.org
Cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	john stultz <johnstul@...ibm.com>
Subject: timer/hrtimer & cpu hotplug lockdep complaints

lockdep tells us that we have a possible circular locking dependency.
The output below is incomplete since our console driver deadlocked on
del_timer()...

    <4>Processor 4 spun down
    <4>
    <4>=======================================================
    <4>[ INFO: possible circular locking dependency detected ]
    <4>2.6.20-23.x.20070222-s390xdebug #1
    <4>-------------------------------------------------------
    <4>bash/1447 is trying to acquire lock:
    <4> (base_lock_keys + cpu#5){++..}, at: [<000000000013b686>] timer_cpu_notify+0xd2/0x3a4
    <4>
    <4>but task is already holding lock:
    <4> (base_lock_keys + cpu#7){++..}, at: [<000000000013b67c>] timer_cpu_notify+0xc8/0x3a4
    <4>
    <4>which lock already depends on the new lock.
    <4>
    <4>
    <4>the existing dependency chain (in reverse order) is:
    <4>
    <4>-> #1 (base_lock_keys + cpu#7){++..}:
    <4>       [<0000000000158380>] __lock_acquire+0xe40/0xf78
    <4>       [<000000000015854a>] lock_acquire+0x92/0xb8
    <4>       [<0000000000441eea>] _spin_lock+0x52/0x6c
    <4>       [<000000000013b686>] timer_cpu_notify+0xd2/0x3a4
    <4>       [<0000000000443c44>] notifier_call_chain+0x64/0x88
    <4>       [<000000000014200a>] raw_notifier_call_chain+0x26/0x38
    <4>       [<000000000015deee>] cpu_down+0x25e/0x350
    <4>       [<00000000002cb650>] store_online+0x64/0xb8

The problem in this case is the cpu hotplug code in kernel/timer.c:

	migrate_timers(...)
	...
	old_base = per_cpu(tvec_bases, cpu);
	new_base = get_cpu_var(tvec_bases);

	local_irq_disable();
	spin_lock(&new_base->lock);
	spin_lock(&old_base->lock);

takes per-cpu locks once in A-B order and once in B-A order, depending
on from which cpu another one gets killed. This should be ok, since
migrate_timers() can only be active on one cpu and therefore no deadlock
should occur...
The same problem is present in kernel/hrtimer.c btw.
Any idea how to fix this? Just adding a lockdep_off/on between the two
spin_lock()s will fix this, but I don't like it.

Btw.: this is the call trace that lead to the deadlock when printing the
lockdep message. Looks like it is a bad idea if the console driver needs
to take a lock that lockdep wants to complain about. But I don't think
this needs to be fixed.

 0 _raw_spin_lock+326 [0x2a90ee]
 1 _spin_lock_irqsave+108 [0x442198]
 2 lock_timer_base+64 [0x13c758]
 3 del_timer+84 [0x13cfc4]
 4 raw3215_write+610 [0x340fb6]
 5 con3215_write+96 [0x3410a4]
 6 __call_console_drivers+194 [0x12d95a]
 7 _call_console_drivers+142 [0x12da06]
 8 release_console_sem+460 [0x12dfc4]
 9 vprintk+546 [0x12e922]
10 printk+82 [0x12da9e]
11 __print_symbol+220 [0x162f40]
12 print_stack_trace+136 [0x152664]
13 print_circular_bug_entry+104 [0x154f84]
14 print_circular_bug_header+260 [0x1560fc]
15 check_noncircular+354 [0x15752e]
16 __lock_acquire+2970 [0x1580da]
17 lock_acquire+146 [0x15854a]
18 _spin_lock+82 [0x441eea]
19 timer_cpu_notify+210 [0x13b686]
20 notifier_call_chain+100 [0x443c44]
21 raw_notifier_call_chain+38 [0x14200a]
22 cpu_down+606 [0x15deee]
23 store_online+100 [0x2cb650]
24 sysdev_store+72 [0x2c56d0]
25 sysfs_write_file+286 [0x20ea52]
26 vfs_write+210 [0x1ad74e]
27 sys_write+86 [0x1adf7a]
28 sysc_noemu+16 [0x111ad0]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/