Message-ID: <26020.1410476074@warthog.procyon.org.uk>
Date:	Thu, 11 Sep 2014 23:54:34 +0100
From:	David Howells <dhowells@...hat.com>
To:	Frederic Weisbecker <fweisbec@...il.com>
cc:	dhowells@...hat.com, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org
Subject: Deadlock in vtime_account_user() vs itself across a page fault


Whilst trying to use Docker, I'm occasionally seeing the attached deadlock in
user time accounting, with a page fault in the middle.  The relevant lines
from the pre-fault part of the stack are:

	[<ffffffff8106d954>] ? cpuacct_account_field+0x65/0x9a
	(gdb) i li *0xffffffff8106d954
	Line 272 of "../kernel/sched/cpuacct.c"

		kcpustat->cpustat[index] += val;

	[<ffffffff81060d41>] account_user_time+0x62/0x95
	(gdb) i li *0xffffffff81060d41
	Line 151 of "../kernel/sched/cputime.c"

		acct_account_cputime(p);

	[<ffffffff81061254>] vtime_account_user+0x62/0x8d
	(gdb) i li *0xffffffff81061254
	Line 264 of "../include/linux/seqlock.h"

		in write_seqcount_end():
		seqcount_release(&s->dep_map, 1, _RET_IP_);

I can't see any particular reason for a page fault to occur here, other than
a duff kernel pointer, but I can't find out because the page fault handling
doesn't get that far :-/
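
A minimal userspace sketch of the recursion, assuming only the call chain shown in the traces below: the outer vtime_account_user() holds the spinlock inside p->vtime_seqlock when account_user_time() faults in cpuacct_account_field(), and the fault path (do_page_fault -> context_tracking_user_exit) re-enters vtime_account_user() on the same CPU and tries to take the same lock. Every name here (toy_spinlock, toy_vtime_account_user, fault_pending, and so on) is hypothetical and this is not kernel code; the model uses a trylock so it terminates and counts the recursion instead of spinning.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the spinlock inside p->vtime_seqlock:
 * non-recursive, so a second acquire by the current holder never succeeds. */
struct toy_spinlock {
	atomic_flag locked;
};

static struct toy_spinlock vtime_lock = { ATOMIC_FLAG_INIT };
static bool fault_pending = true; /* hypothetical: one stray fault during accounting */
static int recursion_detected;    /* times the inner path found the lock already held */

static bool toy_spin_trylock(struct toy_spinlock *l)
{
	/* test_and_set returns the previous value; false means we took the lock. */
	return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

static void toy_vtime_account_user(void);

/* Models page_fault -> do_page_fault -> context_tracking_user_exit -> vtime_account_user. */
static void toy_page_fault(void)
{
	toy_vtime_account_user();
}

/* Models account_user_time -> cpuacct_account_field faulting once. */
static void toy_account_user_time(void)
{
	if (fault_pending) {
		fault_pending = false;
		toy_page_fault();
	}
}

static void toy_vtime_account_user(void)
{
	/* The kernel spins here forever; trylock lets the model report instead. */
	if (!toy_spin_trylock(&vtime_lock)) {
		recursion_detected++;
		fprintf(stderr, "recursive acquire of vtime lock: would spin forever\n");
		return;
	}
	toy_account_user_time();
	toy_spin_unlock(&vtime_lock);
}
```

Calling toy_vtime_account_user() once leaves recursion_detected at 1, and the outer call still releases the lock. In the real kernel the inner spin never returns, which is consistent with the "Watchdog detected hard LOCKUP" warning at the end of the second trace.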

David
---
=============================================
[ INFO: possible recursive locking detected ]
3.17.0-rc4-fsdevel+ #706 Tainted: G        W     
---------------------------------------------
NetworkManager/2305 is trying to acquire lock:
 (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8106120d>] vtime_account_user+0x1b/0x8d

but task is already holding lock:
 (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8106120d>] vtime_account_user+0x1b/0x8d

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&(&p->vtime_seqlock)->lock)->rlock);
  lock(&(&(&p->vtime_seqlock)->lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by NetworkManager/2305:
 #0:  (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8106120d>] vtime_account_user+0x1b/0x8d
 #1:  (&(&p->vtime_seqlock)->seqcount){-----.}, at: [<ffffffff810df2f9>] context_tracking_user_exit+0x54/0xb7
 #2:  (rcu_read_lock){......}, at: [<ffffffff8106d8ef>] cpuacct_account_field+0x0/0x9a

stack backtrace:
CPU: 0 PID: 2305 Comm: NetworkManager Tainted: G        W      3.17.0-rc4-fsdevel+ #706
Hardware name:                  /DG965RY, BIOS MQ96510J.86A.0816.2006.0716.2308 07/16/2006
 0000000000000000 ffff8800389bfbe0 ffffffff815063fd ffffffff8235c880
 ffff8800389bfcc0 ffffffff810717f5 ffff8800389bfcd0 ffffffff81071a90
 0000000000000000 ffffffff8106d85d 0000000000000001 ffffffff81061200
Call Trace:
 [<ffffffff815063fd>] dump_stack+0x4d/0x66
 [<ffffffff810717f5>] __lock_acquire+0x7d7/0x1a2a
 [<ffffffff81071a90>] ? __lock_acquire+0xa72/0x1a2a
 [<ffffffff8106d85d>] ? cpuacct_css_alloc+0x93/0x93
 [<ffffffff81061200>] ? vtime_account_user+0xe/0x8d
 [<ffffffff81071a90>] ? __lock_acquire+0xa72/0x1a2a
 [<ffffffff810730fc>] lock_acquire+0x8b/0x101
 [<ffffffff810730fc>] ? lock_acquire+0x8b/0x101
 [<ffffffff8106120d>] ? vtime_account_user+0x1b/0x8d
 [<ffffffff8150bc4b>] _raw_spin_lock+0x2b/0x3a
 [<ffffffff8106120d>] ? vtime_account_user+0x1b/0x8d
 [<ffffffff8106120d>] vtime_account_user+0x1b/0x8d
 [<ffffffff810df2f9>] context_tracking_user_exit+0x54/0xb7
 [<ffffffff81030682>] do_page_fault+0x3a/0x54
 [<ffffffff8150e462>] page_fault+0x22/0x30
 [<ffffffff8106d954>] ? cpuacct_account_field+0x65/0x9a
 [<ffffffff81060d41>] account_user_time+0x62/0x95
 [<ffffffff81061254>] vtime_account_user+0x62/0x8d
 [<ffffffff810df2f9>] ? context_tracking_user_exit+0x54/0xb7
 [<ffffffff810df2f9>] context_tracking_user_exit+0x54/0xb7
 [<ffffffff8100e105>] syscall_trace_enter+0x1da/0x21a
 [<ffffffff8150c985>] tracesys+0x7e/0xe2
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2305 at ../kernel/watchdog.c:267 watchdog_overflow_callback+0x96/0xa2()
Watchdog detected hard LOCKUP on cpu 0
Modules linked in:
CPU: 0 PID: 2305 Comm: NetworkManager Tainted: G        W      3.17.0-rc4-fsdevel+ #706
Hardware name:                  /DG965RY, BIOS MQ96510J.86A.0816.2006.0716.2308 07/16/2006
 0000000000000000 ffff88003da06c28 ffffffff815063fd ffff88003da06c70
 ffff88003da06c60 ffffffff8103a77a ffffffff810bbc06 ffff88003be2b800
 0000000000000000 ffff88003da06d80 ffff88003da06ef8 ffff88003da06cc8
Call Trace:
 <NMI>  [<ffffffff815063fd>] dump_stack+0x4d/0x66
 [<ffffffff8103a77a>] warn_slowpath_common+0x7a/0x93
 [<ffffffff810bbc06>] ? watchdog_overflow_callback+0x96/0xa2
 [<ffffffff8103a7d6>] warn_slowpath_fmt+0x43/0x4b
 [<ffffffff810bbc06>] watchdog_overflow_callback+0x96/0xa2
 [<ffffffff810db43d>] __perf_event_overflow+0x17b/0x276
 [<ffffffff810d88ad>] ? perf_event_update_userpage+0x39/0x13f
 [<ffffffff810d899c>] ? perf_event_update_userpage+0x128/0x13f
 [<ffffffff810db999>] perf_event_overflow+0x14/0x16
 [<ffffffff810195ac>] intel_pmu_handle_irq+0x2f7/0x37d
 [<ffffffff81013210>] perf_event_nmi_handler+0x25/0x3e
 [<ffffffff81005b4a>] nmi_handle+0x80/0x140
 [<ffffffff81028eb2>] ? default_send_IPI_mask_allbutself_phys+0xd4/0xd4
 [<ffffffff81005cc3>] do_nmi+0xb9/0x2d2
 [<ffffffff8150e80a>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff8126a43f>] ? delay_tsc+0x1c/0x65
 [<ffffffff8126a43f>] ? delay_tsc+0x1c/0x65
 [<ffffffff8126a43f>] ? delay_tsc+0x1c/0x65
 <<EOE>>  [<ffffffff8126a3b4>] __delay+0xa/0xc
 [<ffffffff81075970>] do_raw_spin_lock+0xad/0x10b
 [<ffffffff8150bc53>] _raw_spin_lock+0x33/0x3a
 [<ffffffff8106120d>] ? vtime_account_user+0x1b/0x8d
 [<ffffffff8106120d>] vtime_account_user+0x1b/0x8d
 [<ffffffff810df2f9>] context_tracking_user_exit+0x54/0xb7
 [<ffffffff81030682>] do_page_fault+0x3a/0x54
 [<ffffffff8150e462>] page_fault+0x22/0x30
 [<ffffffff8106d954>] ? cpuacct_account_field+0x65/0x9a
 [<ffffffff81060d41>] account_user_time+0x62/0x95
 [<ffffffff81061254>] vtime_account_user+0x62/0x8d
 [<ffffffff810df2f9>] ? context_tracking_user_exit+0x54/0xb7
 [<ffffffff810df2f9>] context_tracking_user_exit+0x54/0xb7
 [<ffffffff8100e105>] syscall_trace_enter+0x1da/0x21a
 [<ffffffff8150c985>] tracesys+0x7e/0xe2
---[ end trace cfc02f46dcb212bc ]---


Attachment: ".config" (text/plain, 81859 bytes)