Message-ID: <20170821154600.asyzqs2zg6w6o4pg@hirez.programming.kicks-ass.net>
Date:   Mon, 21 Aug 2017 17:46:00 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Byungchul Park <byungchul.park@....com>
Cc:     mingo@...nel.org, linux-kernel@...r.kernel.org,
        kernel-team@....com, Arnaldo Carvalho de Melo <acme@...nel.org>,
        Dave Chinner <david@...morbit.com>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all
 part of PROVE_LOCKING



Booting the very latest -tip on my test machine gets me the below splat.

Dave, TJ, FYI, lockdep grew annotations for completions; it remembers
which locks were taken before we complete() and checks none of those are
held while we wait_for_completion().

That is, something like:

					mutex_lock(&A);
					mutex_unlock(&A);
					complete(&C);

	mutex_lock(&A);
	wait_for_completion(&C);

Is now considered a problem. Note that in order for C to link to A,
lockdep needs to have observed the complete() thread acquiring A after a
wait_for_completion(), something like:

	wait_for_completion(&C)
					mutex_lock(&A);
					mutex_unlock(&A);
					complete(&C);

That is, only locks observed taken between C's wait_for_completion() and
its complete() are considered.
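
In code, the offending pattern looks roughly like the minimal sketch
below (example_A, example_C, waiter_fn and completer_fn are made-up
names, not anything from the report further down): once lockdep has seen
completer_fn() take example_A while some thread was already inside
wait_for_completion(&example_C), it records the C -> A link and will
then complain about waiter_fn() holding example_A across the wait.

#include <linux/completion.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(example_A);
static DECLARE_COMPLETION(example_C);

/* runs in some other thread */
static void completer_fn(void)
{
	mutex_lock(&example_A);		/* observed while a waiter is in wait_for_completion() */
	mutex_unlock(&example_A);
	complete(&example_C);		/* crossrelease links example_C back to example_A */
}

/* the side lockdep will complain about */
static void waiter_fn(void)
{
	mutex_lock(&example_A);			/* records example_A ... */
	wait_for_completion(&example_C);	/* ... -> example_C: cycle, report */
	mutex_unlock(&example_A);
}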

Now given the above observance rule and the fact that the below report
is from the complete() side, the thing that happened appears to be:


	lockdep_map_acquire(&work->lockdep_map)
	down_write(&A)

			down_write(&A)
			wait_for_completion(&C)

					lockdep_map_acquire(&work->lockdep_map)
					complete(&C)

Which lockdep then puked over because both sides saw the same work
class.
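
For illustration only, a hypothetical sketch of that shape (example_io,
example_A, example_C and the functions below are invented; this is not
the XFS code): every work item initialised at the single INIT_WORK()
call site shares one lockdep class, so the instance doing complete() and
the instance taking the rwsem look like the same "lock" to lockdep even
though the workqueue can run them in parallel on different workers.

#include <linux/workqueue.h>
#include <linux/completion.h>
#include <linux/rwsem.h>

static DECLARE_RWSEM(example_A);
static DECLARE_COMPLETION(example_C);

struct example_io {
	struct work_struct work;
	bool signal;			/* completer or lock-taker */
};

static void example_io_work(struct work_struct *w)
{
	struct example_io *io = container_of(w, struct example_io, work);

	/* process_one_work() holds this work class's lockdep_map here */
	if (io->signal) {
		complete(&example_C);	/* crossrelease: example_C -> work class */
	} else {
		down_write(&example_A);	/* records work class -> example_A */
		up_write(&example_A);
	}
}

static void example_submit(struct example_io *io, bool signal)
{
	io->signal = signal;
	INIT_WORK(&io->work, example_io_work);	/* one call site, one lockdep class */
	schedule_work(&io->work);
}

/* elsewhere: closes the cycle with example_A -> example_C */
static void example_waiter(void)
{
	down_write(&example_A);
	wait_for_completion(&example_C);
	up_write(&example_A);
}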

Byungchul, should we not exclude the work class itself? It appears to me
the workqueue code is explicitly parallel, or am I confused again?


[   38.882358] ======================================================
[   38.889256] WARNING: possible circular locking dependency detected
[   38.896155] 4.13.0-rc6-00609-g645737361ab6-dirty #3 Not tainted
[   38.902752] ------------------------------------------------------
[   38.909650] kworker/0:0/3 is trying to acquire lock:
[   38.915189]  ((&ioend->io_work)){+.+.}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[   38.924715] 
[   38.924715] but now in release context of a crosslock acquired at the following:
[   38.934618]  ((complete)&bp->b_iowait){+.+.}, at: [<ffffffff8151ce42>] xfs_buf_submit_wait+0xb2/0x590
[   38.944919] 
[   38.944919] which lock already depends on the new lock.
[   38.944919] 
[   38.954046] 
[   38.954046] the existing dependency chain (in reverse order) is:
[   38.962397] 
[   38.962397] -> #2 ((complete)&bp->b_iowait){+.+.}:
[   38.969401]        __lock_acquire+0x10a1/0x1100
[   38.974460]        lock_acquire+0xea/0x1f0
[   38.979036]        wait_for_completion+0x3b/0x130
[   38.984285]        xfs_buf_submit_wait+0xb2/0x590
[   38.989535]        _xfs_buf_read+0x23/0x30
[   38.994108]        xfs_buf_read_map+0x14f/0x320
[   38.999169]        xfs_trans_read_buf_map+0x170/0x5d0
[   39.004812]        xfs_read_agf+0x97/0x1d0
[   39.009386]        xfs_alloc_read_agf+0x60/0x240
[   39.014541]        xfs_alloc_fix_freelist+0x32a/0x3d0
[   39.020180]        xfs_free_extent_fix_freelist+0x6b/0xa0
[   39.026206]        xfs_free_extent+0x48/0x120
[   39.031068]        xfs_trans_free_extent+0x57/0x200
[   39.036512]        xfs_extent_free_finish_item+0x26/0x40
[   39.042442]        xfs_defer_finish+0x174/0x770
[   39.047494]        xfs_itruncate_extents+0x126/0x470
[   39.053035]        xfs_setattr_size+0x2a1/0x310
[   39.058093]        xfs_vn_setattr_size+0x57/0x160
[   39.063342]        xfs_vn_setattr+0x50/0x70
[   39.068015]        notify_change+0x2ee/0x420
[   39.072785]        do_truncate+0x5d/0x90
[   39.077165]        path_openat+0xab2/0xc00
[   39.081737]        do_filp_open+0x8a/0xf0
[   39.086213]        do_sys_open+0x123/0x200
[   39.090787]        SyS_open+0x1e/0x20
[   39.094876]        entry_SYSCALL_64_fastpath+0x23/0xc2
[   39.100611] 
[   39.100611] -> #1 (&xfs_nondir_ilock_class){++++}:
[   39.107612]        __lock_acquire+0x10a1/0x1100
[   39.112660]        lock_acquire+0xea/0x1f0
[   39.117224]        down_write_nested+0x26/0x60
[   39.122184]        xfs_ilock+0x166/0x220
[   39.126563]        __xfs_setfilesize+0x30/0x200
[   39.131612]        xfs_setfilesize_ioend+0x7f/0xb0
[   39.136958]        xfs_end_io+0x49/0xf0
[   39.141240]        process_one_work+0x273/0x6b0
[   39.146288]        worker_thread+0x48/0x3f0
[   39.150960]        kthread+0x147/0x180
[   39.155146]        ret_from_fork+0x2a/0x40
[   39.159708] 
[   39.159708] -> #0 ((&ioend->io_work)){+.+.}:
[   39.166126]        process_one_work+0x244/0x6b0
[   39.171184]        worker_thread+0x48/0x3f0
[   39.175845]        kthread+0x147/0x180
[   39.180020]        ret_from_fork+0x2a/0x40
[   39.184593]        0xffffffffffffffff
[   39.188677] 
[   39.188677] other info that might help us debug this:
[   39.188677] 
[   39.197611] Chain exists of:
[   39.197611]   (&ioend->io_work) --> &xfs_nondir_ilock_class --> (complete)&bp->b_iowait
[   39.197611] 
[   39.211399]  Possible unsafe locking scenario by crosslock:
[   39.211399] 
[   39.219268]        CPU0                    CPU1
[   39.224321]        ----                    ----
[   39.229377]   lock(&xfs_nondir_ilock_class);
[   39.234135]   lock((complete)&bp->b_iowait);
[   39.238902]                                lock((&ioend->io_work));
[   39.245889]                                unlock((complete)&bp->b_iowait);
[   39.253660] 
[   39.253660]  *** DEADLOCK ***
[   39.253660] 
[   39.260267] 3 locks held by kworker/0:0/3:
[   39.264838]  #0:  ("xfs-buf/%s"mp->m_fsname){++++}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[   39.275523]  #1:  ((&bp->b_ioend_work)){+.+.}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[   39.285721]  #2:  (&x->wait#17){....}, at: [<ffffffff8113daed>] complete+0x1d/0x60
[   39.294176] 
[   39.294176] stack backtrace:
[   39.299039] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.13.0-rc6-00609-g645737361ab6-dirty #3
[   39.308749] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   39.320209] Workqueue: xfs-buf/sdb1 xfs_buf_ioend_work
[   39.325945] Call Trace:
[   39.328677]  dump_stack+0x86/0xcf
[   39.332377]  print_circular_bug+0x203/0x2f0
[   39.337048]  check_prev_add+0x3be/0x700
[   39.341331]  ? print_bfs_bug+0x40/0x40
[   39.345518]  lock_commit_crosslock+0x40d/0x5a0
[   39.350479]  ? lock_commit_crosslock+0x40d/0x5a0
[   39.355633]  ? xfs_buf_ioend_work+0x15/0x20
[   39.360304]  complete+0x29/0x60
[   39.363810]  xfs_buf_ioend+0x15e/0x470
[   39.367987]  xfs_buf_ioend_work+0x15/0x20
[   39.372463]  process_one_work+0x273/0x6b0
[   39.376931]  worker_thread+0x48/0x3f0
[   39.381021]  kthread+0x147/0x180
[   39.384624]  ? process_one_work+0x6b0/0x6b0
[   39.389294]  ? kthread_create_on_node+0x40/0x40
[   39.394351]  ret_from_fork+0x2a/0x40
