Message-ID: <20170822144602.uh5jzkkchvdgzs3s@hirez.programming.kicks-ass.net>
Date:   Tue, 22 Aug 2017 16:46:02 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Byungchul Park <byungchul.park@....com>
Cc:     mingo@...nel.org, linux-kernel@...r.kernel.org,
        kernel-team@....com, Arnaldo Carvalho de Melo <acme@...nel.org>,
        Dave Chinner <david@...morbit.com>, Tejun Heo <tj@...nel.org>,
        johannes@...solutions.net, Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all
 part of PROVE_LOCKING

On Tue, Aug 22, 2017 at 03:49:22PM +0200, Peter Zijlstra wrote:
> Now, this means I also have to consider the existing
> lock_map_acquire_read() users and if they really wanted to be recursive
> or not. When I change lock_map_acquire_read() to use
> lock_acquire_shared() this annotation no longer suffices and the splat
> comes back.
> 
> 
> Also, the acquire_read() annotation will (obviously) no longer work to
> cure this problem when we switch to normal read (1), because then the
> generated chain:
> 
> 	W(1) -> A(0) -> C(0) -> W(1)
> 
> spells deadlock, since W isn't allowed to recurse.
> 
> 
> /me goes dig through commit:
> 
>   e159489baa71 ("workqueue: relax lockdep annotation on flush_work()")
> 
> to figure out wth the existing users really want.

Yep, they really want recursive, the pattern there is one work flushing
another on the same workqueue, which ends up being:

Work W1:		Work W2:	Task:

AR(Q)			AR(Q)		M(A)
A(W1)			A(W2)		flush_workqueue(Q)
  flush_work(W2)	  M(A)		  A(Q)
    A(W2)		R(W2)		  R(Q)
    R(W2)		R(Q)
    AR(Q)
    R(Q)
R(W1)
R(Q)

This should spell deadlock (AQ-QA), and W1 takes Q recursively.
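
To make the trace above concrete, here is a rough userspace sketch of the
dependency bookkeeping. It is a toy model and not lockdep's actual algorithm:
every toy_* name is hypothetical, each held lock simply gets an edge to the
newly acquired class, and a recursive-read re-acquisition of an already held
class is assumed to add no edges.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes matching the trace: Q, W1, W2 and mutex A. */
enum toy_class { TOY_Q, TOY_W1, TOY_W2, TOY_A, TOY_NR };

struct toy_graph { bool edge[TOY_NR][TOY_NR]; };	/* edge[x][y]: x held, then y taken */
struct toy_ctx { enum toy_class held[8]; int depth; };	/* per-context held stack */

/* Depth-first search: is 'to' reachable from 'from' in the graph? */
static bool toy_reaches(const struct toy_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < TOY_NR; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    toy_reaches(g, next, to, seen))
			return true;
	}
	return false;
}

/* Acquire 'cls'; returns true if this acquisition looks like a deadlock. */
static bool toy_acquire(struct toy_graph *g, struct toy_ctx *c,
			enum toy_class cls, bool recursive_read)
{
	bool bad = false, held_already = false;

	for (int i = 0; i < c->depth; i++)
		if (c->held[i] == cls)
			held_already = true;

	if (held_already) {
		/* Assumption: only a recursive read may nest in itself. */
		bad = !recursive_read;
	} else {
		for (int i = 0; i < c->depth; i++) {
			bool seen[TOY_NR] = { false };

			/* An existing path cls -> held means held -> cls closes a cycle. */
			if (toy_reaches(g, cls, c->held[i], seen))
				bad = true;
			g->edge[c->held[i]][cls] = true;
		}
	}
	assert(c->depth < 8);
	c->held[c->depth++] = cls;
	return bad;
}

static void toy_release(struct toy_ctx *c)
{
	assert(c->depth > 0);
	c->depth--;		/* the traces release in LIFO order */
}

/* Replays W1, W2 and Task from the table; returns the number of reports. */
static int toy_same_wq_flush(bool q_recursive)
{
	struct toy_graph g = { 0 };
	struct toy_ctx w1 = { 0 }, w2 = { 0 }, task = { 0 };
	int bad = 0;

	/* W1: AR(Q) A(W1) flush_work(W2){ A(W2) R(W2) AR(Q) R(Q) } R(W1) R(Q) */
	bad += toy_acquire(&g, &w1, TOY_Q, q_recursive);
	bad += toy_acquire(&g, &w1, TOY_W1, false);
	bad += toy_acquire(&g, &w1, TOY_W2, false);
	toy_release(&w1);
	bad += toy_acquire(&g, &w1, TOY_Q, q_recursive);
	toy_release(&w1);
	toy_release(&w1);
	toy_release(&w1);

	/* W2: AR(Q) A(W2) M(A) R(W2) R(Q) */
	bad += toy_acquire(&g, &w2, TOY_Q, q_recursive);
	bad += toy_acquire(&g, &w2, TOY_W2, false);
	bad += toy_acquire(&g, &w2, TOY_A, false);
	toy_release(&w2);
	toy_release(&w2);
	toy_release(&w2);

	/* Task: M(A) flush_workqueue(Q){ A(Q) R(Q) } */
	bad += toy_acquire(&g, &task, TOY_A, false);
	bad += toy_acquire(&g, &task, TOY_Q, false);
	toy_release(&task);
	toy_release(&task);

	return bad;
}
```

In this model toy_same_wq_flush(true) reports only the AQ-QA inversion from
the Task column, while toy_same_wq_flush(false) additionally flags W1
re-taking Q, which is the case a non-recursive read would break.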

I am however slightly puzzled by the need of flush_work() to take Q,
what deadlock potential is there?

Task:			Work-W1:	Work-W2:

M(A)			AR(Q)		AR(Q)
flush_work(W1)		A(W1)		A(W2)
 A(W1)					  M(A)
 R(W1)
 AR(Q)
 R(Q)

Spells deadlock on AQ-QA, but why? Why is flush_work() linked to any lock
taken inside random other works? If we can get rid of flush_work()'s
usage of Q, we can drop the recursive nature.

It was added by Oleg in commit:

  a67da70dc095 ("workqueues: lockdep annotations for flush_work()")

Which has a distinct lack of Changelog. However, that is still very much
the old workqueue code, where I think the annotation makes sense because
that was a single thread running the works consecutively. But I don't
see it making sense for the current workqueue that runs works
concurrently.

TJ, Oleg, can we agree flush_work() no longer needs the dependency on Q?

Also, TJ, what protects @pwq in start_flush_work() at the time of
lock_map_*() ?

Also^2, TJ, what's the purpose of using atomic_long_t for work->data?
All it ever seems to do is atomic_long_read() and atomic_long_set(),
neither of which provides anything stronger than
READ_ONCE()/WRITE_ONCE() respectively.
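
To illustrate that last point, here is a simplified userspace sketch. It
assumes, as stated above, that a plain atomic read or set amounts to a single
volatile load or store with no ordering and no read-modify-write. All toy_*
names are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical stand-in for atomic_long_t: just a wrapped long. */
typedef struct {
	long counter;
} toy_atomic_long_t;

/* Simplified READ_ONCE() for a long: one volatile load, no barrier. */
static inline long toy_read_once_long(const long *p)
{
	return *(const volatile long *)p;
}

/* Simplified WRITE_ONCE() for a long: one volatile store, no barrier. */
static inline void toy_write_once_long(long *p, long val)
{
	*(volatile long *)p = val;
}

/*
 * Under the stated assumption, read and set add nothing on top of the
 * helpers above. Only read-modify-write operations (add, cmpxchg, ...)
 * would need real atomic instructions, and the message observes that
 * work->data never uses those.
 */
static inline long toy_atomic_long_read(const toy_atomic_long_t *v)
{
	return toy_read_once_long(&v->counter);
}

static inline void toy_atomic_long_set(toy_atomic_long_t *v, long i)
{
	toy_write_once_long(&v->counter, i);
}
```

If that equivalence holds, a plain unsigned long accessed through
READ_ONCE()/WRITE_ONCE() would give the same guarantees for a field that is
only ever read and set.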