Message-ID: <b61979ff-918b-a27e-3d3b-34e9cfc23083@grimberg.me>
Date: Tue, 23 Oct 2018 14:30:52 -0700
From: Sagi Grimberg <sagi@...mberg.me>
To: Johannes Berg <johannes@...solutions.net>,
Bart Van Assche <bvanassche@....org>,
Tejun Heo <tj@...nel.org>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Christoph Hellwig <hch@....de>,
"linux-nvme @ lists . infradead . org"
<linux-nvme@...ts.infradead.org>
Subject: Re: [PATCH] Revert "workqueue: re-add lockdep dependencies for
flushing"
>
> I also commented over there regarding this specific problem.
>
> I think my wording was inaccurate perhaps. It's not so much direct
> recursion, but a dependency chain connecting the handler_mutex class
> back to itself. This is how lockdep finds problems, it doesn't consider
> actual *instances* but only *classes*.
>
> As Sagi mentioned on the other thread, in this specific instance you
> happen to know out-of-band that this is safe, (somehow) the code ensures
> that you can't actually recursively connect back to yourself. Perhaps
> there's some kind of layering involved, I don't know. (*)
>
> As I said over there, this type of thing can be solved with
> mutex_lock_nested(), though it's possible that in this case it should
> really be flush_workqueue_nested() or flush_work_nested() or so.
I guess that could make this specific complaint go away...
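For reference, the existing mutex_lock_nested() annotation looks roughly
like this (just an illustrative sketch, not the actual rdma/cm code; a
flush_workqueue_nested()/flush_work_nested() variant does not exist
today, the idea is that it would take a lockdep subclass the same way):

```c
/* Two locks of the same lockdep class taken in a fixed parent->child
 * order.  SINGLE_DEPTH_NESTING tells lockdep that the inner
 * acquisition is a deliberate, bounded nesting of the same class,
 * not a potential self-deadlock.
 */
mutex_lock(&parent->handler_mutex);
mutex_lock_nested(&child->handler_mutex, SINGLE_DEPTH_NESTING);
/* ... */
mutex_unlock(&child->handler_mutex);
mutex_unlock(&parent->handler_mutex);
```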
> johannes
>
> (*) note that just checking "obj != self" might not be enough, depending
> on how you pick candidates for "obj" you might recurse down another
> level and end up back at yourself, just two levels down, and actually
> have a dependency chain leading back to yourself.
It's not a simple check: every removal invocation is serialized on a
dedicated workqueue, and before anything can be queued there the
connection first needs to be established, which only happens later
(after the flush takes place).
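Roughly like this (a sketch of the serialization described above; the
names are made up for illustration, not the actual code):

```c
/* All removals funnel through one ordered (max-active == 1)
 * workqueue: a removal can only be queued once the connection it
 * tears down has finished connecting, and two removals can never
 * run concurrently or re-enter each other.
 */
static struct workqueue_struct *removal_wq;

static int __init removal_init(void)
{
	removal_wq = alloc_ordered_workqueue("removal_wq", 0);
	return removal_wq ? 0 : -ENOMEM;
}

static void schedule_removal(struct connection *conn)
{
	/* only reachable after conn is fully connected */
	queue_work(removal_wq, &conn->removal_work);
}
```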