Message-ID: <7ecc0429-fff8-513c-07a1-9aeaeb37fb00@fb.com>
Date: Wed, 6 Jul 2022 00:51:26 +0000
From: Ioannis Angelakopoulos <iangelak@...com>
To: Jan Kara <jack@...e.cz>
CC: "mingo@...hat.com" <mingo@...hat.com>,
"jack@...e.com" <jack@...e.com>, "boris@....io" <boris@....io>,
"josef@...icpanda.com" <josef@...icpanda.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Modeling wait events with Lockdep
On 7/1/22 4:59 AM, Jan Kara wrote:
> Hello!
>
> On Thu 30-06-22 23:05:07, Ioannis Angelakopoulos wrote:
>> I would like to ask some questions regarding modeling waiting for events
>> (i.e., wait_event) in Linux using Lockdep.
>> I am trying to model these events in btrfs since there are deadlocks
>> detected involving waiting for events and Lockdep is not currently able
>> to address them (e.g.,
>> https://lore.kernel.org/linux-btrfs/cover.1655147296.git.josef@toxicpanda.com/).
>>
>> I am very new to Lockdep so I would like to know, what would be the
>> correct way of implementing these models using Lockdep?
>>
>> I noticed that JBD2 uses a read-write lockdep map. It takes the read
>> lockdep map when it creates a transaction handle and unlocks the read
>> lockdep map when it frees the handle. Also, every time the thread has to
>> wait for resources (e.g., transaction credits) and the handle is not
>> supposed to be alive, the thread locks and immediately unlocks the write
>> lockdep map before the waiting event (maybe I understood something wrong
>> here?).
>
> No, this is correct.
>
>> Is this the only Lockdep model that can be used for these
>> waiting events?
>
> We've used this model because, in jbd2, essentially every existing journal
> handle is a reference to the running transaction - this reference is
> modeled by the 'read acquisition' - and transaction commit, and
> consequently any place waiting for more journal space, has to wait for all
> outstanding handles - this wait is modeled by the 'write acquisition'.
>
> But certainly there are different wait-wake schemes that could be modeled
> differently with lockdep.
>
>> For your reference, here are 2 examples that we are trying to annotate
>> with Lockdep and we would like to know if we are on the correct track.
>>
>> In the first example it makes sense to use the JBD2 model, however we
>> are not sure how to apply the model in the second case. The comments
>> indicate our concerns.
>>
>> ------------------------------
>> Simple Case:
>>
>> TA
>> rwsem_acquire_read(lockdep_map);
>> cond=false
>> do_work()
>> cond=true
>> rwsem_release_read(lockdep_map);
>> signal(w)
>>
>> TB
>> rwsem_acquire(lockdep_map);
>> rwsem_release(lockdep_map);
>> wait_event(w, cond==true)
>>
>> Advanced Case:
>>
>> TA
>> rwsem_acquire_read(lockdep_map)
>> cond=false
>> // exits while holding the lock
>>
>> TB
>> cond=true
>> rwsem_release_read(lockdep_map) // We do not know that we hold the lock
>> signal(w)
>>
>> TC
>> rwsem_acquire(lockdep_map);
>> rwsem_release(lockdep_map);
>> wait_event(w, cond==true)
>
> So this is difficult to track with lockdep because lockdep supports only
> task-local locking so when "resource ownership" moves between tasks things
> are difficult to track. How we usually do this (e.g. we did something
> similar in fs/aio.c where filesystem freeze protection is acquired on IO
> submission and we release it on IO completion from a different task /
> context) is that we do:
>
> TA
> rwsem_acquire_read(lockdep_map)
> cond=false
> // push this as far as it is reasonably possible in TA to allow lockdep to
> // track what needs to be done in TA while waiting for TB to do work
> rwsem_release_read(lockdep_map)
>
> TB
> // Tell lockdep TB has inherited the resource, push this as early as
> // reasonably possible to allow lockdep to track most dependencies
> rwsem_acquire_read(lockdep_map)
> cond=true
> signal(w)
> rwsem_release_read(lockdep_map)
>
> It is not perfect and some dependencies may be missed but it's better than
> nothing.
>
> Honza
Thank you so much for the clarification and your illustrative example!
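
For readers following the thread, here is a minimal userspace sketch of why the handoff pattern above is needed. It is not lockdep and uses no kernel API: toy_map, toy_acquire_read, toy_release_read and toy_problems are hypothetical stand-ins that only mimic task-local bookkeeping, i.e. a release is paired only with an acquisition made by the same task. The two scenario functions mirror the "Advanced Case" as first written and the acquire/release split suggested in the reply.

```c
#include <assert.h>

/* Toy stand-ins for task-local lock bookkeeping. These are hypothetical
 * helpers for illustration only, not kernel or lockdep APIs. */
enum { TA = 0, TB = 1, NR_TASKS = 2 };

struct toy_map {
	int read_held[NR_TASKS]; /* read acquisitions currently held, per task */
	int bad_releases;        /* releases by a task that holds nothing */
};

static void toy_acquire_read(struct toy_map *m, int task)
{
	assert(task >= 0 && task < NR_TASKS);
	m->read_held[task]++;
}

static void toy_release_read(struct toy_map *m, int task)
{
	assert(task >= 0 && task < NR_TASKS);
	if (m->read_held[task] == 0)
		m->bad_releases++; /* nothing held by this task to pair with */
	else
		m->read_held[task]--;
}

/* Count what a task-local checker would flag: unpaired releases plus
 * acquisitions still held when the scenario ends. */
static int toy_problems(const struct toy_map *m)
{
	int n = m->bad_releases;

	for (int t = 0; t < NR_TASKS; t++)
		n += m->read_held[t];
	return n;
}

/* Advanced case as first written: TA acquires, TB releases. */
int cross_task_release(void)
{
	struct toy_map m = { {0}, 0 };

	toy_acquire_read(&m, TA); /* TA sets cond=false and exits */
	toy_release_read(&m, TB); /* TB never acquired the map */
	return toy_problems(&m);
}

/* Handoff: each task brackets its own share of the work. */
int handoff(void)
{
	struct toy_map m = { {0}, 0 };

	toy_acquire_read(&m, TA);
	toy_release_read(&m, TA); /* pushed as late as reasonable in TA */
	toy_acquire_read(&m, TB); /* pushed as early as reasonable in TB */
	toy_release_read(&m, TB); /* after cond=true and the wakeup */
	return toy_problems(&m);
}
```

In this toy model cross_task_release() returns 2 (one unpaired release in TB plus one acquisition left held in TA), while handoff() returns 0. The window between TA's release and TB's acquire is not covered by any annotation, which is where dependencies can be missed.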