Message-ID: <20220701115928.62kr7lpfs4i4ivrt@quack3.lan>
Date:   Fri, 1 Jul 2022 13:59:28 +0200
From:   Jan Kara <jack@...e.cz>
To:     Ioannis Angelakopoulos <iangelak@...com>
Cc:     "mingo@...hat.com" <mingo@...hat.com>,
        "jack@...e.com" <jack@...e.com>, "boris@....io" <boris@....io>,
        "josef@...icpanda.com" <josef@...icpanda.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Modeling wait events with Lockdep
Hello!
On Thu 30-06-22 23:05:07, Ioannis Angelakopoulos wrote:
> I would like to ask some questions regarding modeling waiting for events 
> (i.e., wait_event) in Linux using Lockdep.
> I am trying to model these events in btrfs since there are deadlocks 
> detected involving waiting for events and Lockdep is not currently able 
> to address them (e.g., 
> https://lore.kernel.org/linux-btrfs/cover.1655147296.git.josef@toxicpanda.com/).
> 
> I am very new to Lockdep so I would like to know, what would be the 
> correct way of implementing these models using Lockdep?
>
> I noticed that JBD2 uses a read-write lockdep map. It takes the read 
> lockdep map when it creates a transaction handle and unlocks the read 
> lockdep map when it frees the handle. Also, every time the thread has to 
> wait for resources (e.g., transaction credits) and the handle is not 
> supposed to be alive, the thread locks and unlocks immediately the write 
> lockdep map before the waiting event (maybe I understood something wrong 
> here?).
No, this is correct.
> Is this the only Lockdep model that can be used for these 
> waiting events?
We've used this model because, in jbd2, essentially every existing journal
handle is a reference to the running transaction - this reference is modeled
by the 'read acquisition'. Transaction commit, and consequently the places
waiting for more journal space, have to wait for all outstanding handles -
this wait is modeled by the 'write acquisition'.
But certainly there are different wait-wake schemes that could be modeled
differently with lockdep.
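To make the read/write modeling described above concrete, here is a minimal
userspace sketch in C. It is not kernel code and not the jbd2 implementation:
the sketch_* names, the per-task counter and the two scenario functions are
all hypothetical stand-ins, and the "checker" only flags the single case of a
task that may wait for all handles while it still holds one itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lockdep map; it only names the resource. */
struct sketch_map { const char *name; };

/* Hypothetical per-task state tracked by this toy checker. */
struct sketch_task {
    int read_held;   /* handles (read acquisitions) held by this task */
    bool violation;  /* set when the task may wait on its own handle */
};

/* Handle start: take a reference to the running transaction. */
static void sketch_acquire_read(struct sketch_task *t, struct sketch_map *m)
{
    (void)m;
    t->read_held++;
}

/* Handle stop: drop the reference. */
static void sketch_release_read(struct sketch_task *t, struct sketch_map *m)
{
    (void)m;
    assert(t->read_held > 0);
    t->read_held--;
}

/*
 * Models "write acquire, then release immediately" placed before a wait:
 * the task announces that it may wait for all outstanding handles.
 * Doing so while holding a handle would wait on itself.
 */
static void sketch_might_wait(struct sketch_task *t, struct sketch_map *m)
{
    (void)m;
    if (t->read_held > 0)
        t->violation = true;
}

/* The handle is stopped before waiting: no violation expected. */
bool wait_without_handle(void)
{
    struct sketch_map map = { "trans_commit" };
    struct sketch_task t = { 0, false };

    sketch_acquire_read(&t, &map);
    sketch_release_read(&t, &map);
    sketch_might_wait(&t, &map);
    return t.violation;
}

/* The task waits while its handle is still alive: flagged. */
bool wait_with_handle(void)
{
    struct sketch_map map = { "trans_commit" };
    struct sketch_task t = { 0, false };

    sketch_acquire_read(&t, &map);
    sketch_might_wait(&t, &map);
    sketch_release_read(&t, &map);
    return t.violation;
}
```

Real lockdep records dependencies between lock classes across all tasks, so
it can also report cycles that involve other locks; this sketch deliberately
ignores that and only shows why the wait is modeled as a write acquisition.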
> For your reference, here are 2 examples that we are trying to annotate 
> with Lockdep and we would like to know if we are on the correct track.
> 
> In the first example it makes sense to use the JBD2 model, however we 
> are not sure how to apply the model in the second case. The comments 
> indicate our concerns.
> 
> ------------------------------
> Simple Case:
> 
> TA
> rwsem_acquire_read(lockdep_map);
> cond=false
> do_work()
> cond=true
> rwsem_release_read(lockdep_map);
> signal(w)
> 
> TB
> rwsem_acquire(lockdep_map);
> rwsem_release(lockdep_map);
> wait_event(w, cond==true)
> 
> Advanced Case:
> 
> TA
> rwsem_acquire_read(lockdep_map)
> cond=false
> // exits while holding the lock
> 
> TB
> cond=true
> rwsem_release_read(lockdep_map) // We do not know that we hold the lock
> signal(w)
> 
> TC
> rwsem_acquire(lockdep_map);
> rwsem_release(lockdep_map);
> wait_event(w, cond==true)
This is difficult to track because lockdep supports only task-local locking,
so it cannot follow "resource ownership" when it moves between tasks. How we
usually handle this (e.g. we did something similar in fs/aio.c, where
filesystem freeze protection is acquired on IO submission and released on IO
completion from a different task / context) is:
TA
rwsem_acquire_read(lockdep_map)
cond=false
// push this as far as it is reasonably possible in TA to allow lockdep to
// track what needs to be done in TA while waiting for TB to do work
rwsem_release_read(lockdep_map)
TB
// Tell lockdep TB has inherited the resource, push this as early as
// reasonably possible to allow lockdep to track most dependencies
rwsem_acquire_read(lockdep_map)
cond=true
signal(w)
rwsem_release_read(lockdep_map)
It is not perfect and some dependencies may be missed but it's better than
nothing.
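The hand-off scheme above can be illustrated with a small userspace sketch in
C. Again, this is only a model under stated assumptions: the sketch_* helpers
and the scenario functions are hypothetical, and the only rule checked is the
task-local one, namely that a task must release what it acquired itself and
must not exit while still holding an acquisition.

```c
#include <stdbool.h>

/* Hypothetical per-task state for a toy task-local checker. */
struct sketch_task {
    int read_held;    /* read acquisitions currently held by this task */
    bool unbalanced;  /* set on release-without-acquire or exit-while-held */
};

static void sketch_acquire_read(struct sketch_task *t)
{
    t->read_held++;
}

static void sketch_release_read(struct sketch_task *t)
{
    if (t->read_held == 0)
        t->unbalanced = true;  /* releasing something this task never took */
    else
        t->read_held--;
}

static void sketch_task_exit(struct sketch_task *t)
{
    if (t->read_held != 0)
        t->unbalanced = true;  /* exiting while still holding the resource */
}

/* "Advanced Case" as first written: TA acquires, TB releases. */
bool naive_cross_task_release(void)
{
    struct sketch_task ta = { 0, false };
    struct sketch_task tb = { 0, false };

    sketch_acquire_read(&ta);  /* TA: cond = false */
    sketch_task_exit(&ta);     /* TA exits while holding */

    sketch_release_read(&tb);  /* TB: cond = true, but TB holds nothing */
    sketch_task_exit(&tb);

    return ta.unbalanced || tb.unbalanced;
}

/* Hand-off scheme: each task brackets its own part of the ownership. */
bool handoff_with_reacquire(void)
{
    struct sketch_task ta = { 0, false };
    struct sketch_task tb = { 0, false };

    sketch_acquire_read(&ta);  /* TA: cond = false */
    sketch_release_read(&ta);  /* as late as reasonably possible in TA */
    sketch_task_exit(&ta);

    sketch_acquire_read(&tb);  /* TB inherits, as early as possible */
    /* TB: cond = true; signal(w) */
    sketch_release_read(&tb);
    sketch_task_exit(&tb);

    return ta.unbalanced || tb.unbalanced;
}
```

The sketch also makes the stated limitation visible: between TA's release and
TB's re-acquire no task holds the annotation, so dependencies created in that
window are not recorded.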
								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR