Message-ID: <20201221173538.GQ3579531@ZenIV.linux.org.uk>
Date: Mon, 21 Dec 2020 17:35:38 +0000
From: Al Viro <viro@...iv.linux.org.uk>
To: Liangyan <liangyan.peng@...ux.alibaba.com>
Cc: Joseph Qi <joseph.qi@...ux.alibaba.com>,
Miklos Szeredi <miklos@...redi.hu>,
linux-unionfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] ovl: fix dentry leak in ovl_get_redirect
On Tue, Dec 22, 2020 at 12:51:27AM +0800, Liangyan wrote:
> This is the race scenario, based on the call trace we captured, that causes
> the dentry leak.
>
>
> CPU 0                                   CPU 1
> ovl_set_redirect                        lookup_fast
>   ovl_get_redirect                        __d_lookup
>     dget_dlock
>     //no lock protection here               spin_lock(&dentry->d_lock)
>     dentry->d_lockref.count++               dentry->d_lockref.count++
>
>
> If we use dget_parent instead, we may have this race.
>
>
> CPU 0                                           CPU 1
> ovl_set_redirect                                lookup_fast
>   ovl_get_redirect                                __d_lookup
>     dget_parent
>       raw_seqcount_begin(&dentry->d_seq)            spin_lock(&dentry->d_lock)
>       lockref_get_not_zero(&ret->d_lockref)         dentry->d_lockref.count++
And?
lockref_get_not_zero() will observe ->d_lock held and fall back to
taking it.
The whole point of lockref is that the counter and the spinlock sit next to
each other.  The fast path in lockref_get_not_zero() is a cmpxchg on both,
and it is taken only if ->d_lock is *NOT* locked.  The slow path there
does spin_lock() around the manipulations of ->count.
Note that ->d_lock is simply ->d_lockref.lock; ->d_seq has nothing
to do with the whole thing.
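
To make that concrete, here is a rough user-space sketch of the idea.  This
is NOT the kernel's lib/lockref.c; the lockref_sketch / lr_* names are made
up for illustration.  The point it shows: lock and count share one word, the
fast path bumps the count with a single cmpxchg only while that word reads
as unlocked, and otherwise it falls back to taking the lock.

/*
 * Sketch of the lockref idea: lock bit and counter in one 64-bit word.
 * A cmpxchg on the whole word can only succeed while the word says
 * "unlocked", so it cannot race with someone who holds the lock and
 * modifies the count directly.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct lockref_sketch {
	_Atomic uint64_t lock_count;	/* bit 0: lock, bits 32..63: count */
};

#define LR_LOCKED	1ULL
#define LR_COUNT(v)	((uint32_t)((v) >> 32))
#define LR_WORD(l, c)	(((uint64_t)(c) << 32) | (l))

static void lr_lock(struct lockref_sketch *lr)
{
	for (;;) {
		/* expect the current word with the lock bit clear... */
		uint64_t old = atomic_load(&lr->lock_count) & ~LR_LOCKED;
		/* ...and try to set the lock bit; fails if already locked */
		if (atomic_compare_exchange_weak(&lr->lock_count, &old,
						 old | LR_LOCKED))
			return;
	}
}

static void lr_unlock(struct lockref_sketch *lr)
{
	atomic_fetch_and(&lr->lock_count, ~LR_LOCKED);	/* keep the count */
}

static bool lr_get_not_zero(struct lockref_sketch *lr)
{
	uint64_t old = atomic_load(&lr->lock_count);

	/* fast path: bump lock+count together, only while unlocked */
	while (!(old & LR_LOCKED) && LR_COUNT(old) > 0) {
		uint64_t new = LR_WORD(0, LR_COUNT(old) + 1);
		if (atomic_compare_exchange_weak(&lr->lock_count, &old, new))
			return true;
		/* failed cmpxchg refreshed 'old'; re-check and retry */
	}

	/* slow path: take the lock and touch the count under it */
	lr_lock(lr);
	old = atomic_load(&lr->lock_count);
	bool got = LR_COUNT(old) > 0;
	if (got)
		atomic_store(&lr->lock_count,
			     LR_WORD(LR_LOCKED, LR_COUNT(old) + 1));
	lr_unlock(lr);
	return got;
}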
The race in mainline is real; if you can observe anything of that
sort with dget_parent(), we have a much worse problem. Consider
dget() vs. lookup_fast() - no overlayfs weirdness in sight and the
same kind of concurrent access.
Again, lockref primitives can be safely mixed with other threads
doing operations on ->count while holding ->lock.
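
In terms of the sketch above (same made-up names), the two access patterns
in your trace compose safely precisely because the cmpxchg in
lr_get_not_zero() can only succeed while the lock bit is clear:

/* "locked" style, roughly what __d_lookup() does: spin_lock(->d_lock),
 * bump the count directly (dget_dlock()), unlock. */
static void grab_locked(struct lockref_sketch *lr)
{
	lr_lock(lr);
	uint64_t v = atomic_load(&lr->lock_count);
	atomic_store(&lr->lock_count, LR_WORD(LR_LOCKED, LR_COUNT(v) + 1));
	lr_unlock(lr);
}

/* "lockless" style, roughly the dget_parent() fast path: either the
 * cmpxchg wins while the word is unlocked, or the slow path ends up
 * waiting for the lock - it never increments a count that is being
 * changed under the lock at the same moment. */
static void grab_lockless(struct lockref_sketch *lr)
{
	if (!lr_get_not_zero(lr)) {
		/* count was zero; a real caller would handle/retry that */
	}
}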