[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140522132556.GE7999@quack.suse.cz>
Date: Thu, 22 May 2014 15:25:56 +0200
From: Jan Kara <jack@...e.cz>
To: Niu Yawei <yawei.niu@...il.com>
Cc: linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
yawei.niu@...el.com, andreas.dilger@...el.com, jack@...e.cz,
lai.siyao@...el.com
Subject: Re: [PATCH] quota: remove dqptr_sem for scalability
Hello,
thanks for the work! I have some comments below.
On Thu 22-05-14 18:47:22, Niu Yawei wrote:
> There are several global locks in the VFS quota code which hurts
> performance a lot when quota accounting enabled, dqptr_sem is the major one.
>
> This patch tries to make the VFS quota code scalable with minimal changes.
>
> Following tests (mdtest & dbench) were running over ext4 fs in a
> centos6.5 vm (8 cpus, 4G mem, kenrel: 3.15.0-rc5+), and the result shows
> the patch relieved the lock congestion a lot.
>
Snipped performance results - thanks for those but first let's concentrate
on correctness side of things.
> [PATCH] quota: remove dqptr_sem for scalability
>
> Remove dqptr_sem (but kept in struct quota_info to keep kernel ABI
> unchanged), and the functionality of this lock is implemented by
> other locks:
> * i_dquot is protected by i_lock, however only this pointer, the
> content of this struct is by dq_data_lock.
> * Q_GETFMT is now protected with dqonoff_mutex instead of dqptr_sem.
> * Small changes in __dquot_initialize() to avoid unnecessary
> dqget()/dqput() calls.
Each of these three steps should be a separate patch please.
Now regarding each of these steps: Using i_lock for protection of dquot
pointers doesn't quite work. You have e.g.:
> @@ -1636,12 +1646,12 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
> }
> inode_incr_space(inode, number, reserve);
> spin_unlock(&dq_data_lock);
> + spin_unlock(&inode->i_lock);
>
> if (reserve)
> goto out_flush_warn;
> mark_all_dquot_dirty(dquots);
> out_flush_warn:
> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> flush_warnings(warn);
> out:
> return ret;
So you release protection of dquot pointers from inode before calling
mark_all_dquot_dirty(). So dquot pointers can be removed by
remove_inode_dquot_ref() while mark_all_dquot_dirty() works on them. That's
wrong and can lead to use after free.
Quota code uses dqptr_sem to provide exclusion for three cases:
* dquot_init()
* dquot_transfer()
* various places just reading dquot pointers to update allocation
information
* remove_dquot_ref() (called from quotaoff code)
Now exclusion against remove_dquot_ref() is relatively easy to deal with.
We can create srcu for dquots, whoever looks at dquot pointers from inode
takes srcu_read_lock() (so you basically convert read side of dqptr_sem
and write side in dquot_init() and dquot_transfer() to that) and use
synchronize_srcu() in remove_dquot_ref() to make sure all users are done
before starting to remove dquot pointers. You'll need to move
dquot_active() checks inside srcu_read_lock() to make this reliable but that
should be easy.
What remains to deal with is an exclusion between the remaining places.
dquot_init(). dq_data_lock spinlock should already provide the necessary
exclusion between readers of allocation pointers and dquot_transfer() (that
spinlock would actually be a good candidate to a conversion to per-inode
one - using i_lock for this should noticeably reduce the contention - but
that's the next step). dquot_init() doesn't take the spinlock so far but
probably we can make it to take it for simplicity for now.
> static void __dquot_initialize(struct inode *inode, int type)
> {
> - int cnt;
> - struct dquot *got[MAXQUOTAS];
> + int cnt, dq_get = 0;
> + struct dquot *got[MAXQUOTAS] = { NULL, NULL };
> struct super_block *sb = inode->i_sb;
> qsize_t rsv;
>
> - /* First test before acquiring mutex - solves deadlocks when we
> - * re-enter the quota code and are already holding the mutex */
> if (!dquot_active(inode))
> return;
>
> - /* First get references to structures we might need. */
> + /* In most case, the i_dquot should have been initialized, except
> + * the newly allocated one. We'd always try to skip the dqget() and
> + * dqput() calls to avoid unnecessary global lock contention. */
> + if (!(inode->i_state & I_NEW))
> + goto init_idquot;
The optimization is a good idea but dquot_init() is often called for
!I_NEW inodes when the initialization is necessary. So I'd rather first
check whether relevant i_dquot[] pointers are != NULL before taking any
lock. If yes, we are done, otherwise we grab pointers to dquots, take the
lock and update the pointers.
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists