Date:	Sat, 16 Oct 2010 07:50:45 +1100
From:	Nick Piggin <npiggin@...nel.dk>
To:	Dave Chinner <david@...morbit.com>
Cc:	Nick Piggin <npiggin@...nel.dk>,
	Christoph Hellwig <hch@...radead.org>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 17/18] fs: icache remove inode_lock

On Fri, Oct 15, 2010 at 09:59:43PM +1100, Dave Chinner wrote:
> On Fri, Oct 15, 2010 at 05:41:50PM +1100, Nick Piggin wrote:
> > You're worried about mere mortals reviewing and understanding it...
> > I don't really know. If you understand inode locking today, you
> > can understand the inode scaling series quite easily. Ditto for
> > dcache. rcu-walk path walking is trickier, but it is described in
> > detail in documentation and changelog.
> > 
> > And you can understand the high-level approach without digesting
> > every detail at once. The inode locking work breaks up all of the
> > global locks:
> > 
> > - a single inode object is protected (to the same level as
> >   inode_lock) with i_lock. This makes it really trivial for
> >   filesystems to lock down the object without taking a global
> >   lock.
> 
> Which is unnecessarily wide, and results in i_lock having to have
> list locks nested inside it, and that leads to the lock-inversion
> trylock mess that several people have complained about.

Gee, you keep repeating this so often that you have me doubting
myself a tiny bit, so I have to check.

$ grep spin_trylock fs/inode.c fs/fs-writeback.c
fs/inode.c:             if (!spin_trylock(&inode->i_lock)) {
fs/inode.c:             if (!spin_trylock(&old->i_lock)) {
fs/inode.c:             if (!spin_trylock(&old->i_lock)) {
fs/fs-writeback.c:      if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c:      if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c:      if (!spin_trylock(&inode->i_lock)) {

This is your unmaintainable mess?  You decided to rewrite your own
vfs-scale tree because I wanted i_lock to protect the icache state
(and offered you very good reasons for it)?

Well, surely they must be horribly complex and unmaintainable
beasts...

repeat:
        /*
         * Don't use RCU walks here; the common case, where no old
         * inode is found, requires the hash lock anyway.
         */
        spin_lock_bucket(b);
        hlist_bl_for_each_entry(old, node, &b->head, i_hash) {
                if (old->i_ino != ino)
                        continue;
                if (old->i_sb != sb)
                        continue;
                if (old->i_state & (I_FREEING|I_WILL_FREE))
                        continue;
                if (!spin_trylock(&old->i_lock)) {
                        /* inverted order: drop the hash lock and retry */
                        spin_unlock_bucket(b);
                        cpu_relax();
                        goto repeat;
                }

Nope, no big deal. The rest are much the same. So thanks for the
repeated suggestion, but I actually prefer to keep my regular i_lock
locking scheme, where you don't need to look up the documentation and
think hard about coherency between protected and unprotected parts of
the inode every time you use it. I didn't stumble upon my locking
design by chance.
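
The pattern itself is small enough to write down in plain pthreads
code. Here is a minimal userspace sketch, not kernel code; bucket_lock
and obj->lock are hypothetical stand-ins for the hash bucket lock and
i_lock, and the documented order is the object lock outside the
bucket lock:

#include <pthread.h>
#include <sched.h>
#include <stddef.h>

struct obj {
        pthread_spinlock_t lock;        /* stands in for i_lock */
        int ino;
        struct obj *next;               /* hash chain */
};

/* Both locks need pthread_spin_init() before first use. */
static pthread_spinlock_t bucket_lock;  /* stands in for the hash lock */
static struct obj *bucket_head;

/* Look up ino; on success return the object with obj->lock held. */
static struct obj *lookup(int ino)
{
        struct obj *o;
repeat:
        pthread_spin_lock(&bucket_lock);
        for (o = bucket_head; o; o = o->next) {
                if (o->ino != ino)
                        continue;
                if (pthread_spin_trylock(&o->lock) != 0) {
                        /*
                         * This path holds the locks in the reverse of
                         * the documented order, so it must not block
                         * on obj->lock: back off and retry instead.
                         */
                        pthread_spin_unlock(&bucket_lock);
                        sched_yield();
                        goto repeat;
                }
                pthread_spin_unlock(&bucket_lock);
                return o;
        }
        pthread_spin_unlock(&bucket_lock);
        return NULL;
}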

If you think a few trylocks == impending doom, then xfs is looking
pretty poor at this point. So I would ask that you stop making things
up about my patch series. If you dislike the trylocks so much that you
think it is worth breaking the i_lock regularity, or using RCU, or
whatever, then please propose that as incremental patches at the end
of my series, where you can see they will logically fit. You know I
will argue that locking consistency is more important for
maintainability than these few trylocks, however.
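
To see what that consistency buys, here is the straight-order side of
the same sketch above; insert() is again a hypothetical name:

/* Insertion follows the documented order: obj->lock, then bucket_lock. */
static void insert(struct obj *o)
{
        pthread_spin_lock(&o->lock);            /* outer lock first */
        pthread_spin_lock(&bucket_lock);        /* nests inside it */
        o->next = bucket_head;
        bucket_head = o;
        pthread_spin_unlock(&bucket_lock);
        pthread_spin_unlock(&o->lock);
}

No trylock and no retry loop; only the reverse-order lookup path pays.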
