[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20101008101549.GC4681@dastard>
Date: Fri, 8 Oct 2010 21:15:49 +1100
From: Dave Chinner <david@...morbit.com>
To: Al Viro <viro@...IV.linux.org.uk>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 09/18] fs: rework icount to be a locked variable
On Fri, Oct 08, 2010 at 10:32:02AM +0100, Al Viro wrote:
> On Fri, Oct 08, 2010 at 04:21:23PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@...hat.com>
> >
> > The inode reference count is currently an atomic variable so that it can be
> > sampled/modified outside the inode_lock. However, the inode_lock is still
> > needed to synchronise the final reference count and checks against the inode
> > state.
> >
> > To avoid needing the protection of the inode lock, protect the inode reference
> > count with the per-inode i_lock and convert it to a normal variable. To avoid
> > existing out-of-tree code accidentally compiling against the new method, rename
> > the i_count field to i_ref. This is relatively straight forward as there
> > are limited external references to the i_count field remaining.
>
> You are overdoing the information hiding here; _way_ too many small
> functions that don't buy you anything so far, AFAICS.
See akpm's comments on the previous version of the series.
> Moreover, why
> the hell not make them static inlines and get rid of the exports?
Yes, that is probably sensible.
>
> > - if (atomic_add_unless(&inode->i_count, -1, 1))
> > + /* XXX: filesystems should not play refcount games like this */
> > + spin_lock(&inode->i_lock);
> > + if (inode->i_ref > 1) {
> > + inode->i_ref--;
> > + spin_unlock(&inode->i_lock);
> > return;
> > + }
> > + spin_unlock(&inode->i_lock);
>
> ... or, perhaps, they needs a helper along the lines of "try to do iput()
> if it's known to hit easy case".
>
> I really don't like the look of code around -ENOSPC returns, though.
> What exactly is going on there? Can it e.g. interfere with that
> delayed iput stuff?
I have no idea what the btrfs code is doing, hence I haven't tried
to clean it up or provide any helpers for it. It looks like a hack
around a problem in the btrfs reference counting model to me...
>
> > void iref(struct inode *inode)
> > {
> > spin_lock(&inode_lock);
> > + spin_lock(&inode->i_lock);
> > iref_locked(inode);
> > + spin_unlock(&inode->i_lock);
> > spin_unlock(&inode_lock);
> > }
>
> *cringe*
>
> > int iref_read(struct inode *inode)
> > {
> > - return atomic_read(&inode->i_count);
> > + int ref;
> > +
> > + spin_lock(&inode->i_lock);
> > + ref = inode->i_ref;
> > + spin_unlock(&inode->i_lock);
> > + return ref;
>
> What's the point of locking here?
It can be replaced with a memory barrier, right?
> > @@ -1324,8 +1359,16 @@ void iput(struct inode *inode)
> > if (inode) {
> > BUG_ON(inode->i_state & I_CLEAR);
> >
> > - if (atomic_dec_and_lock(&inode->i_count, &inode_lock))
> > + spin_lock(&inode_lock);
> > + spin_lock(&inode->i_lock);
> > + inode->i_ref--;
> > + if (inode->i_ref == 0) {
> > + spin_unlock(&inode->i_lock);
> > iput_final(inode);
> > + return;
> > + }
>
> *UGH* So you take inode_lock on every damn iput()?
Only until the inode_lock is removed completely.
> > state->owner = owner;
> > atomic_inc(&owner->so_count);
> > list_add(&state->inode_states, &nfsi->open_states);
> > - state->inode = igrab(inode);
> > spin_unlock(&inode->i_lock);
> > + state->inode = igrab(inode);
>
> Why is that safe?
Why wouldn't it be? This is code inherited from Nick's patches, so
I haven't looked this particular hunk in great detail. I've made the
assumption that if the inode passed in doesn't already have a
reference, then that code is already broken.
Instead, it probably should be converted to a iref_locked() call
instead of igrab().
>
> > --- a/fs/notify/inode_mark.c
> > +++ b/fs/notify/inode_mark.c
> > @@ -257,7 +257,8 @@ void fsnotify_unmount_inodes(struct list_head *list)
> > * actually evict all unreferenced inodes from icache which is
> > * unnecessarily violent and may in fact be illegal to do.
> > */
> > - if (!iref_read(inode))
> > + spin_lock(&inode->i_lock);
> > + if (!inode->i_ref)
> > continue;
>
> Really?
Good catch. It looks like a change split across 2 patches - it is correct when
all patches are applied. Will fix.
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists