lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 16 Oct 2010 04:33:45 +1100
From:	Nick Piggin <npiggin@...nel.dk>
To:	Nick Piggin <npiggin@...nel.dk>
Cc:	Dave Chinner <david@...morbit.com>,
	Christoph Hellwig <hch@...radead.org>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 17/18] fs: icache remove inode_lock

On Sat, Oct 16, 2010 at 12:29:38AM +1100, Nick Piggin wrote:
> On Sat, Oct 16, 2010 at 12:03:00AM +1100, Nick Piggin wrote:
> > On Fri, Oct 15, 2010 at 09:59:43PM +1100, Dave Chinner wrote:
> > > My series uses i_lock only to protect i_state and i_ref. It does not
> > > need to protect any more of the inode than that as other locks
> > > protect the other list fields. As a result, it's still the inermost
> > > lock and there are no trylocks in the code at all.
> > We discussed it and I didn't think latencies would be any worse
> > a problem than they are today. I agreed it may be an issue and
> > pointed out there are ways forward to fix it.
> 
> BTW. if a few trylocks are your biggest issue, this is a joke. I told
> you how they can be fixed with incremental patches on top of the series
> (which basically whittle down the lock coverage of the old inode_lock,
> and so IMO need to be done in small chunks well bisectable and with
> good rationale). So why you didn't submit a couple of incremental
> patches to do just that is beyond me.
> 
> I've had prototypes in my tree actually to do that from a while back,
> but actually I'm thinking that using RCU may be a better way to go
> now that Linus has agreed on it and we have sketched a design to do
> slab-free-RCU.
> 
> Either way, it's much easier to compare pros and cons of each, when
> they are done incrementally on top of the existing base.

To illustrate my point, if it needs to be spelled out. If you rcu lock a
list then locking can be quite changed. Starting with my basic first
transform which has inode protected by i_lock:

/* pass1, dumb */
spin_lock(&inode->i_lock);
spin_lock(&list_lock);
list_del(&inode->list);

...

spin_lock(&list_lock);
list_for_each_entry(inode, list) {
    if (inode->blah ...) {
      if (!spin_trylock(&inode->i_lock))
          goto repeat; /* oh no, it's unmaintainable */
      do_something(inode);
   }
}

Common pattern, used by several of the inode lists, OK?

/* pass2a, RCU optimised */
rcu_read_lock()
list_for_each_entry(inode, list) {
    if (inode->blah ...) {
      spin_lock(&inode->i_lock); /* no trylock. easy. */
      if (unlikely(list_empty(&inode->i_list)))
        continue;
      do_something(inode);
   }
}


/* pass2b, reduced i_lock width */
/* no i_lock */
spin_lock(&list_lock);
list_del(&inode->list);

...

spin_lock(&list_lock);
list_for_each_entry(inode, list) {
    if (inode->blah ...) {
      do_something(inode);
   }
}

OK? Do you see a problem yet? Try turning into RCU.

rcu_read_lock();
list_for_each_entry(inode, list) {
    if (inode->blah ...) {
      spin_lock(&list_lock);
      if (unlikely(list_empty(&inode->i_list)))
        continue;
      do_something(inode);
   }
}

So you've saved one (extremely fine-grained, cache hot) lock in the
slowpath, but you've exchanged a fine grained, cache hot lock in the
fastpath for a more expensive lock. You've also lost the really nice
(for maintainability and simplicity) ability to freeze all modifications
to the inode with the single i_lock.

So... you just happened to decide for yourself that your approach is
better, just because. All your extensive trials and experiemnts with
different locking orders lead you to believe that my preferred RCU
approach is wrong and yours is right? Can you please share your working
out with the rest of the class?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ