Date:	Sat, 13 Nov 2010 10:17:05 +1100
From:	Nick Piggin <npiggin@...nel.dk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Nick Piggin <npiggin@...nel.dk>, Nick Piggin <npiggin@...il.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	Dave Chinner <dchinner@...hat.com>
Subject: Re: [patch 1/6] fs: icache RCU free inodes

On Fri, Nov 12, 2010 at 09:33:11AM -0800, Linus Torvalds wrote:
> On Thu, Nov 11, 2010 at 10:49 PM, Nick Piggin <npiggin@...nel.dk> wrote:
> >
> > In reality, it's likely to be well under 0.1% in any real workload, even
> > an inode intensive one. So I much prefer to err on the side of less
> > complexity, to start with. There just isn't much risk of regression
> > AFAIKS, and much more risk of becoming unmaintainably complex.
> 
> Well, I have to say that if we don't get this lockless path lookup
> thing merged in the next merge window (ie 38-rc1), I'm going to be
> personally very disappointed (*).

I'm trying to piece things together. I'll hopefully be able to post
patches again soon for review.


> So yes, the "initial complexity" argument is certainly acceptable to
> me. It does make me suspect something is wrong, though, because quite
> frankly, the actual accesses to the inode during the lockless walk
> should be very _very_ controlled anyway. And it's trivial to do a "is
> this inode still the same one I started with" with zero locking, by
> just checking that "dentry->d_inode" is the same after-the-fact and
> checking that the dentry is still hashed. The inode type had better
> _NOT_ change if the dentry pointer is still there.
> 
> So even if the type or i_ops changes, none of that should matter in
> the least. Nobody should _care_. We might get two wildly different
> results, but we have a trivial way to check whether the inode was
> stable after-the-fact, and just punt if it wasn't. So it really smells
> like if this is an issue, there's something wrong going on.

Yes, you are right about that: it is possible to use seqlocks and
re-check things to verify them after the fact. This is why I'm
optimistic that we can tackle any and all regressions that come up.
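
To illustrate the "read, then verify after the fact" idea, here is a
minimal user-space sketch of a sequence-counter re-check. It is not the
kernel's seqlock API or the code from this patch series: seq_dentry,
seq_inode and the seq_* functions are hypothetical stand-ins, and the
sketch assumes the inode memory stays readable while the reader looks at
it (which is what RCU-freeing inodes is meant to provide).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's struct inode / struct dentry. */
struct seq_inode {
    int mode;
};

struct seq_dentry {
    atomic_uint seq;                    /* even: stable, odd: update in progress */
    _Atomic(struct seq_inode *) inode;  /* stand-in for dentry->d_inode */
    atomic_bool hashed;                 /* stand-in for "dentry is still hashed" */
};

/* Writer side: bump the counter before and after changing the dentry. */
static void seq_replace_inode(struct seq_dentry *d, struct seq_inode *new_inode)
{
    atomic_fetch_add(&d->seq, 1);       /* counter becomes odd */
    atomic_store(&d->inode, new_inode);
    atomic_fetch_add(&d->seq, 1);       /* counter becomes even again */
}

/*
 * Reader side: take no lock, read what is needed, then check that nothing
 * changed in the meantime. Returns false when the caller should punt to a
 * slower, locked path.
 */
static bool seq_lockless_mode(struct seq_dentry *d, int *mode_out)
{
    assert(mode_out != NULL);

    unsigned start = atomic_load(&d->seq);
    struct seq_inode *inode = atomic_load(&d->inode);
    bool hashed = atomic_load(&d->hashed);
    /* Assumption: the memory behind inode is still valid to read here. */
    int mode = inode ? inode->mode : 0;

    /* After the fact: was an update in progress, or did one complete? */
    if ((start & 1u) || atomic_load(&d->seq) != start)
        return false;
    if (!hashed || inode == NULL)
        return false;

    *mode_out = mode;
    return true;
}
```

A caller would try seq_lockless_mode() first and take the locked path only
when it returns false. The value read before the check may be stale or
inconsistent, which is harmless as long as it is discarded on failure.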

An example of where it can get more complicated:

A filesystem has an ->op function which gets the sb from inode->i_sb,
and then does the container_of thing, to get the filesystem specific
superblock so it can check flags to determine something (eg. whether
it is case sensitive or not).
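
A rough user-space sketch of that access pattern follows. All names
(demo_super_block, demo_inode, toyfs_sb_info, toyfs_is_case_insensitive)
are hypothetical and only mimic the shape of the kernel structures; the
macro is a simplified equivalent of the kernel's container_of().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the generic VFS structures. */
struct demo_super_block {
    int s_dev;
};

struct demo_inode {
    struct demo_super_block *i_sb;
};

/* Hypothetical filesystem-private superblock embedding the generic one. */
struct toyfs_sb_info {
    bool case_insensitive;
    struct demo_super_block vfs_sb;
};

/*
 * The kind of ->op helper described above: go from the inode to its
 * superblock, then to the filesystem-specific container, and read a flag.
 * This is only valid if i_sb really points into a toyfs_sb_info.
 */
static bool toyfs_is_case_insensitive(const struct demo_inode *inode)
{
    struct demo_super_block *sb = inode->i_sb;
    assert(sb != NULL);

    struct toyfs_sb_info *sbi =
        demo_container_of(sb, struct toyfs_sb_info, vfs_sb);
    return sbi->case_insensitive;
}
```

The pointer arithmetic is only meaningful while i_sb points at a
superblock embedded in this filesystem's own structure. If i_sb could be
changed to point at some other filesystem's superblock underneath the
caller, the computed sbi would be garbage before any after-the-fact check
gets a chance to run.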

If the inode goes away and i_sb can change, this can oops. We basically
just need to further tighten rules and further audit everyone. I'm not
saying it can't be done, I'm just saying it's not _totally_ trivial like
the usual DESTROY_BY_RCU pattern, so let's just see what incremental
patches look like.
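
For comparison, the usual DESTROY_BY_RCU-style lookup can be sketched in
user space roughly as below. This is a simplified illustration under
stated assumptions, not kernel code: slot_obj, slot_try_get, slot_put and
slot_lookup are hypothetical names, and the RCU read-side section and the
hash lookup that would produce the candidate pointer are left out. The
point is the order of operations: pin the object first, and only then
re-check that it is still the object that was looked up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical type-stable object: its memory may be reused for another
 * object of the same type, but is never handed back as a different type.
 */
struct slot_obj {
    atomic_int refcount;  /* 0 means free or being reused */
    atomic_int key;       /* identity; changes when the slot is reused */
};

/* Take a reference only if the object is still live (refcount > 0). */
static bool slot_try_get(struct slot_obj *o)
{
    int old = atomic_load(&o->refcount);
    while (old > 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
            return true;
    }
    return false;
}

static void slot_put(struct slot_obj *o)
{
    int old = atomic_fetch_sub(&o->refcount, 1);
    assert(old > 0);
    (void)old;
}

/*
 * candidate is whatever a lockless lookup for key returned. Pin it, then
 * verify its identity; NULL tells the caller to retry or fall back.
 */
static struct slot_obj *slot_lookup(struct slot_obj *candidate, int key)
{
    if (candidate == NULL || !slot_try_get(candidate))
        return NULL;
    if (atomic_load(&candidate->key) != key) {
        slot_put(candidate);  /* slot was reused for something else */
        return NULL;
    }
    return candidate;
}
```

Here nothing is dereferenced beyond the object's own fields before the
identity check. The inode case differs because filesystem code may follow
pointers such as i_sb before any such check happens.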

I'm glad you agree at this point (and if it does turn out to be much
simpler than I anticipate, then hey that's great, we can just move to
DESTROY_BY_RCU even quicker).

Thanks,
Nick