linux-kernel - Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130907180724.GE13318@ZenIV.linux.org.uk>
Date:	Sat, 7 Sep 2013 19:07:24 +0100
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Waiman Long <Waiman.Long@...com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>,
	George Spelvin <linux@...izon.com>,
	John Stoffel <john@...ffel.org>
Subject: Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without
 taking rename_lock

On Sat, Sep 07, 2013 at 10:52:02AM -0700, Linus Torvalds wrote:

> So I think we could make a more complicated data structure that looks
> something like this:
> 
>    struct seqlock_retry {
>       unsigned int seq_no;
>       int state;
>    };
> 
> and pass that around. Gcc should do pretty well, especially if we
> inline things (but even if not, small structures that fit in 64 bytes
> generate reasonable code even on 32-bit targets, because gcc knows
> about using two registers for passing data around)..
> 
> Then you can make "state" have a retry counter in it, and have a
> negative value mean "I hold the lock for writing". Add a couple of
> helper functions, and you can fairly easily handle the mixed "try for
> reading first, then fall back to writing".
> 
> That said, __d_lookup() still shows up as very performance-critical on
> some loads (symlinks in particular cause us to fall out of the RCU
> cases) so I'd like to keep that using the simple pure read case. I
> don't believe you can livelock it, as mentioned. But the other ones
> might well be worth moving to a "fall back to write-locking after <n>
> tries" model. They might all traverse user-specified paths of fairly
> arbitrary depth, no?
> 
> So this "seqlock_retry" thing wouldn't _replace_ bare seqlocks, it
> would just be a helper thing for this kind of behavior where we want
> to normally do things with just the read-lock, but want to guarantee
> that we don't live-lock.
> 
> Sounds reasonable?

More or less; I just wonder if we are overdesigning here - if we don't
do "repeat more than once", we can simply use the lower bit of seq -
read_seqlock() always returns an even value.  So we could do something
like seqretry_and_lock(lock, &seq):
	if ((*seq & 1) || !read_seqretry(lock, *seq))
		return true;
	*seq |= 1;
	write_seqlock(lock);
	return false;
and seqretry_done(lock, seq):
	if (seq & 1)
		write_sequnlock(lock);
with these loops turning into
	seq = read_seqlock(&rename_lock);
	...
	if (!seqretry_and_lock(&rename_lock, &seq))
		goto again;
	...
	seqretry_done(&rename_lock);

But I'd really like to understand the existing zoo - in particular, ceph and
cifs users can't be converted to anything of that kind (blocking kmalloc()
can't live under write_seqlock()) and they are _easier_ to livelock than
d_path(), due to the same kmalloc() widening the window.  Guys, do we really
care about precisely-sized allocations there?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/