linux-kernel - Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Date:	Mon, 09 Sep 2013 10:31:36 -0400
From:	Waiman Long <waiman.long@...com>
To:	Al Viro <viro@...IV.linux.org.uk>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>,
	George Spelvin <linux@...izon.com>,
	John Stoffel <john@...ffel.org>
Subject: Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without
 taking rename_lock

On 09/07/2013 02:07 PM, Al Viro wrote:
> On Sat, Sep 07, 2013 at 10:52:02AM -0700, Linus Torvalds wrote:
>
>> So I think we could make a more complicated data structure that looks
>> something like this:
>>
>>     struct seqlock_retry {
>>        unsigned int seq_no;
>>        int state;
>>     };
>>
>> and pass that around. Gcc should do pretty well, especially if we
>> inline things (but even if not, small structures that fit in 64 bytes
>> generate reasonable code even on 32-bit targets, because gcc knows
>> about using two registers for passing data around)..
>>
>> Then you can make "state" have a retry counter in it, and have a
>> negative value mean "I hold the lock for writing". Add a couple of
>> helper functions, and you can fairly easily handle the mixed "try for
>> reading first, then fall back to writing".
>>
>> That said, __d_lookup() still shows up as very performance-critical on
>> some loads (symlinks in particular cause us to fall out of the RCU
>> cases) so I'd like to keep that using the simple pure read case. I
>> don't believe you can livelock it, as mentioned. But the other ones
>> might well be worth moving to a "fall back to write-locking after<n>
>> tries" model. They might all traverse user-specified paths of fairly
>> arbitrary depth, no?
>>
>> So this "seqlock_retry" thing wouldn't _replace_ bare seqlocks, it
>> would just be a helper thing for this kind of behavior where we want
>> to normally do things with just the read-lock, but want to guarantee
>> that we don't live-lock.
>>
>> Sounds reasonable?
> More or less; I just wonder if we are overdesigning here - if we don't
> do "repeat more than once", we can simply use the lower bit of seq -
> read_seqlock() always returns an even value.  So we could do something
> like seqretry_and_lock(lock,&seq):
> 	if ((*seq&  1) || !read_seqretry(lock, *seq))
> 		return true;
> 	*seq |= 1;
> 	write_seqlock(lock);
> 	return false;
> and seqretry_done(lock, seq):
> 	if (seq&  1)
> 		write_sequnlock(lock);
> with these loops turning into
> 	seq = read_seqlock(&rename_lock);
> 	...
> 	if (!seqretry_and_lock(&rename_lock,&seq))
> 		goto again;
> 	...
> 	seqretry_done(&rename_lock);

I am fine with try it once and then lock instead of trying it a few 
times. Now are you planning to have these helper functions for the 
dcache layer only or as part of the seqlock infrastructure? If we are 
going to touch the seqlock layer, I would suggest adding a blocking 
reader that takes the lock, but won't update the sequence number so that 
it won't block other sequence readers as my original seqlock patch was 
doing.

-Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/