lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <20160607004427.GI14480@ZenIV.linux.org.uk>
Date:	Tue, 7 Jun 2016 01:44:27 +0100
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Dave Hansen <dave.hansen@...el.com>,
	"Chen, Tim C" <tim.c.chen@...el.com>,
	Ingo Molnar <mingo@...hat.com>,
	Davidlohr Bueso <dbueso@...e.de>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Jason Low <jason.low2@...com>,
	Michel Lespinasse <walken@...gle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Waiman Long <waiman.long@...com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion

On Tue, Jun 07, 2016 at 01:40:58AM +0100, Al Viro wrote:
> On Mon, Jun 06, 2016 at 04:50:59PM -0700, Linus Torvalds wrote:
> > 
> > 
> > On Mon, 6 Jun 2016, Al Viro wrote:
> > > 
> > > True in general, but here we really do a lot under that ->d_lock - all
> > > list traversals are under it.  So I suspect that contention on nested
> > > lock is not an issue in that particular load.  It's certainly a separate
> > > commit, so we'll see how much does it give on its own, but I doubt that
> > > it'll be anywhere near enough.
> > 
> > Hmm. Maybe. 
> > 
> > But at least we can try to minimize everything that happens under the 
> > dentry->d_lock spinlock. 
> > 
> > So how about this patch? It's entirely untested, but it rewrites that 
> > readdir() function to try to do the minimum possible under the d_lock 
> > spinlock.
> > 
> > I say "rewrite", because it really is totally different. It's not just 
> > that the nested "next" locking is gone, it also treats the cursor very 
> > differently and tries to avoid doing any unnecessary cursor list 
> > operations.
> 
> Similar to what I've got here, except that mine has a couple of helper
> functions usable in dcache_dir_lseek() as well:
> 
> next_positive(parent, child, n) - returns the nth positive child after that
> one, or NULL if there are fewer than n such.  NULL as the second argument =>
> search from the beginning.
> 
> move_cursor(cursor, child) - moves cursor immediately past child *or* to
> the very end if child is NULL.
> 
> The third commit in the series will be the lockless replacement
> for next_positive().  move_cursor() is easy - it became simply
> 	struct dentry *parent = cursor->d_parent;
> 	unsigned n, *seq = &parent->d_inode->i_dir_seq;
> 	spin_lock(&parent->d_lock);
> 	for (;;) {
> 		n = *seq;
> 		if (!(n & 1) && cmpxchg(seq, n, n + 1) == n)
> 			break;
> 		cpu_relax();
> 	}
> 	__list_del(cursor->d_child.prev, cursor->d_child.next);
> 	if (child)
> 		list_add(&cursor->d_child, &child->d_child);
> 	else
> 		list_add_tail(&cursor->d_child, &parent->d_subdirs);
> 	smp_store_release(seq, n + 2);
> 	spin_unlock(&parent->d_lock);
> 
> with
> 
> static struct dentry *next_positive(struct dentry *parent,
>                                     struct dentry *child, int count)
> {
>         struct list_head *p = child ? &child->d_child : &parent->d_subdirs;
>         unsigned *seq = &parent->d_inode->i_dir_seq, n;
>         do {
>                 int i = count;
>                 n = smp_load_acquire(seq) & ~1;
>                 rcu_read_lock();
>                 do {
>                         p = p->next;
>                         if (p == &parent->d_subdirs) {
>                                 child = NULL;
>                                 break;
>                         }
>                         child = list_entry(p, struct dentry, d_child);
>                 } while (!simple_positive(child) || --i);
>                 rcu_read_unlock();
>         } while (unlikely(smp_load_acquire(seq) != n));
>         return child;
> }
> as an initial attempt at a lockless next_positive(); the barriers are
> probably wrong, though...

FWIW,
loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
{
        struct dentry *dentry = file->f_path.dentry;
        switch (whence) {
                case 1:
                        offset += file->f_pos;
                case 0:
                        if (offset >= 0)
                                break;
                default:
                        return -EINVAL;
        }
        if (offset != file->f_pos) {
                file->f_pos = offset;
                if (file->f_pos >= 2) {
                        struct dentry *cursor = file->private_data;
                        loff_t n = file->f_pos - 2;

                        inode_lock_shared(dentry->d_inode);
                        move_cursor(cursor, next_positive(dentry, NULL, n));
                        inode_unlock_shared(dentry->d_inode);
                }
        }
        return offset;
}

and
int dcache_readdir(struct file *file, struct dir_context *ctx)
{
        struct dentry *dentry = file->f_path.dentry;
        struct dentry *cursor = file->private_data;
        struct dentry *child, *next;
        bool moved = false;

        if (!dir_emit_dots(file, ctx))
                return 0;

        child = ctx->pos != 2 ? cursor : NULL;
        while ((next = next_positive(dentry, child, 1)) != NULL) {
                if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
                              d_inode(next)->i_ino, dt_type(d_inode(next))))
                        break;
                moved = true;
                child = next;
                ctx->pos++;
        }
        if (moved)
                move_cursor(cursor, child);
        return 0;
}

is what the methods themselves became.
