lists.openwall.net - Open Source and information security mailing list archives
Date: Tue, 7 Jun 2016 01:44:27 +0100
From: Al Viro <viro@...IV.linux.org.uk>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Dave Hansen <dave.hansen@...el.com>, "Chen, Tim C" <tim.c.chen@...el.com>,
	Ingo Molnar <mingo@...hat.com>, Davidlohr Bueso <dbueso@...e.de>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Jason Low <jason.low2@...com>, Michel Lespinasse <walken@...gle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Waiman Long <waiman.long@...com>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion

On Tue, Jun 07, 2016 at 01:40:58AM +0100, Al Viro wrote:
> On Mon, Jun 06, 2016 at 04:50:59PM -0700, Linus Torvalds wrote:
> > On Mon, 6 Jun 2016, Al Viro wrote:
> > >
> > > True in general, but here we really do a lot under that ->d_lock - all
> > > list traversals are under it.  So I suspect that contention on nested
> > > lock is not an issue in that particular load.  It's certainly a separate
> > > commit, so we'll see how much does it give on its own, but I doubt that
> > > it'll be anywhere near enough.
> >
> > Hmm. Maybe.
> >
> > But at least we can try to minimize everything that happens under the
> > dentry->d_lock spinlock.
> >
> > So how about this patch? It's entirely untested, but it rewrites that
> > readdir() function to try to do the minimum possible under the d_lock
> > spinlock.
> >
> > I say "rewrite", because it really is totally different. It's not just
> > that the nested "next" locking is gone, it also treats the cursor very
> > differently and tries to avoid doing any unnecessary cursor list
> > operations.
>
> Similar to what I've got here, except that mine has a couple of helper
> functions usable in dcache_dir_lseek() as well:
>
> next_positive(parent, child, n) - returns nth positive child after that one
> or NULL if there's less than n such.  NULL as the second argument => search
> from the beginning.
> move_cursor(cursor, child) - moves cursor immediately past child *or* to
> the very end if child is NULL.
>
> The third commit in series will be the lockless replacement for
> next_positive().  move_cursor() is easy - it became simply
>
> 	struct dentry *parent = cursor->d_parent;
> 	unsigned n, *seq = &parent->d_inode->i_dir_seq;
> 	spin_lock(&parent->d_lock);
> 	for (;;) {
> 		n = *seq;
> 		if (!(n & 1) && cmpxchg(seq, n, n + 1) == n)
> 			break;
> 		cpu_relax();
> 	}
> 	__list_del(cursor->d_child.prev, cursor->d_child.next);
> 	if (child)
> 		list_add(&cursor->d_child, &child->d_child);
> 	else
> 		list_add_tail(&cursor->d_child, &parent->d_subdirs);
> 	smp_store_release(seq, n + 2);
> 	spin_unlock(&parent->d_lock);
>
> with
>
> static struct dentry *next_positive(struct dentry *parent,
> 				    struct dentry *child, int count)
> {
> 	struct list_head *p = child ? &child->d_child : &parent->d_subdirs;
> 	unsigned *seq = &parent->d_inode->i_dir_seq, n;
> 	do {
> 		int i = count;
> 		n = smp_load_acquire(seq) & ~1;
> 		rcu_read_lock();
> 		do {
> 			p = p->next;
> 			if (p == &parent->d_subdirs) {
> 				child = NULL;
> 				break;
> 			}
> 			child = list_entry(p, struct dentry, d_child);
> 		} while (!simple_positive(child) || --i);
> 		rcu_read_unlock();
> 	} while (unlikely(smp_load_acquire(seq) != n));
> 	return child;
> }
>
> as initial attempt at lockless next_positive(); barriers are probably wrong,
> though...
FWIW,

loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
{
	struct dentry *dentry = file->f_path.dentry;
	switch (whence) {
		case 1:
			offset += file->f_pos;
		case 0:
			if (offset >= 0)
				break;
		default:
			return -EINVAL;
	}
	if (offset != file->f_pos) {
		file->f_pos = offset;
		if (file->f_pos >= 2) {
			struct dentry *cursor = file->private_data;
			loff_t n = file->f_pos - 2;

			inode_lock_shared(dentry->d_inode);
			move_cursor(cursor, next_positive(dentry, NULL, n));
			inode_unlock_shared(dentry->d_inode);
		}
	}
	return offset;
}

and

int dcache_readdir(struct file *file, struct dir_context *ctx)
{
	struct dentry *dentry = file->f_path.dentry;
	struct dentry *cursor = file->private_data;
	struct dentry *child, *next;
	bool moved = false;

	if (!dir_emit_dots(file, ctx))
		return 0;
	child = ctx->pos != 2 ? cursor : NULL;
	while ((next = next_positive(dentry, child, 1)) != NULL) {
		if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
			      d_inode(next)->i_ino, dt_type(d_inode(next))))
			break;
		moved = true;
		child = next;
		ctx->pos++;
	}
	if (moved)
		move_cursor(cursor, child);
	return 0;
}

is what the methods themselves became.