Date:	Mon, 18 Mar 2013 16:18:11 +0530
From:	Ramkumar Ramachandra <artagnon@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	Junio C Hamano <gitster@...ox.com>,
	Thomas Rast <trast@....ethz.ch>,
	Duy Nguyễn <pclouds@...il.com>,
	Jeff King <peff@...f.net>,
	Karsten Blees <karsten.blees@...il.com>
Subject: Beyond inotify recursive watches

Hi,

We, the Git folks, were wondering how to speed things up.  In an
strace of "git status" on linux-2.6.git, we found:

  top syscalls sorted         top syscalls sorted
  by accumulated time         by number of calls
  time     calls syscall      time     calls syscall
  --------------------------------------------------
  0.401906 40950 lstat        0.401906 40950 lstat
  0.190484  5343 getdents     0.150055  5374 open
  0.150055  5374 open         0.190484  5343 getdents
  0.074843  2806 close        0.074843  2806 close
  0.003216   157 read         0.003216   157 read
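
To illustrate why lstat dominates a trace like this one, here is a minimal Python sketch of a naive tree scan that lists every directory and calls lstat on every entry. It is not git's actual code path; scan_tree and make_demo_tree are hypothetical names invented for this example, and the call counts only approximate what strace would report.

```python
import os
import stat
import tempfile


def scan_tree(root):
    """Walk root like a naive status scan and count the calls it makes.

    Returns a dict with the number of directory listings (each one maps to
    at least one getdents call) and the number of lstat calls.
    """
    counts = {"listdir": 0, "lstat": 0}
    pending = [root]
    while pending:
        directory = pending.pop()
        counts["listdir"] += 1
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            st = os.lstat(path)  # one lstat per entry, as in the trace
            counts["lstat"] += 1
            # lstat does not follow symlinks, so symlinked directories
            # are never descended into.
            if stat.S_ISDIR(st.st_mode):
                pending.append(path)
    return counts


def make_demo_tree(base, dirs=3, files_per_dir=4):
    """Create a small throwaway tree (hypothetical helper for the demo)."""
    for d in range(dirs):
        sub = os.path.join(base, "dir%d" % d)
        os.mkdir(sub)
        for f in range(files_per_dir):
            with open(os.path.join(sub, "file%d" % f), "w") as fh:
                fh.write("x")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tree(tmp)
        print(scan_tree(tmp))
```

The number of lstat calls grows with the number of entries, while directory listings grow only with the number of directories, which matches the roughly 8:1 ratio of lstat to getdents in the table.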

Most of this happens when we try to build the index, querying for
changes in tracked files and discovering untracked files.  It was
suggested that we could use inotify to speed things up: we'd write a
per-user daemon (like ssh_client) that sets up a watch on each
directory of each git repository.  A per-repository daemon wouldn't
work because /proc/sys/fs/inotify/max_user_instances reads 128 on
typical Linux 3.8 systems, so a user with many repositories would
exhaust the per-user limit on inotify instances.
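
As a rough sketch of the bookkeeping involved: inotify watches are not recursive, so a daemon needs one watch per directory, and all watches held by a user count against /proc/sys/fs/inotify/max_user_watches. The Python below estimates how many watches a set of repositories would need. The helper names are hypothetical, and skipping ".git" is an assumption made for the example, not something the proposal specifies.

```python
import os

INOTIFY_LIMITS = "/proc/sys/fs/inotify"


def read_limit(name, default=None):
    """Read an inotify sysctl such as max_user_watches.

    Returns default if the file is missing or unreadable (e.g. not Linux).
    """
    try:
        with open(os.path.join(INOTIFY_LIMITS, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default


def count_watch_dirs(repo_root, skip_names=(".git",)):
    """Count the directories a recursive watcher would have to watch.

    Assumption: directories named in skip_names are pruned. os.walk does
    not descend into symlinked directories by default.
    """
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(repo_root):
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        total += 1  # one watch for the directory being visited
    return total


def fits_in_limit(repo_roots, max_watches):
    """Return (watches needed, whether they fit under max_watches)."""
    needed = sum(count_watch_dirs(root) for root in repo_roots)
    return needed, needed <= max_watches


if __name__ == "__main__":
    limit = read_limit("max_user_watches")
    needed = count_watch_dirs(".")
    print("watches needed:", needed, "limit:", limit)
```

A single per-user daemon uses one inotify instance however many repositories it serves, so it is bounded by max_user_watches rather than max_user_instances.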

However, Karsten and Junio point out that our efforts might be futile
as we are trying to do what the VFS caching already does, and doing it
poorly.  Speedups, if any, would be minor and certainly not worth the
effort.

I think inotify is poorly suited to our needs, as setting up recursive
watches is horribly inelegant.  It is well suited to something like
Dropbox, which just executes something when there's a change in a
specified directory.  Also, I suspect VFS caching works by optimizing
filesystem calls for frequently used directory entries.  A git
repository is not a collection of frequently used directory entries,
but a frequently used unit.

I know very little about how VFS works, but I'm wondering if we can
make any changes in VFS to make it perform better with git
repositories.  We won't need something as fine-grained as inotify: if
the tree hash of a directory entry changes frequently enough, optimize
all filesystem calls to inodes in that directory, recursively.
Recursively optimizing a directory is useless in the general case, so
I would imagine something like a new rwatch() syscall that git uses to
register the repository with VFS.  All system calls will then be
magically optimized, and few changes need to be made to git.  An
added side benefit is that all other version control systems can use
it too.

Thanks for reading.

Ram