Message-Id: <1417154411-5367-1-git-send-email-tytso@mit.edu>
Date:	Fri, 28 Nov 2014 01:00:05 -0500
From:	Theodore Ts'o <tytso@....edu>
To:	Ext4 Developers List <linux-ext4@...r.kernel.org>
Cc:	Linux Filesystem Development List <linux-fsdevel@...r.kernel.org>,
	Theodore Ts'o <tytso@....edu>
Subject: [PATCH-v5 0/5] add support for a lazytime mount option

This is an updated version of what had originally been an
ext4-specific patch which significantly improves performance by lazily
writing timestamp updates (and in particular, mtime updates) to disk.
The in-memory timestamps are always correct, but they are only written
to disk when required for correctness.
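
As a rough illustration of the idea (a toy user-space model, not code from
this series; every name in it is hypothetical), a timestamp update only
touches the in-memory copy and sets a dirty flag, and the simulated disk
write happens later, when something requires it:

```c
#include <stdbool.h>
#include <time.h>

/* Toy model only: these names are hypothetical, not kernel structures. */
struct toy_inode {
    time_t   mem_mtime;   /* in-memory timestamp, always current */
    time_t   disk_mtime;  /* last value written to "disk" */
    bool     time_dirty;  /* in-memory value not yet written out */
    unsigned disk_writes; /* number of simulated metadata writes */
};

/* Record a new mtime in memory only; the disk write is deferred. */
static void toy_touch(struct toy_inode *ino, time_t now)
{
    ino->mem_mtime = now;
    ino->time_dirty = true;
}

/* Write the timestamp out when correctness requires it (e.g. a sync). */
static void toy_flush(struct toy_inode *ino)
{
    if (!ino->time_dirty)
        return;
    ino->disk_mtime = ino->mem_mtime;
    ino->time_dirty = false;
    ino->disk_writes++;
}
```

In this model, any number of toy_touch() calls followed by one toy_flush()
costs a single simulated write, which is the effect the option is after.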

This provides a huge performance boost for ext4 due to how it handles
journalling, but it's valuable for all file systems running on flash
storage or drive-managed SMR disks by reducing the metadata write
load.  So upon request, I've moved the functionality to the VFS layer.
Once the /sbin/mount program adds support for MS_LAZYTIME, all file
systems should be able to benefit from this optimization.
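
Until the mount program knows about the flag, a caller could in principle
pass it to mount(2) directly. The sketch below assumes the flag value used
by mainline Linux headers (1 << 25) when the libc headers do not define it;
with_lazytime() and mount_lazytime() are hypothetical helper names, and the
call needs the usual mount privileges:

```c
#include <stddef.h>
#include <sys/mount.h>

/* Assumption: fall back to the mainline uapi value if libc lacks it. */
#ifndef MS_LAZYTIME
#define MS_LAZYTIME (1UL << 25)
#endif

/* Add the lazytime bit to an existing set of mount flags. */
static unsigned long with_lazytime(unsigned long flags)
{
    return flags | MS_LAZYTIME;
}

/* Hypothetical helper: returns what mount(2) returns (0, or -1 with errno). */
static int mount_lazytime(const char *dev, const char *dir, const char *fstype)
{
    return mount(dev, dir, fstype, with_lazytime(0), NULL);
}
```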

There is still an ext4-specific optimization, which may be applicable
for other file systems which store more than one inode in a block, but
it will require file system specific code.  It is purely optional,
however.

Please note the changes to update_time() and the new write_time() inode
operations functions, which impact btrfs and xfs.  The changes are
fairly simple, but I would appreciate confirmation from the btrfs and
xfs teams that I got things right.   Thanks!!

Changes since -v4:
   - Fix ext4 optimization so it does not need to increment (and more
     problematically, decrement) the inode reference count
   - Per Christoph's suggestion, drop support for btrfs and xfs for now,
     due to issues with how btrfs and xfs handle dirty inode tracking.  We
     can add btrfs and xfs support back later or at the end of this series
     if we want to revisit this decision.
   - Miscellaneous cleanups

Changes since -v3:
   - inodes with I_DIRTY_TIME set are placed on a new bdi list,
        b_dirty_time.  This allows filesystem-level syncs to more
        easily iterate over those inodes that need to have their
        timestamps written to disk.
   - dirty timestamps will be written out asynchronously on the final
        iput, instead of when the inode gets evicted.
   - separate the definition of the new function
        find_active_inode_nowait() to a separate patch
   - create separate flag masks: I_DIRTY_WB and I_DIRTY_INODE, which
       indicate whether the inode needs to be on the write back lists,
       or whether the inode itself is dirty, while I_DIRTY means any one
       of the inode dirty flags is set.  This simplifies the fs
       writeback logic which needs to test for different combinations of
       the inode dirty flags in different places.
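
To make the flag-mask item above concrete, here is a small sketch.  The bit
values are illustrative, and the way the masks are composed is an assumption
inferred from the description rather than copied from the patch; the two
helper functions are hypothetical:

```c
#include <stdbool.h>

/* Illustrative bit values; the real ones live in include/linux/fs.h. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_PAGES    (1u << 2)
#define I_DIRTY_TIME     (1u << 3)

/* Assumed composition, inferred from the changelog text. */
#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY_WB    (I_DIRTY_INODE | I_DIRTY_PAGES)
#define I_DIRTY       (I_DIRTY_WB | I_DIRTY_TIME)

/* Does the inode belong on the regular writeback lists? */
static bool needs_writeback_list(unsigned state)
{
    return (state & I_DIRTY_WB) != 0;
}

/* Is a lazily updated timestamp the only thing that is dirty? */
static bool only_time_dirty(unsigned state)
{
    return (state & I_DIRTY) == I_DIRTY_TIME;
}
```

With masks like these, each call site tests one named combination instead of
spelling out the individual bits.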

Changes since -v2:
   - If update_time() updates i_version, it will not use lazytime (i.e.,
       the inode will be marked dirty so the change will be persisted on to
       disk sooner rather than later).  Yes, this eliminates the
       benefits of lazytime if the user is exporting the file system via
       NFSv4.  Sad, but NFS's requirements seem to mandate this.
   - Fix time wrapping bug 49 days after the system boots (on a system
        with a 32-bit jiffies).   Use get_monotonic_boottime() instead.
   - Clean up type warning in include/tracing/ext4.h
   - Added explicit parentheses for stylistic reasons
   - Added an is_readonly() inode operations method so btrfs doesn't
       have to duplicate code in update_time().
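
The 49-day figure in the wrapping fix above follows from simple arithmetic,
assuming a tick rate of HZ=1000 (the rate is an assumption here; other HZ
values give other wrap times).  A 32-bit counter holds 2^32 ticks, so:

```c
#include <stdint.h>

/*
 * Days until an unsigned tick counter of `bits` bits (bits < 64) wraps
 * when it advances `hz` times per second.
 */
static double days_until_wrap(unsigned bits, unsigned hz)
{
    double ticks = (double)(UINT64_C(1) << bits);

    return ticks / hz / 86400.0;
}
```

days_until_wrap(32, 1000) comes out at roughly 49.7 days, whereas a clock
that counts time since boot without going through 32-bit jiffies does not
hit that limit.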

Changes since -v1:
   - Added explanatory comments in update_time() regarding i_ts_dirty_days
   - Fix type used for days_since_boot
   - Improve SMP scalability in update_time and ext4_update_other_inodes_time
   - Added tracepoints to help test and characterize how often and under
         what circumstances inodes have their timestamps lazily updated
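
For readers unfamiliar with the days-since-boot bookkeeping mentioned above,
the following user-space sketch shows one way such a check could work.  The
16-bit type, the comparison rule, and the function names are assumptions
made for illustration, not the code in the patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define SECS_PER_DAY 86400u

/* Whole days elapsed since boot, given seconds since boot. */
static uint16_t days_since_boot(uint64_t boot_secs)
{
    return (uint16_t)(boot_secs / SECS_PER_DAY);
}

/*
 * True when a timestamp that went dirty on day `dirty_day` should be
 * written out, i.e. the day counter has moved on since then.
 */
static bool ts_is_stale(uint16_t dirty_day, uint64_t boot_secs)
{
    return days_since_boot(boot_secs) != dirty_day;
}
```

Storing only a small day counter per inode keeps the bookkeeping cheap while
still bounding how stale an on-disk timestamp can get.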

Theodore Ts'o (5):
  vfs: add support for a lazytime mount option
  vfs: don't let the dirty time inodes get more than a day stale
  vfs: add lazytime tracepoints for better debugging
  vfs: add find_inode_nowait() function
  ext4: add optimization for the lazytime mount option

 fs/ext4/inode.c             |  66 +++++++++++++++++++++++--
 fs/ext4/super.c             |   9 ++++
 fs/fs-writeback.c           |  66 ++++++++++++++++++++++---
 fs/inode.c                  | 116 +++++++++++++++++++++++++++++++++++++++++---
 fs/libfs.c                  |   2 +-
 fs/logfs/readwrite.c        |   2 +-
 fs/nfsd/vfs.c               |   2 +-
 fs/pipe.c                   |   2 +-
 fs/proc_namespace.c         |   1 +
 fs/sync.c                   |   8 +++
 fs/ufs/truncate.c           |   2 +-
 include/linux/backing-dev.h |   1 +
 include/linux/fs.h          |  17 ++++++-
 include/trace/events/ext4.h |  30 ++++++++++++
 include/trace/events/fs.h   |  56 +++++++++++++++++++++
 include/uapi/linux/fs.h     |   1 +
 mm/backing-dev.c            |  10 +++-
 17 files changed, 367 insertions(+), 24 deletions(-)
 create mode 100644 include/trace/events/fs.h

-- 
2.1.0
