lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1343229134-29487-1-git-send-email-artem.bityutskiy@linux.intel.com>
Date:	Wed, 25 Jul 2012 18:11:58 +0300
From:	Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	Linux Kernel Maling List <linux-kernel@...r.kernel.org>,
	Linux FS Maling List <linux-fsdevel@...r.kernel.org>
Subject: R.I.P. pdflush

Now that all file-systems have been modified to not use the '->write_super()'
superblock method, we can kill the last pdflush leftover - the 'sync_supers'
kernel thread.

The sync_supers kernel thread does a very simple thing: wake up every 5
seconds (see [1]), iterate over all superblocks in the system and flush
dirty superblocks by calling their '->write_super()' method.

The problem is that from power-efficiency point of view it is very wasteful
to have a thread which wakes up every 5 seconds in the very core of the
Linux kernel. Indeed, most of the time this thread wakes the CPU from a deep
sleep state just to find out that there are no dirty superblocks. Besides,
modern file-systems like btrfs and ext4 (journalled mode only) do not even
register '->write_super()', so on many modern systems sync_super is completely
useless.

And as usually happens when trying to modify old code like that - removing
sync_supers was a tedious job. It required changing 12 file-systems, including
ancient ones. While changes were not that complex, testing all of them was the
most difficult part. While testing the mainstream file-systems like ext4 was
easy (just run xfstests and wait few hours), testing baroque file-systems was
problematic because they simply oopsed or errored even before I modified them.

For example, reiserfs deadlocked quickly when I tested it using xfstests with
resierfs quota support enabled. I spend several days trying to fix this, but
reiserfs is quite complex and I'd say its locking is crazy (partially because
of the BKL push-down). But I gave up after I realized that the dead-lock is
related to the quota support. I disabled quotas and xfstests passed.

I also had some adventures with affs and few other old file-systems.

The first patch of this patch-set removes the sync_supers thread and it is the
most important one. All the other patches are minor clean-ups and they simply
remove all references to 'write_super' and 'pdflush' from commentaries
and the documentation.

I suggest that all patches go in via Al's tree. However, not before the ext4,
exofs and udf changes are merged, which I expect to happen before v3.6-rc1.
The rest of the file-systems are merged already - here is the summary.

1.  ext4 - changes sit in Ted Ts'o's tree
    git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
2.  exofs - changes sit in Boaz Harrosh's tree
    git://git.open-osd.org/linux-open-osd linux-next
3.  udf - changes sit in Jan Kara's tree:
    git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs for_next
4.  sysv - merged upstream
    9d46be2 fs/sysv: stop using write_super and s_dirt
5.  ufs - merged upstream
    9e9ad5f fs/ufs: get rid of write_super
6.  affs - merged upstream:
    3dd8478 affs: get rid of affs_sync_super
7.  hfs - merged upstream:
    5687b57 hfs: get rid of hfs_sync_super
8.  hfsplus - merged upstream:
    9e6c582 hfsplus: get rid of write_super
9.  ext2 - merged upstream
    f72cf5e ext2: do not register write_super within VFS
10. vfat - merged upstream
    7849118 fat: switch to fsinfo_inode
11. jffs2 - merged upstream
    208b14e jffs2: get rid of jffs2_sync_super
12. reiserfs - merged upstream
    033369d reiserfs: get rid of resierfs_sync_super

These patches are also available here:
git://git.infradead.org/users/dedekind/linux-misc.git sync_supers

And just because this is the final pdflush removal, here is a brief historical
reference.

1. early days...2.6.31 - pdflush is the kernel daemon which periodically
   wakes-up and flushes all dirty inodes and superblocks.
2. 2.6.32 - Jens Axboe introduces per-block device BDI flusher threads which
   are now responsible to flushing dirty inodes [2]. The pdflush thread becomes
   very simple, it is re-named to sync_supers and it periodically wakes-up
   and flushes superblocks. While overall Jens' change was good, it introduced
   a regression: instead of one pdflush thread waking-up every 5 seconds [3]
   we ended up with multiple threads waking up every 5 seconds - sync_supers
   and several flusher threads.
3. 2.6.36 - Artem Bityutskiy :-) fixes the wake-ups regression (see commit
   6467716) and from now on flusher threads do not wake up unless there are
   some dirty data for the corresponding block device.

   Attempts are made to similarly optimize sync_supers, but they are vetoed
   by Al Viro who wants sync_supers to be killed altogether instead [4].
4. 3.6 - the sync_supers is hopefully finally killed. With this the last
   piece of pdflush is also gone.

I'd like to thank Intel OTC for supporting this project, Jan Kara for help
with ext[24], Andrew Morton, Al Viro, Ted Ts'o, Nick Piggin.

[1] 5 seconds is the default setting and major distributions do not change
    it. But it is tunable via /proc/sys/vm/dirty_writeback_centisecs
[2] http://lwn.net/Articles/326552/
[3] pdflush thread was forking itself if there were a lot dirty date, but it
    does not matter in this context.
[4] https://lkml.org/lkml/2010/7/22/96

--
Regards,
Artem Bityutskiy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ