Date:	Tue, 24 Mar 2009 18:55:06 +0100
From:	Jan Kara <jack@...e.cz>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
Cc:	Theodore Tso <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>,
	Jens Axboe <jens.axboe@...cle.com>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

> > They don't solve the problem where there is a *huge* amount of writes
> > going on, though --- if something is dirtying pages at a rate far
> 
> At very high rates other things seem to go pear shaped. I've not traced
> it back far enough to be sure but what I suspect occurs from the I/O at
> disk level is that two people are writing stuff out at once - presumably
> the vm paging pressure and the file system - as I see two streams of I/O
> that are each reasonably ordered but are interleaved.
  There are several different problems leading to this:
1) The JBD commit code writes ordered data on each transaction commit. This
is done in dirtied-time order, which is not necessarily optimal in the case
of random-access IO. The IO scheduler helps here, though, because we submit
a lot of IO at once. ext4 has at least the randomness part of this problem
"fixed" because it submits ordered data via writepages(). That change
requires non-trivial changes to the journaling layer, so I wasn't brave
enough to do it for ext3 and JBD as well (although porting the patch is
trivial).

2) When we do dirty throttling, there are going to be several threads
writing out to the filesystem (you have more than one pdflush thread
whenever you have more than one CPU). Jens' per-BDI writeback threads
could help here (but I haven't yet got around to reading his patches in
enough detail to be sure).

  These two problems together result in a non-optimal IO pattern. At least
that's where I got to when I was looking into why Berkeley DB is so
slow. I tried to somehow serialize multiple pdflush threads on the
filesystem, but a naive solution does not really help much: either some
throttled thread was starved by the other threads doing writeback, or I
didn't quite keep the disk busy. So something like Jens' approach is
probably the way to go in the end.

> > don't get *that* bad, even with ext3.  At least, I haven't found a
> > workload that doesn't involve either dd if=/dev/zero or a massive
> > amount of data coming in over the network that will cause fsync()
> > delays in the > 1-2 second category.  Ext3 has been around for a long
> 
> I see it with a desktop when it pages hard and also when doing heavy
> desktop I/O (in my case the repeatable every time case is saving large
> images in the gimp - A4 at 600-1200dpi).
> 
> The other one (#8636) seems to be a bug in the I/O schedulers as it goes
> away if you use a different I/O sched.
> 
> > solve.  Simply mounting an ext3 filesystem using ext4, without making
> > any change to the filesystem format, should solve the problem.
> 
> I will try this experiment but not with production data just yet 8)
> 
> > some other users' data files.  This was the reason for Stephen Tweedie
> > implementing the data=ordered mode, and making it the default.
> 
> Yes and in the server environment or for typical enterprise customers
> this is a *big issue*, especially the risk of it being undetected that
> they just inadvertently did something like put your medical data into the
> end of something public during a crash.
> 
> > Try ext4, I think you'll like it.  :-)
> 
> I need to, so that I can double check none of the open jbd locking bugs
> are there and close more bugzilla entries (#8147)
  This one is still there. I'll have a look at it tomorrow and hopefully
will be able to answer...

									Honza

-- 
Jan Kara <jack@...e.cz>
SuSE CR Labs
