Message-ID: <alpine.LFD.2.00.0903251008120.3032@localhost.localdomain>
Date:	Wed, 25 Mar 2009 10:29:48 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Theodore Tso <tytso@....edu>
cc:	Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Arjan van de Ven <arjan@...radead.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>,
	Jens Axboe <jens.axboe@...cle.com>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29



On Wed, 25 Mar 2009, Theodore Tso wrote:
>
> I still think the fsync() problem is the much bigger deal, and solving 
> the contention problem isn't going to solve the fsync() latency problem 
> with ext3 data=ordered mode.

The fsync() problem is really annoying, but what is doubly annoying is 
that sometimes one process doing fsync() (or sync) seems to cause other 
processes to hiccup too.
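To observe this on a given machine, one could time fsync() on a small file while a large file is being written to the same filesystem. The following is a minimal sketch under that assumption; the helper name `timed_fsync` and the 4 KiB payload are illustrative choices, not anything from the thread, and the numbers it reports depend entirely on the filesystem, journaling mode, and hardware.

```python
import os
import sys
import tempfile
import time


def timed_fsync(path: str, payload: bytes = b"x" * 4096) -> float:
    """Write `payload` to `path`, fsync it, and return the fsync() duration in seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)
        start = time.perf_counter()
        os.fsync(fd)  # blocks until the file's data and needed metadata are on stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Pass a directory on the filesystem under test; the default temp dir may be tmpfs.
    target = sys.argv[1] if len(sys.argv) > 1 else None
    with tempfile.TemporaryDirectory(dir=target) as d:
        latency = timed_fsync(os.path.join(d, "small.dat"))
        print(f"fsync took {latency * 1000:.2f} ms")
```

Run it once on an otherwise idle system and once while something like `dd if=/dev/zero of=big.file bs=1M count=4096` writes to the same filesystem; the difference between the two runs is the latency being discussed.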

Now, I personally solved that problem by moving to (good) SSDs on my 
desktop, and I think that's indeed the long-term solution. But it would be 
good to figure out a short-term solution too, for people who don't have 
new hardware thrown at them by random companies.

I suspect it's a combination of filesystem transaction locking together 
with the VM wanting to write out some unrelated blocks or inodes because 
the system is close to the dirty limits. That would explain why the 
system-wide hiccups happen especially when writing big files.
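The "dirty limits" can be illustrated with a deliberately simplified model. Linux exposes two thresholds as the `vm.dirty_background_ratio` and `vm.dirty_ratio` sysctls: above the first, flusher threads start writing dirty pages in the background; above the second, a process that keeps dirtying pages is throttled until writeback catches up. The sketch below is a toy, assuming plain percentage thresholds; the real logic in the kernel's `balance_dirty_pages()` is considerably more involved, and the fallback values 10 and 20 are only illustrative.

```python
from enum import Enum


class WritebackState(Enum):
    IDLE = "below the background threshold"
    BACKGROUND = "flusher threads write dirty pages asynchronously"
    THROTTLED = "the dirtying process is made to wait for writeback"


def read_vm_sysctl(name: str, default: int) -> int:
    """Read a value such as /proc/sys/vm/dirty_ratio on Linux; use `default` if unavailable.

    A reading of 0 usually means the *_bytes variant of the knob is in use instead."""
    try:
        with open(f"/proc/sys/vm/{name}") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default


def classify(dirty_pages: int, dirtyable_pages: int,
             background_ratio: int, dirty_ratio: int) -> WritebackState:
    """Toy model of the two dirty thresholds, given as percentages of dirtyable memory."""
    background_limit = dirtyable_pages * background_ratio // 100
    hard_limit = dirtyable_pages * dirty_ratio // 100
    if dirty_pages >= hard_limit:
        return WritebackState.THROTTLED
    if dirty_pages >= background_limit:
        return WritebackState.BACKGROUND
    return WritebackState.IDLE


if __name__ == "__main__":
    bg = read_vm_sysctl("dirty_background_ratio", 10)
    hard = read_vm_sysctl("dirty_ratio", 20)
    for dirty in (50, 150, 250):
        print(dirty, classify(dirty, 1000, bg, hard).name)
```

Writing a big file pushes the system from IDLE through BACKGROUND toward THROTTLED, which is the regime where unrelated writeback starts competing with everything else.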

The VM _tries_ to do writes in the background, but if the writepage() path 
hits a filesystem-level blocking lock, that background write suddenly 
becomes largely synchronous.
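A toy timeline makes this concrete: if every writepage() call must first take some filesystem-wide lock, a writer throttled at the dirty limit inherits the full time that lock is held, for example by a journal commit. The `FsLock` class and all timings below are hypothetical; this models the effect, not any real filesystem's locking.

```python
from dataclasses import dataclass


@dataclass
class FsLock:
    """Hypothetical filesystem-level lock, held until time `released_at` (ms)."""
    released_at: int = 0


def throttled_wait_ms(now: int, pages: int, per_page_ms: int, lock: FsLock) -> int:
    """How long a writer throttled at the dirty limit waits for `pages` pages of
    writeback, assuming each writepage() must first acquire `lock`."""
    # Writeback cannot start until whoever holds the lock (e.g. a commit) drops it.
    start = max(now, lock.released_at)
    # After that, the pages go out one after another.
    return (start - now) + pages * per_page_ms


if __name__ == "__main__":
    # Uncontended: the wait is just the I/O for 32 pages at 1 ms each (32 ms).
    print(throttled_wait_ms(0, 32, 1, FsLock(released_at=0)))
    # Lock held by a 2-second commit: the "background" write now costs 2032 ms.
    print(throttled_wait_ms(0, 32, 1, FsLock(released_at=2000)))
```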

I suspect there is also some possibility of confusion with inter-file 
(false) metadata dependencies. If a filesystem were to think that the file 
size is metadata that should be journaled (in a single journal), and the 
journaling code then decides that it needs to do those metadata updates 
in the correct order (i.e. the big file write _before_ the write to the 
file that wants to be fsync'ed), then the fsync() will be delayed by a 
totally irrelevant large file having to have its data written out (due to 
data=ordered or whatever).
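The false dependency can be sketched with a toy single-journal model: a file's size change lands in the one running transaction, and ordered mode requires that transaction's data blocks to be written before it commits. The class, method, and file names below are hypothetical, and real journaling filesystems have many more details; the point is only that fsync() of a small file ends up paying for every file sharing the transaction.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy single shared journal with data=ordered-style semantics (hypothetical)."""
    # file name -> dirty data blocks that must reach disk before the running transaction commits
    ordered_data: dict = field(default_factory=dict)

    def record_size_change(self, name: str, dirty_blocks: int) -> None:
        # Journaling the new size puts this file's pending data into the
        # single running transaction, alongside every other file's updates.
        self.ordered_data[name] = self.ordered_data.get(name, 0) + dirty_blocks

    def fsync(self, name: str) -> int:
        """Commit the whole running transaction; return the data blocks flushed first.

        `name` does not narrow anything here: that is the false dependency."""
        flushed = sum(self.ordered_data.values())
        self.ordered_data.clear()
        return flushed


def fsync_without_shared_ordering(ordered_data: dict, name: str) -> int:
    """Contrast: only the fsync'ed file's own dirty data has to be written."""
    return ordered_data.pop(name, 0)


if __name__ == "__main__":
    j = Journal()
    j.record_size_change("big.iso", 100_000)  # large file being written
    j.record_size_change("mail.db", 2)        # small file the user wants fsync'ed
    print(j.fsync("mail.db"))  # 100002 blocks written before fsync returns
    print(fsync_without_shared_ordering({"big.iso": 100_000, "mail.db": 2}, "mail.db"))  # 2
```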

I'd like to think that no filesystem designer would ever be that silly, 
but I'm too scared to try to actually go and check. Because I could well 
imagine that somebody really thought that "size" is metadata.

			Linus