linux-kernel - Re: Linux 2.6.29

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090325150041.GM32307@mit.edu>
Date:	Wed, 25 Mar 2009 11:00:41 -0400
From:	Theodore Tso <tytso@....edu>
To:	Jan Kara <jack@...e.cz>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Arjan van de Ven <arjan@...radead.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>,
	Jens Axboe <jens.axboe@...cle.com>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

On Wed, Mar 25, 2009 at 01:37:44PM +0100, Jan Kara wrote:
> >     Also, we do have to reliably get a lock on the buffer when moving it
> >     between lists and inspecting its internal state.  Otherwise a competing
> >     read from the underlying block device can trigger an assertion failure,
> >     and a competing write to the underlying block device can confuse ext3
> >     journalling state completely.
>
>   I've looked at this a bit. I suppose you mean the contention arising from
> us taking the buffer lock in do_get_write_access()? But it's not obvious
> to me why we'd be contending there... We call this function only for
> metadata buffers (unless in data=journal mode) so there isn't huge amount
> of these blocks.

There isn't a huge number of those blocks, but if inode #1220 was
modified in the previous transaction which is now being committed, and
we then need to modify and write out inode #1221 in the current
contention, and they share the same inode table block, that would
cause the contention.  That probably doesn't happen that often in a
synchronous code path, but it probably happens more often that you're
thinking.  I still think the fsync() problem is the much bigger deal,
and solving the contention problem isn't going to solve the fsync()
latency problem with ext3 data=ordered mode.

>   Also when I emailed with a few people about these sync problems, they
> wrote that switching to data=writeback mode helps considerably so this
> would indicate that handling of ordered mode data buffers is causing most
> of the slowdown...

Yes, but we need to be clear whether this was an fsync() problem or
some other random delay problem.  If it's the fsync() problem,
obviously data=writeback will solve the fsync() latency delay problem.
(As will using delayed allocation in ext4 or XFS.)

    	       	       		     	  - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/