Date:	Tue, 24 Mar 2009 04:12:49 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Arjan van de Ven <arjan@...radead.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>, Theodore Tso <tytso@....edu>,
	Jens Axboe <jens.axboe@...cle.com>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

On Tue, 24 Mar 2009 11:31:11 +0100 Ingo Molnar <mingo@...e.hu> wrote:

> 
> * Alan Cox <alan@...rguk.ukuu.org.uk> wrote:
> 
> > > > I have not had this problem since I applied Arjan's (for some reason
> > > > repeatedly rejected) patch to change the ioprio of the various writeback
> > > > daemons. Under some loads changing to the noop I/O scheduler also seems
> > > > to help (as do most of the non default ones)
> > > 
> > > (link would be useful)
> > 
> > 
> > "Give kjournald a IOPRIO_CLASS_RT io priority"
> > 
> > October 2007 (yes its that old)
> 
> thx. A more recent submission from Arjan would be:
> 
>     http://lkml.org/lkml/2008/10/1/405
> 
> Resolution was that Tytso indicated it went into some sort of ext4 
> patch queue:
> 
> | I've ported the patch to the ext4 filesystem, and dropped it into 
> | the unstable portion of the ext4 patch queue.
> |
> |   ext4: akpm's locking hack to fix locking delays
> 
> but 6 months down the line and i can find no trace of this upstream 
> anywhere.
> 
> <let-me-rant-too>
> 
> The thing is ... this is a _bad_ ext3 design bug affecting ext3 
> users in the last decade or so of ext3 existence. Why is this issue 
> not handled with the utmost high priority and why wasnt it fixed 5 
> years ago already? :-)
> 
> It does not matter whether we have extents or htrees when there are 
> _trivially reproducible_ basic usability problems with ext3.
> 

It's all there in that Oct 2008 thread.

The proposed tweak to kjournald is a bad fix - partly because it will
elevate the priority of vast amounts of IO whose priority we don't _want_
elevated.
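
For context, a Linux I/O priority is attached to a task and packs a scheduling class together with a level. Raising kjournald to IOPRIO_CLASS_RT would therefore cover everything that thread submits, not only the writes someone is waiting on. A minimal Python sketch of that encoding follows. The constants are taken from include/linux/ioprio.h and should be treated as an assumption for any particular kernel version. The helper names (ioprio_value, ioprio_class, ioprio_level) are hypothetical, not kernel or library APIs.

```python
# Illustrative sketch only; this is not the patch discussed in the thread.
# Assumed constants, mirroring the kernel's include/linux/ioprio.h.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_NONE = 0
IOPRIO_CLASS_RT = 1    # real-time: served ahead of the other classes
IOPRIO_CLASS_BE = 2    # best-effort: the usual default
IOPRIO_CLASS_IDLE = 3  # served only when the disk is otherwise idle


def ioprio_value(ioclass, level):
    """Pack a class and a 0-7 level into one ioprio value (hypothetical helper)."""
    if not 0 <= level <= 7:
        raise ValueError("level must be in the range 0..7")
    return (ioclass << IOPRIO_CLASS_SHIFT) | level


def ioprio_class(value):
    """Extract the scheduling class from a packed ioprio value."""
    return value >> IOPRIO_CLASS_SHIFT


def ioprio_level(value):
    """Extract the level (lower is more urgent) from a packed ioprio value."""
    return value & ((1 << IOPRIO_CLASS_SHIFT) - 1)


# One value describes a whole task, not an individual request.
rt_prio = ioprio_value(IOPRIO_CLASS_RT, 4)
be_prio = ioprio_value(IOPRIO_CLASS_BE, 4)
```

Because the value is per task, there is no way in this scheme to mark only some of a thread's writes as urgent.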

But mainly because the problem lies elsewhere - in an area of contention
between the committing and running transactions which we knowingly and
reluctantly added, to fix a bug, in this commit:

commit 773fc4c63442fbd8237b4805627f6906143204a8
Author:     akpm <akpm>
AuthorDate: Sun May 19 23:23:01 2002 +0000
Commit:     akpm <akpm>
CommitDate: Sun May 19 23:23:01 2002 +0000

    [PATCH] fix ext3 buffer-stealing
    
    Patch from sct fixes a long-standing (I did it!) and rather complex
    problem with ext3.
    
    The problem is to do with buffers which are continually being dirtied
    by an external agent.  I had code in there (for easily-triggerable
    livelock avoidance) which steals the buffer from checkpoint mode and
    reattaches it to the running transaction.  This violates ext3 ordering
    requirements - it can permit journal space to be reclaimed before the
    relevant data has really been written out.
    
    Also, we do have to reliably get a lock on the buffer when moving it
    between lists and inspecting its internal state.  Otherwise a competing
    read from the underlying block device can trigger an assertion failure,
    and a competing write to the underlying block device can confuse ext3
    journalling state completely.
    

Now this:

> Resolution was that Tytso indicated it went into some sort of ext4 
> patch queue:

was not a fix at all.  It was a known-buggy hack which I proposed simply to
remove that contention point to let us find out if we're on the right
track.  IIRC Ric was going to ask someone to do some performance testing of
that hack, but we never heard back.

The bottom line is that someone needs to do some serious rooting through
the very heart of JBD transaction logic and nobody has yet put their hand
up.  If we do that, and it turns out to be just too hard to fix, then yes,
perhaps that's the time to start looking at palliative bandaids.

The number of people who can be looked at to do serious ext3/JBD work is
pretty small now.  Ted, Stephen and I got old and died.  Jan does good work
but is spread thinly.
