linux-kernel - Re: Linux 2.6.29

Message-ID: <20090324205722.1392e3db@hobbes.virtuouswap>
Date:	Tue, 24 Mar 2009 20:57:22 -0700
From:	Jesse Barnes <jbarnes@...tuousgeek.org>
To:	Theodore Tso <tytso@....edu>
Cc:	Ingo Molnar <mingo@...e.hu>, Alan Cox <alan@...rguk.ukuu.org.uk>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>,
	Jens Axboe <jens.axboe@...cle.com>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

On Tue, 24 Mar 2009 22:09:15 -0400
Theodore Tso <tytso@....edu> wrote:

> On Tue, Mar 24, 2009 at 04:03:53PM -0700, Jesse Barnes wrote:
> > 
> > You make it sound like this is hard to do...  I was running into
> > this problem *every day* until I moved to XFS recently.  I'm
> > running a fairly beefy desktop (VMware running a crappy Windows
> > install w/AV junk on it, builds, icecream and large mailboxes) and
> > have a lot of RAM, but it became unusable for minutes at a time,
> > which was just totally unacceptable, thus the switch.  Things have
> > been better since, but are still a little choppy.
> > 
> 
> I have 4 gigs of memory on my laptop, and I've never seen these
> sorts of issues.  So maybe filesystem hackers don't have enough
> memory; or we don't use the right workloads?  It would help if I
> understood how to trigger these disaster cases.  I've had to work
> *really* hard (as in dd if=/dev/zero of=/mnt/dirty-me-harder) in order
> to get even a 30 second fsync() delay.  So understanding what sort of
> things you do that cause that many files' data blocks to be dirtied,
> and/or what is causing a major read workload, would be useful.
> 
> It may be that we just need to tune the VM to be much more aggressive
> about pushing dirty pages to the disk sooner.  Understanding how the
> dynamics are working would be the first step.
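
For reference, the knobs that control how soon dirty pages get pushed to disk live under /proc/sys/vm. The following is only a sketch for recording their current values alongside a bug report; the choice of these four tunables is an assumption, since the message above does not name any, and read_dirty_tunables is a hypothetical helper name.

```python
from pathlib import Path

# Writeback-related sysctls exposed under /proc/sys/vm on Linux.
# Which of these matter for the stalls discussed here is an assumption.
DIRTY_TUNABLES = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_dirty_tunables(sysctl_dir="/proc/sys/vm"):
    """Return {name: int} for each tunable that exists and parses as an integer."""
    values = {}
    for name in DIRTY_TUNABLES:
        try:
            values[name] = int((Path(sysctl_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable entry: skip it rather than guess a value.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_dirty_tunables().items():
        print(f"{name} = {value}")
```

Lowering the ratios should make writeback start earlier, but whether that helps in this case is exactly what would need to be measured.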

Well, I think that's part of the problem: this is bigger than just
filesystems.  I've been using ext3 since before I started seeing this,
so it seems like a bad VM/fs interaction may be to blame.
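
One way to put numbers on the fsync() delays mentioned in the quoted text is to time a small write plus fsync() repeatedly while the usual workload runs. This is a minimal sketch, not a tool anyone in this thread used; time_fsync, the 4 KiB payload, and the one-second interval are all assumptions made for illustration.

```python
import os
import tempfile
import time


def time_fsync(directory=".", payload=b"x" * 4096):
    """Write a small temp file in `directory` and return seconds spent in fsync()."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        start = time.monotonic()
        os.fsync(fd)  # blocks until the data is reported as written out
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Probe the filesystem holding the current directory once per second.
    for _ in range(10):
        print(f"fsync took {time_fsync():.3f}s")
        time.sleep(1)
```

Run it from a directory on the filesystem under suspicion; a tmpfs directory will not show the problem.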

> > I remember early in the 2.6.x days there was a lot of focus on
> > making interactive performance good, and for a long time it was.
> > But this I/O problem has been around for a *long* time now... What
> > happened?  Do not many people run into this daily?  Do all the
> > filesystem hackers run with special mount options to mitigate the
> > problem?
> 
> All I can tell you is that *I* don't run into them, even when I was
> using ext3 and before I got an SSD in my laptop.  I don't understand
> why; maybe because I don't get really nice toys like systems with
> 32G's of memory.  Or maybe it's because I don't use icecream (whatever
> that is).  Whatever it is, it would be useful to get some solid
> reproduction information, with details about hardware configuration,
> and information collected using sar and scripts that gather
> /proc/meminfo every 5 seconds, and what the applications were doing at
> the time.
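
A script of the kind requested above could look roughly like this. It is a sketch only: the set of fields printed (MemFree, Cached, Dirty, Writeback) is an assumption about what is interesting here, and parse_meminfo and sample are hypothetical helper names.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}, dropping the unit suffix."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sample(path="/proc/meminfo", keys=("MemFree", "Cached", "Dirty", "Writeback")):
    """Read `path` once and return the selected fields (None if a field is absent)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {key: info.get(key) for key in keys}


if __name__ == "__main__":
    # Log one timestamped sample every 5 seconds until interrupted.
    while True:
        print(time.strftime("%H:%M:%S"), sample(), flush=True)
        time.sleep(5)
```

Redirecting the output to a file on a different disk (or over the network) avoids adding to the I/O load being measured.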

icecream is a distributed compiler system.  Like distcc but a bit more
cross-compile & heterogeneous compiler friendly.

> It might also be useful for someone to try reducing the amount of
> memory the system is using by using mem= on the boot line, and see if
> that changes things, and to try simplifying the application workload,
> and/or using iotop to determine what is most contributing to the
> problem.  (And of course, this needs to be done with someone using
> ext3, since both ext4 and XFS use delayed allocation, which will
> largely make this problem go away.)

Yep, and that's where my blame comes in.  I whined about this to a few
people, like Arjan, who provided workarounds, but never got beyond
that.  Some real debugging would be needed to find & fix the root
cause(s).

-- 
Jesse Barnes, Intel Open Source Technology Center