[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130423161319.GC2108@suse.de>
Date: Tue, 23 Apr 2013 17:13:19 +0100
From: Mel Gorman <mgorman@...e.de>
To: Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.cz>,
linux-ext4@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>, Jiri Slaby <jslaby@...e.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2
On Tue, Apr 23, 2013 at 11:50:19AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote:
> > That's a pretty big drop but it gets bad again for the second worst stall --
> > wait_on_page_bit as a result of generic_file_buffered_write.
> >
> > Vanilla kernel 1336064 ms stalled with 109 events
> > Patched kernel 2338781 ms stalled with 164 events
>
> Do you have the stack trace for this stall? I'm wondering if this is
> caused by the waiting for stable pages in write_begin() , or something
> else.
>
[<ffffffff81110238>] wait_on_page_bit+0x78/0x80
[<ffffffff815af294>] kretprobe_trampoline+0x0/0x4c
[<ffffffff81110e84>] generic_file_buffered_write+0x114/0x2a0
[<ffffffff81111ccd>] __generic_file_aio_write+0x1bd/0x3c0
[<ffffffff81111f4a>] generic_file_aio_write+0x7a/0xf0
[<ffffffff811ee639>] ext4_file_write+0x99/0x420
[<ffffffff81174d87>] do_sync_write+0xa7/0xe0
[<ffffffff81175447>] vfs_write+0xa7/0x180
[<ffffffff811758cd>] sys_write+0x4d/0x90
[<ffffffff815b3eed>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
The processes that stalled in this particular trace are wget, latency-output,
tar and tclsh. Most of these are sequential writers except for tar which
is both a sequential reader and sequential writers.
> If it is blocking caused by stable page writeback that's interesting,
> since it would imply that something in your workload is trying to
> write to a page that has already been modified (i.e., appending to a
> log file, or updating a database file). Does that make sense given
> what your workload might be running?
>
I doubt it is stable write consider the type of processes that are running. I
would expect the bulk of the activity to be sequential readers or writers
of multiple files. The summarised report from the raw data is now at
http://www.csn.ul.ie/~mel/postings/ext4tag-20130423/dstate-summary-vanilla
http://www.csn.ul.ie/~mel/postings/ext4tag-20130423/dstate-summary-ext4tag
It's an aside but the worst of the stalls are incurred by systemd-tmpfile
which were not a deliberate part of the test and yet another thing that
I would not have caught unless I was running tests on my laptop. Looking
closer at that thing, the default configuration is to run the service 15
minutes after boot and after that it runs once a day. It looks like the
bulk of the scanning would be in /var/tmp/ looking at systemds own files
(over 3000 of them) which I'm a little amused by.
My normal test machines would not hit this because they are not systemd
based but the existance of thing thing is worth noting. Any IO-based tests
run on systemd-based distributions may give different results depending
on whether this service triggered during the test or not.
--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists