Date:	Wed, 3 Apr 2013 16:15:35 +0100
From:	Mel Gorman <mgorman@...e.de>
To:	Theodore Ts'o <tytso@....edu>
Cc:	linux-ext4@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>, Jiri Slaby <jslaby@...e.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2

On Wed, Apr 03, 2013 at 08:05:30AM -0400, Theodore Ts'o wrote:
> On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote:
> > 
> > I'm running with -rc5 now. I have not noticed many interactivity
> > problems as such, but the stall detection script reported that mutt
> > stalled for 20 seconds opening an inbox, imapd blocked for 59 seconds
> > doing path lookups, imaps blocked again for 12 seconds doing an atime
> > update, an RSS reader blocked for 3.5 seconds writing a file, etc.
> 
> If imaps blocked for 12 seconds during an atime update, combined with
> everything else, at a guess it got caught by something holding up a
> journal commit. 

It's a possibility.

I apologise, but I forgot that mail is stored on an encrypted partition on
this machine. It's formatted ext4, but dm-crypt could be making this problem
worse if it's stalling ext4 while waiting to encrypt/decrypt data due to
either a scheduler or workqueue change.

> Could you try enabling the jbd2_run_stats tracepoint
> and grabbing the trace log?  This will give you statistics on how long
> (in milliseconds) each of the various phases of a jbd2 commit is
> taking, i.e.:
> 
>     jbd2/sdb1-8-327   [002] .... 39681.874661: jbd2_run_stats: dev 8,17 tid 7163786 wait 0 request_delay 0 running 3530 locked 0 flushing 0 logging 0 handle_count 75 blocks 8 blocks_logged 9
>      jbd2/sdb1-8-327   [003] .... 39682.514153: jbd2_run_stats: dev 8,17 tid 7163787 wait 0 request_delay 0 running 640 locked 0 flushing 0 logging 0 handle_count 39 blocks 12 blocks_logged 13
>      jbd2/sdb1-8-327   [000] .... 39687.665609: jbd2_run_stats: dev 8,17 tid 7163788 wait 0 request_delay 0 running 5150 locked 0 flushing 0 logging 0 handle_count 60 blocks 13 blocks_logged 14
>      jbd2/sdb1-8-327   [000] .... 39693.200453: jbd2_run_stats: dev 8,17 tid 7163789 wait 0 request_delay 0 running 4840 locked 0 flushing 0 logging 0 handle_count 53 blocks 10 blocks_logged 11
>      jbd2/sdb1-8-327   [001] .... 39695.061657: jbd2_run_stats: dev 8,17 tid 7163790 wait 0 request_delay 0 running 1860 locked 0 flushing 0 logging 0 handle_count 124 blocks 19 blocks_logged 20
> 

Attached, along with the dstate summary that was recorded at the same
time. It's not quite as compelling, but I'll keep the monitor running and
see if something falls out. I didn't find anything useful in the existing
mmtests tests that could be used to bisect this, but not many of them are
focused on IO.
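For reference, the jbd2_run_stats tracepoint Ted mentions can be captured
through the ftrace interface; a minimal sketch (assuming tracefs is mounted
at /sys/kernel/tracing; older kernels expose the same files under
/sys/kernel/debug/tracing) that also flags suspiciously long commits:

```shell
# Enable the jbd2_run_stats tracepoint (path assumes tracefs is mounted
# at /sys/kernel/tracing)
echo 1 > /sys/kernel/tracing/events/jbd2/jbd2_run_stats/enable

# After reproducing the stall, flag any commit whose "running" phase
# exceeded the default 5000 ms commit interval
awk '/jbd2_run_stats/ {
    for (i = 1; i <= NF; i++)
        if ($i == "running" && $(i + 1) + 0 > 5000) { print; next }
}' /sys/kernel/tracing/trace
```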

> In the above sample each journal commit is running for no more than 5
> seconds or so (since that's the default jbd2 commit timeout; if a
> transaction is running for less than 5 seconds, then either we ran out
> of room in the journal, and the blocks_logged number will be high, or
> a commit was forced by something such as an fsync call).  
> 

I didn't see anything majorly compelling in the jbd2 tracepoints, but I'm
not 100% sure I'm looking for the right thing either. I also recorded
/proc/latency_stat, and there were some bad sync latencies in it, as you
can see here:

3 4481 1586 jbd2_log_wait_commit ext4_sync_file vfs_fsync sys_msync system_call_fastpath
3 11325 4373 sleep_on_page wait_on_page_bit kretprobe_trampoline filemap_write_and_wait_range ext4_sync_file vfs_fsync sys_msync system_call_fastpath
85 1130707 14904 jbd2_journal_stop jbd2_journal_force_commit ext4_force_commit ext4_sync_file do_fsync sys_fsync system_call_fastpath
1 2161073 2161073 start_this_handle jbd2__journal_start.part.8 jbd2__journal_start __ext4_journal_start_sb ext4_da_writepages do_writepages __filemap_fdatawrite_range filemap_write_and_wait_range ext4_sync_file do_fsync sys_fsync system_call_fastpath
118 7798435 596184 jbd2_log_wait_commit jbd2_journal_stop jbd2_journal_force_commit ext4_force_commit ext4_sync_file do_fsync sys_fsync system_call_fastpath
599 15496449 3405822 sleep_on_page wait_on_page_bit kretprobe_trampoline filemap_write_and_wait_range ext4_sync_file do_fsync sys_fsync system_call_fastpath
405 28572881 2619592 jbd2_log_wait_commit ext4_sync_file do_fsync sys_fsync system_call_fastpath
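For context, each /proc/latency_stat record above is of the form
&lt;hits&gt; &lt;total_us&gt; &lt;max_us&gt; &lt;backtrace...&gt;, with latencies in
microseconds; a quick way to rank the worst offenders (a sketch, assuming
that column layout) is:

```shell
# Sort /proc/latency_stat by the max-latency column and print the worst
# entries with the maximum converted from microseconds to seconds
sort -k3,3nr /proc/latency_stat | awk 'NF >= 4 {
    printf "%8.2fs max (%d hits)  %s\n", $3 / 1e6, $1, $4 }' | head
```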


> If an atime update is getting blocked by 12 seconds, then it would be
> interesting to see if a journal commit is running for significantly
> longer than 5 seconds, or if one of the other commit phases is taking
> significant amounts of time.  (On the example above they are all
> taking no time, since I ran this on a relatively uncontended system;
> only a single git operation taking place.)
> 
> Something else that might be worth trying is grabbing a lock_stat
> report and see if something is sitting on an ext4 or jbd2 mutex for a
> long time.
> 

OK, if nothing useful falls out of this session, I'll enable lock
debugging; latency_stat on its own would not be enough to conclude that
the problem is related to lock contention.
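For completeness, capturing the lock_stat report Ted suggests typically
goes something like the following (a sketch, assuming a kernel built with
CONFIG_LOCK_STAT=y):

```shell
# Requires CONFIG_LOCK_STAT=y
echo 0 > /proc/lock_stat              # clear any existing statistics
echo 1 > /proc/sys/kernel/lock_stat   # enable collection
# ... reproduce the stall ...
echo 0 > /proc/sys/kernel/lock_stat   # disable collection
grep -A 2 -E 'jbd2|ext4' /proc/lock_stat   # ext4/jbd2 locks of interest
```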

> Finally, as I mentioned I tried some rather simplistic tests and I
> didn't notice any difference between a 3.2 kernel and a 3.8/3.9-rc5
> kernel.  Assuming you can get a version of systemtap that
> simultaneously works on 3.2 and 3.9-rc5 :-P, any chance you could do a
> quick experiment and see if you're seeing a difference on your setup?
> 

stap-fix.sh should be able to kick systemtap sufficiently hard for
either 3.2 or 3.9-rc5 to keep it working. I'll keep digging when I can.

-- 
Mel Gorman
SUSE Labs

View attachment "dstate-summary.txt" of type "text/plain" (20392 bytes)

Download attachment "ftrace-debug-stalls-monitor.gz" of type "application/x-gzip" (13681 bytes)
