linux-kernel - Re: Excessive stall times on ext4 in 3.9-rc2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Fri, 12 Apr 2013 10:47:31 +0100
From:	Mel Gorman <mgorman@...e.de>
To:	Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.cz>,
	linux-ext4@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>, Jiri Slaby <jslaby@...e.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2

On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote:
> On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
> >   I think it might be more enlightening if Mel traced which process in
> > which funclion is holding the buffer lock. I suspect we'll find out that
> > the flusher thread has submitted the buffer for IO as an async write and
> > thus it takes a long time to complete in presence of reads which have
> > higher priority.
> 
> That's an interesting theory.  If the workload is one which is very
> heavy on reads and writes, that could explain the high latency.  That
> would explain why those of us who are using primarily SSD's are seeing
> the problems, because would be reads are nice and fast.
> 
> If that is the case, one possible solution that comes to mind would be
> to mark buffer_heads that contain metadata with a flag, so that the
> flusher thread can write them back at the same priority as reads.
> 
> The only problem I can see with this hypothesis is that if this is the
> explanation for what Mel and Jiri are seeing, it's something that
> would have been around for a long time, and would affect ext3 as well
> as ext4.  That isn't quite consistent, however, with Mel's observation
> that this is a probablem which has gotten worse in relatively
> recently.
> 

According to the tests I've run, multi-second stalls have been a problem for
a while but never really bothered me. I'm not sure why it felt particularly
bad around -rc2 or why it seems to be better now. Maybe I just had my
cranky pants on.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/