Message-ID: <7098047.RSyYY1KrfL@deuteros>
Date:	Fri, 12 Apr 2013 11:18:13 +0100
From:	Tvrtko Ursulin <tvrtko.ursulin@...lan.co.uk>
To:	Theodore Ts'o <tytso@....edu>
Cc:	Jan Kara <jack@...e.cz>, Mel Gorman <mgorman@...e.de>,
	linux-ext4@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>, Jiri Slaby <jslaby@...e.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2


Hi all,

On Thursday 11 April 2013 22:57:08 Theodore Ts'o wrote:
> That's an interesting theory.  If the workload is one which is very
> heavy on reads and writes, that could explain the high latency.  That
> would explain why those of us who are using primarily SSD's are seeing
> the problems, because would be reads are nice and fast.
> 
> If that is the case, one possible solution that comes to mind would be
> to mark buffer_heads that contain metadata with a flag, so that the
> flusher thread can write them back at the same priority as reads.
> 
> The only problem I can see with this hypothesis is that if this is the
> explanation for what Mel and Jiri are seeing, it's something that
> would have been around for a long time, and would affect ext3 as well
> as ext4.  That isn't quite consistent, however, with Mel's observation
> that this is a problem which has gotten worse relatively
> recently.

I am dropping in as a casual observer, having missed the start of the thread,
at the risk of just muddying the waters for you.

I had a similar problem with ext4 for quite a while. At least that was my
conclusion, since migrating one filesystem to XFS fixed it for me. I observed
this between the 3.5 and 3.7 kernels.

The situation was that I had an ext4 filesystem (on top of LVM, which was on
top of MD RAID 1, which was on top of two mechanical hard drives) dedicated to
holding a large SVN check-out. The other filesystems were also ext4, on
different logical volumes (but on the same spindles).

The symptoms were long stalls of everything (including window management!) on
a relatively heavily loaded desktop (which was KDE). Stalls would last
anywhere from five to maybe even 30 seconds. I am not sure exactly, but long
enough that you would think the system had actually crashed. I couldn't even
switch to a different virtual terminal during a stall.

Eventually I traced it down to kdesvn (a Subversion client) periodically
refreshing (or something) its metadata and hence generating some IO on that
dedicated filesystem. That, combined with some other desktop activity, had the
effect of stalling everything else. I thought it was very weird, but I suppose
KDE and all the rest nowadays do too much IO in everything they do.

Following a hunch, I reformatted that filesystem as XFS, which fixed the
problem.

I can't reproduce this now to run any tests, so I know this is not very
helpful. But perhaps some of the info will be useful to someone.

Tvrtko
