[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070911165535.GA23520@atrey.karlin.mff.cuni.cz>
Date: Tue, 11 Sep 2007 18:55:35 +0200
From: Jan Kara <jack@...e.cz>
To: Andy Whitcroft <apw@...dowen.org>
Cc: sct@...hat.com, akpm@...ux-foundation.org, adilger@...sterfs.com,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>, mel@....ul.ie
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests
> I have a couple of failed test runs against 2.6.23-rc6 where the
> job timed out while running dbench over ext3. Both on powerpc,
> though both significantly different hardware setups. A failed
> run like this implies that the machine was still responsive to
> other processes but the dbench was making no progress. There is
> no console diagnostics during the failure.
>
> beavis was lost during a plain ext3 dbench run, having just
> successfully run a complete ext2 run. elm3b19 was lost during an
> ext3 "data=writeback" dbench run, having already completed an plain
> ext2, and ext3 runs.
OK, thanks for report.
> A quick poke at the dbench logs on the second machine shows this
> for the working ext3 dbench run:
>
> 4 clients started
> 4 35288 814.49 MB/sec
> 0 62477 822.99 MB/sec
> Throughput 822.954 MB/sec 4 procs
>
> Whereas the hanging run shows the following continuing until the
> machine is reset, which confirms that the machine as a whole was
> still with us:
>
> 4 clients started
> 4 36479 824.92 MB/sec
> 1 46857 519.98 MB/sec
> 1 46857 346.65 MB/sec
> 1 46857 259.99 MB/sec
> 1 46857 207.99 MB/sec
> 1 46857 173.32 MB/sec
> 1 46857 148.56 MB/sec
> 1 46857 129.99 MB/sec
> 1 46857 115.55 MB/sec
> 1 46857 103.99 MB/sec
> 1 46857 94.54 MB/sec
> 1 46857 86.66 MB/sec
> 1 46857 80.00 MB/sec
> [...]
So the process doing IO is hung. Could you dump stack of all the tasks
(Alt-Sysrq-t) and send the output? We've probably deadlocked
somewhere...
> The first machine is very similar:
>
> 4 clients started
> 4 18468 445.29 MB/sec
> 4 41945 469.36 MB/sec
> 1 46857 346.68 MB/sec
> 1 46857 260.00 MB/sec
> 1 46857 208.00 MB/sec
> [...]
>
> Not sure if there is any significance to the 46857. Though it feels
> like we may be at the end of the run when it fails.
>
> I will try and reproduce this on one of the machines and see if I
> can get any further info.
Honza
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists