linux-kernel - Re: 2.6.23-rc6: hanging ext3 dbench tests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070911165535.GA23520@atrey.karlin.mff.cuni.cz>
Date:	Tue, 11 Sep 2007 18:55:35 +0200
From:	Jan Kara <jack@...e.cz>
To:	Andy Whitcroft <apw@...dowen.org>
Cc:	sct@...hat.com, akpm@...ux-foundation.org, adilger@...sterfs.com,
	linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>, mel@....ul.ie
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests

> I have a couple of failed test runs against 2.6.23-rc6 where the
> job timed out while running dbench over ext3.  Both on powerpc,
> though both significantly different hardware setups.  A failed
> run like this implies that the machine was still responsive to
> other processes but the dbench was making no progress.  There is
> no console diagnostics during the failure.
> 
> beavis was lost during a plain ext3 dbench run, having just
> successfully run a complete ext2 run.  elm3b19 was lost during an
> ext3 "data=writeback" dbench run, having already completed an plain
> ext2, and ext3 runs.
  OK, thanks for report.

> A quick poke at the dbench logs on the second machine shows this
> for the working ext3 dbench run:
> 
>   4 clients started
>   4     35288  814.49 MB/sec
>   0     62477  822.99 MB/sec
>   Throughput 822.954 MB/sec 4 procs
> 
> Whereas the hanging run shows the following continuing until the
> machine is reset, which confirms that the machine as a whole was
> still with us:
> 
>   4 clients started
>   4     36479  824.92 MB/sec
>   1     46857  519.98 MB/sec
>   1     46857  346.65 MB/sec
>   1     46857  259.99 MB/sec
>   1     46857  207.99 MB/sec
>   1     46857  173.32 MB/sec
>   1     46857  148.56 MB/sec
>   1     46857  129.99 MB/sec
>   1     46857  115.55 MB/sec
>   1     46857  103.99 MB/sec
>   1     46857  94.54 MB/sec
>   1     46857  86.66 MB/sec
>   1     46857  80.00 MB/sec
>   [...]
  So the process doing IO is hung. Could you dump stack of all the tasks
(Alt-Sysrq-t) and send the output? We've probably deadlocked
somewhere...

> The first machine is very similar:
> 
>   4 clients started
>   4     18468  445.29 MB/sec
>   4     41945  469.36 MB/sec
>   1     46857  346.68 MB/sec
>   1     46857  260.00 MB/sec
>   1     46857  208.00 MB/sec
>   [...]
> 
> Not sure if there is any significance to the 46857.  Though it feels
> like we may be at the end of the run when it fails.
> 
> I will try and reproduce this on one of the machines and see if I
> can get any further info.

								Honza
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/