Message-ID: <20080314123606.GL17940@kernel.dk>
Date: Fri, 14 Mar 2008 13:36:06 +0100
From: Jens Axboe <jens.axboe@...cle.com>
To: "Alan D. Brunelle" <Alan.Brunelle@...com>
Cc: linux-kernel@...r.kernel.org, npiggin@...e.de, dgc@....com
Subject: Re: IO CPU affinity test results
On Fri, Mar 14 2008, Alan D. Brunelle wrote:
> Good morning Jens -
>
> I had two machines running the latest patches hang last night:
>
> o 2-way AMD64 - I inadvertently left the patched kernel running, and
> I was moving a ton of data (100+GB) back up over the net to this node.
> It hard hung (believe it or not) about 99% of the way through and
> wouldn't respond to anything.
>
> o 4-way IA64 - I was performing a simple test: [mkfs / mount / untar
> linux sources / make allnoconfig / make -j 5 / umount] repeatedly
> switching rq_affinity to 0/1 between each run. After 22 passes it had
> a hard hang with rq_affinity set to 1.
>
> Of course, there is no way of knowing if either hang had anything to
> do with the patches, but it seems a bit ominous as RQ=1 was set in
> both cases.
>
> This same test worked fine for 30 passes on a 2-way AMD64 box, with
> the following results:
>
> Part   RQ      MIN      AVG      MAX    Dev
> -----  --   ------   ------   ------  -----
> mkfs    0   41.656   41.862   42.086  0.141
> mkfs    1   41.618   41.909   42.270  0.192
>
> untar   0   18.055   19.611   20.906  0.720
> untar   1   18.523   19.905   21.988  0.738
>
> make    0   50.480   50.991   51.752  0.340
> make    1   49.819   50.442   51.000  0.292
>
> comb    0  110.433  112.464  114.176  0.932
> comb    1  110.694  112.256  114.683  0.948
>
> psys    0   10.28%   10.91%   11.29%  0.243
> psys    1   10.21%   11.05%   11.80%  0.350
>
>
> All results are in seconds (as measured by Python's time.time()),
> except for the psys - which was the average of mpstat's %sys column
> over the life of the whole run. The mkfs part consisted of [mkfs -t
> ext2 ; sync ; sync], untar [mount; untar linux sources; umount; sync;
> sync], make [mount; make allnoconfig; make -j 3; umount; sync; sync],
> and comb is the combined times of the mkfs, untar and make parts.
>
> So, in a nutshell, we saw slightly better overall performance with
> rq_affinity=1, but not conclusively so, along with slightly elevated
> %system time to accomplish the task.
I think that is encouraging, for such a small setup. The make results
are particularly nice. The hangs are a bother; I have no good ideas on
why they occur. The fact that it happens on both archs indicates that
this is perhaps a generic problem rather than something arch-specific,
which is good news for tracking it down. The code to support this is
relatively simple, so it should be possible to go over it with a
fine-toothed comb and see if anything shows up.
You didn't get any watchdog triggers on the serial console, or anything
like that?
> On the 4-way, results were much worse: the final data captured before
> the system hung showed the rq=1 passes taking significantly longer,
> albeit at lower %system. I'm going to try the runs again, but I have a
> feeling that the latest "clean" patch based upon Nick's single call
> mechanism is a step backwards.
As to the hangs, yes, it's definitely a step back. The cleanliness of
the code is better, though, and it's something we can support, so I'm
inclined to try and make it work. The overhead of the cleaner approach
should not be a problem: it'll be a bit more costly than the IPI hack,
but not by a lot. With the tuning and tweaks since then, we got rid of
the allocations in the completion path, so it should be about as fast
as the original.
So once the base is stable, we can go and profile and squeeze some more
performance out of it :-)
--
Jens Axboe