lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 06 Aug 2007 22:04:05 +0200
From:	Dimitrios Apostolou <jimis@....net>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	linux-kernel@...r.kernel.org
Subject: Re: high system cpu load during intense disk i/o

Andrew Morton wrote:
> On Mon, 06 Aug 2007 16:20:30 +0200 Dimitrios Apostolou <jimis@....net> wrote:
> 
>> Andrew Morton wrote:
>>> On Sun, 5 Aug 2007 19:03:12 +0300 Dimitrios Apostolou <jimis@....net> wrote:
>>>
>>>> was my report so complicated?
>>> We're bad.
>>>
>>> Seems that your context switch rate when running two instances of
>>> badblocks against two different disks went batshit insane.  It doesn't
>>> happen here.
>>>
>>> Please capture the `vmstat 1' output while running the problematic
>>> workload.
>>>
>>> The oom-killing could have been unrelated to the CPU load problem.  iirc
>>> badblocks uses a lot of memory, so it might have been genuine.  Keep an eye
>>> on the /proc/meminfo output and send the kernel dmesg output from the
>>> oom-killing event.
>> Please see the attached files. Unfortunately I don't see any useful info 
>> in them:
>> 	*_before: before running any badblocks process
>> 	*_while: while running badblocks process, but without any cron job 
>> having kicked in
>> 	*_bad: 5 minutes later that some cron jobs kicked in
>>
>> About the OOM killer, indeed I believe that it is unrelated. It started 
>> killing after about 2 days, that hundreds of processes were stuck as 
>> running and taking up memory, so I suppose the 256 MB RAM were truly 
>> filled. I just mentioned it because its behaviour is completely 
>> non-helpful. It doesn't touch the badblocks process, it rarely touches 
>> the stuck as running cron jobs, but it kills other irrelevant processes. 
>> If you still want the killing logs, tell me and I'll search for them.
> 
> ah.  Your context-switch rate during the dual-badblocks run is not high at
> all.
> 
> I suspect I was fooled by the oprofile output, which showed tremendous
> amounts of load in schedule() and switch_to().  The percentages which
> opreport shows are the percentage of non-halted CPU time.  So if you have a
> function in the kernel which is using 1% of the total CPU, and the CPU is
> halted for 95% of the time, it appears that the function is taking 20% of
> CPU!
> 
> The fix for that is to boot with the "idle=poll" boot parameter, to make
> the CPU spin when it has nothing else to do.

I'm attaching the new oprofile output. I can't see any difference 
however. :-s


Dimitris



View attachment "bad.txt" of type "text/plain" (18206 bytes)

View attachment "idle.txt" of type "text/plain" (9398 bytes)

View attachment "vmstat_bad.txt" of type "text/plain" (937 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ