Message-Id: <1254745898.26976.52.camel@twins>
Date:	Mon, 05 Oct 2009 14:31:38 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Jens Axboe <jens.axboe@...cle.com>
Cc:	Linux Kernel <linux-kernel@...r.kernel.org>, mingo@...e.hu
Subject: Re: find_busiest_group using lots of CPU

On Wed, 2009-09-30 at 10:18 +0200, Jens Axboe wrote:
> Hi,
> 
> I stuffed a few more SSDs into my test box. Running a simple workload
> that just does streaming reads from 10 processes (throughput is around
> 2.2GB/sec), find_busiest_group() is using > 10% of the CPU time. This is
> a 64 thread box.
> 
> The top two profile entries are:
> 
>     10.86%      fio  [kernel]                [k] find_busiest_group
>                 |          
>                 |--99.91%-- thread_return
>                 |          io_schedule
>                 |          sys_io_getevents
>                 |          system_call_fastpath
>                 |          0x7f4b50b61604
>                 |          |          
>                 |           --100.00%-- td_io_getevents
>                 |                     io_u_queued_complete
>                 |                     thread_main
>                 |                     run_threads
>                 |                     main
>                 |                     __libc_start_main
>                  --0.09%-- [...]
> 
>      5.78%      fio  [kernel]                [k] cpumask_next_and
>                 |          
>                 |--67.21%-- thread_return
>                 |          io_schedule
>                 |          sys_io_getevents
>                 |          system_call_fastpath
>                 |          0x7f4b50b61604
>                 |          |          
>                 |           --100.00%-- td_io_getevents
>                 |                     io_u_queued_complete
>                 |                     thread_main
>                 |                     run_threads
>                 |                     main
>                 |                     __libc_start_main
>                 |          
>                  --32.79%-- find_busiest_group
>                            thread_return
>                            io_schedule
>                            sys_io_getevents
>                            system_call_fastpath
>                            0x7f4b50b61604
>                            |          
>                             --100.00%-- td_io_getevents
>                                       io_u_queued_complete
>                                       thread_main
>                                       run_threads
>                                       main
>                                       __libc_start_main
> 
> This is with SCHED_DEBUG=y and SCHEDSTATS=y enabled; I just tried with
> both disabled, but that yields the same result (actually worse, 22%
> spent in there, dunno if that's normal "fluctuation"). GROUP_SCHED is
> not set. This seems way excessive!

io_schedule() going straight into find_busiest_group() leads me to think
this could be SD_BALANCE_NEWIDLE. Does something like:

# clear SD_BALANCE_NEWIDLE (0x02) in every sched-domain flags file
for i in /proc/sys/kernel/sched_domain/cpu*/domain*/flags;
do
	val=`cat $i`; echo $((val & ~0x02)) > $i;
done

[ assuming SCHED_DEBUG=y ]

Cure things?
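
(Untested sketch, same SCHED_DEBUG=y assumption: the flags files are
plain integers and SD_BALANCE_NEWIDLE is bit 0x02, so something like
this shows which domains still have it set, and the commented line is
how to put it back afterwards.)

for i in /proc/sys/kernel/sched_domain/cpu*/domain*/flags;
do
	val=`cat $i`
	# report domains that still have the newidle bit set
	[ $((val & 0x02)) -ne 0 ] && echo "$i: SD_BALANCE_NEWIDLE set"
	# to restore it later: echo $((val | 0x02)) > $i
done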

If so, then it's spending time looking for work, which there might not
be on your machine, since everything is waiting for IO or some such.

Not really sure what to do about it, though; this is a quad socket
Nehalem, right? We could possibly disable SD_BALANCE_NEWIDLE at the NODE
level, but that would again decrease throughput in things like kbuild.
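
If we wanted to experiment with that, a rough sketch (untested; it
assumes the NODE domain is the highest-numbered domain* directory under
each cpu, which is worth verifying on the box first):

for cpu in /proc/sys/kernel/sched_domain/cpu*;
do
	# assumed: the last domain level listed is the NODE domain
	d=`ls -d $cpu/domain* | tail -1`
	val=`cat $d/flags`
	# clear SD_BALANCE_NEWIDLE (0x02) only at that level
	echo $((val & ~0x02)) > $d/flags
done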

