linux-kernel - Re: find_busiest_group using lots of CPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1254745898.26976.52.camel@twins>
Date:	Mon, 05 Oct 2009 14:31:38 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Jens Axboe <jens.axboe@...cle.com>
Cc:	Linux Kernel <linux-kernel@...r.kernel.org>, mingo@...e.hu
Subject: Re: find_busiest_group using lots of CPU

On Wed, 2009-09-30 at 10:18 +0200, Jens Axboe wrote:
> Hi,
> 
> I stuffed a few more SSDs into my text box. Running a simple workload
> that just does streaming reads from 10 processes (throughput is around
> 2.2GB/sec), find_busiest_group() is using > 10% of the CPU time. This is
> a 64 thread box.
> 
> The top two profile entries are:
> 
>     10.86%      fio  [kernel]                [k] find_busiest_group
>                 |          
>                 |--99.91%-- thread_return
>                 |          io_schedule
>                 |          sys_io_getevents
>                 |          system_call_fastpath
>                 |          0x7f4b50b61604
>                 |          |          
>                 |           --100.00%-- td_io_getevents
>                 |                     io_u_queued_complete
>                 |                     thread_main
>                 |                     run_threads
>                 |                     main
>                 |                     __libc_start_main
>                  --0.09%-- [...]
> 
>      5.78%      fio  [kernel]                [k] cpumask_next_and
>                 |          
>                 |--67.21%-- thread_return
>                 |          io_schedule
>                 |          sys_io_getevents
>                 |          system_call_fastpath
>                 |          0x7f4b50b61604
>                 |          |          
>                 |           --100.00%-- td_io_getevents
>                 |                     io_u_queued_complete
>                 |                     thread_main
>                 |                     run_threads
>                 |                     main
>                 |                     __libc_start_main
>                 |          
>                  --32.79%-- find_busiest_group
>                            thread_return
>                            io_schedule
>                            sys_io_getevents
>                            system_call_fastpath
>                            0x7f4b50b61604
>                            |          
>                             --100.00%-- td_io_getevents
>                                       io_u_queued_complete
>                                       thread_main
>                                       run_threads
>                                       main
>                                       __libc_start_main
> 
> This is with SCHED_DEBUG=y and SCHEDSTATS=y enabled, I just tried with
> both disabled but that yields the same result (well actually worse, 22%
> spent in there. dunno if that's normal "fluctuation"). GROUP_SCHED is
> not set. This seems way excessive!

io_schedule() straight into find_busiest_group() leads me to think this
could be SD_BALANCE_NEWIDLE, does something like:

for i in /proc/sys/kernel/sched_domain/cpu*/domain*/flags; 
do 
	val=`cat $i`; echo $((val & ~0x02)) > $i; 
done

[ assuming SCHED_DEBUG=y ]

Cure things?

If so, then its spending time looking for work, which there might not be
on your machine, since everything is waiting for IO or somesuch.

Not really sure what to do about it though, this is a quad socket
nehalem, right? We could possibly disable SD_BALANCE_NEWIDLE on the NODE
level, but that would again decrease throughput in things like kbuild.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/