Message-Id: <1262592958.22471.104.camel@minggr.sh.intel.com>
Date:	Mon, 04 Jan 2010 16:15:58 +0800
From:	Lin Ming <ming.m.lin@...el.com>
To:	Mike Galbraith <efault@....de>,
	Peter Zijlstra <peterz@...radead.org>
Cc:	lkml <linux-kernel@...r.kernel.org>,
	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
Subject: volano ~30% regression with 2.6.33-rc1 & -rc2

Mike & Peter,

Compared with 2.6.32, volano shows a ~30% regression on 2.6.33-rc1 and -rc2.
Test machine: Tigerton Xeon, 16 CPUs (4P/4Core), 16G memory

Bisection points to the commit below:

commit a1f84a3ab8e002159498814eaa7e48c33752b04b
Author: Mike Galbraith <efault@....de>
Date:   Tue Oct 27 15:35:38 2009 +0100

    sched: Check for an idle shared cache in select_task_rq_fair()

    When waking affine, check for an idle shared cache, and if
    found, wake to that CPU/sibling instead of the waker's CPU.

    This improves pgsql+oltp ramp up by roughly 8%. Possibly more
    for other loads, depending on overlap. The trade-off is a
    roughly 1% peak downturn if tasks are truly synchronous.

    Signed-off-by: Mike Galbraith <efault@....de>
    Cc: Arjan van de Ven <arjan@...radead.org>
    Cc: Peter Zijlstra <peterz@...radead.org>
    Cc: <stable@...nel.org>
    LKML-Reference: <1256654138.17752.7.camel@...ge.simson.net>
    Signed-off-by: Ingo Molnar <mingo@...e.hu>


This commit cannot be reverted on its own because of conflicts, so I
reverted the 4 commits below, all related to idle-shared-cache, in
2.6.33-rc2. With them reverted, performance returns to the 2.6.32 level.

fe3bcfe (sched: More generic WAKE_AFFINE vs select_idle_sibling())
a50bde5 (sched: Cleanup select_task_rq_fair())
fd21073 (sched: Fix affinity logic in select_task_rq_fair())
a1f84a3 (sched: Check for an idle shared cache in select_task_rq_fair())

The regression seems to be caused by cache misses when accessing per-CPU
data (see the perf top cache-misses data below for details).

select_idle_sibling(...)
{
        ....
        for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed) {
                if (!cpu_rq(i)->cfs.nr_running) {
                        target = i;
                        break;
                }
        }
        ....
}

Performance is also restored to the 2.6.32 level if SD_PREFER_SIBLING
is not set, because select_idle_sibling() is then not called.
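
To make the suspected cost of the scan quoted above concrete, here is a
rough user-space model of it. This is a sketch only, not kernel code:
every model_* name is hypothetical, plain bitmasks stand in for the
kernel's cpumasks, and a 64-byte cache line is assumed. It counts how
many other CPUs' runqueues one wakeup has to read before it finds an
idle CPU or gives up.

```c
#include <assert.h>

#define MODEL_NR_CPUS   16   /* matches the 16-CPU test machine */
#define MODEL_CACHELINE 64   /* assumed cache line size in bytes */

/*
 * Hypothetical stand-in for a per-CPU runqueue. Each entry is padded to
 * its own (assumed) cache line, so reading entry i touches a line that
 * CPU i itself keeps modifying.
 */
struct model_rq {
	unsigned long nr_running;
	char pad[MODEL_CACHELINE - sizeof(unsigned long)];
};

struct model_scan {
	int target;      /* CPU chosen for the wakeup */
	int rqs_probed;  /* number of runqueues read during the scan */
};

/*
 * Model of the loop in select_idle_sibling(): walk the CPUs that are in
 * both the domain span and the task's allowed mask, and pick the first
 * one whose runqueue is empty. If none is idle, keep the original target.
 */
struct model_scan
model_select_idle_sibling(const struct model_rq *rqs, unsigned int span,
			  unsigned int allowed, int target)
{
	struct model_scan res = { target, 0 };
	unsigned int mask = span & allowed;  /* for_each_cpu_and() */
	int i;

	assert(target >= 0 && target < MODEL_NR_CPUS);

	for (i = 0; i < MODEL_NR_CPUS; i++) {
		if (!(mask & (1u << i)))
			continue;
		res.rqs_probed++;            /* models reading cpu_rq(i) */
		if (!rqs[i].nr_running) {
			res.target = i;
			break;
		}
	}
	return res;
}
```

In this model, when every CPU in the span is busy, each wakeup reads
every runqueue in the span and still returns the original target. That
worst case would be consistent with the cache-miss profile, but the
model does not prove that this is what happens under volano.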

perf top data follows:

2.6.33-rc1 cache-misses data (note 11.8% select_task_rq_fair)
------------------------------------------------------------------------------------
   PerfTop:   12262 irqs/sec  kernel:90.6% [1000Hz cache-misses],  (all, 16 CPUs)
------------------------------------------------------------------------------------

             samples  pcnt function                      DSO
             _______ _____ _____________________________ ________________

            18272.00 11.8% select_task_rq_fair           [kernel.kallsyms]       
            15499.00 10.0% schedule                      [kernel.kallsyms]       
             9447.00  6.1% update_curr                   [kernel.kallsyms]       
             9255.00  6.0% _raw_spin_lock                [kernel.kallsyms]       
             5161.00  3.3% tcp_sendmsg                   [kernel.kallsyms] 

2.6.32 cache-misses data
--------------------------------------------------------------------------------------
   PerfTop:   11749 irqs/sec  kernel:88.2% [1000Hz cache-misses],  (all, 16 CPUs)
--------------------------------------------------------------------------------------

             samples  pcnt function                      DSO
             _______ _____ _____________________________ _________________

            11974.00 11.5% schedule                      [kernel.kallsyms]
             6656.00  6.4% _spin_lock                    [kernel.kallsyms]
             5852.00  5.6% update_curr                   [kernel.kallsyms]
             3140.00  3.0% enqueue_entity                [kernel.kallsyms]
             2846.00  2.7% tcp_sendmsg                   [kernel.kallsyms]

2.6.33-rc1 cycles data (note 6.5% select_task_rq_fair)
-------------------------------------------------------------------------------
   PerfTop:   11106 irqs/sec  kernel:99.7% [1000Hz cycles],  (all, 16 CPUs)
-------------------------------------------------------------------------------

             samples  pcnt function                  DSO
             _______ _____ _________________________ _________________

            11658.00 10.0% schedule                  [kernel.kallsyms]
            10870.00  9.4% _raw_spin_lock            [kernel.kallsyms]
             7576.00  6.5% select_task_rq_fair       [kernel.kallsyms]
             3696.00  3.2% tcp_sendmsg               [kernel.kallsyms]
             3000.00  2.6% update_curr               [kernel.kallsyms]

2.6.32 cycles data
------------------------------------------------------------------------------------
   PerfTop:   10462 irqs/sec  kernel:99.8% [1000Hz cycles],  (all, 16 CPUs)
------------------------------------------------------------------------------------

             samples  pcnt function                  DSO
             _______ _____ _________________________ _________________

            13364.00  9.9% schedule                  [kernel.kallsyms]
            13140.00  9.8% _spin_lock                [kernel.kallsyms]
             4903.00  3.6% tcp_sendmsg               [kernel.kallsyms]
             4017.00  3.0% update_curr               [kernel.kallsyms]
             3395.00  2.5% _spin_lock_bh             [kernel.kallsyms]


Lin Ming

