Message-ID: <AANLkTik-Rt-Fy-2fKQTr2JBB9f7U2cbc0FmTZOoXc1+5@mail.gmail.com>
Date:	Tue, 22 Mar 2011 14:44:27 +0545
From:	Ben Nagy <ben@...u.net>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>, Avi Kivity <avi@...hat.com>,
	KVM list <kvm@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	John Stultz <johnstul@...ibm.com>,
	Richard Cochran <richard.cochran@...cron.at>
Subject: Re: [PATCH] posix-timers: RCU conversion

On Tue, Mar 22, 2011 at 12:54 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> Ben Nagy reported a scalability problem with KVM/QEMU that hit very hard
> a single spinlock (idr_lock) in posix-timers code, on its 48 core
> machine.

Hi all,

Thanks a lot for all the help so far. We've tested with Eric's patch.

First up, here's our version of the patch for the current Ubuntu
kernel from git:
http://paste.ubuntu.com/583668/

Here's top with 96 idle guests running:
top - 16:47:53 up  1:09,  3 users,  load average: 0.00, 0.01, 0.05
Tasks: 499 total,   3 running, 496 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.9%us,  3.2%sy,  0.0%ni, 95.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  99068656k total, 13121096k used, 85947560k free,    22192k buffers
Swap:  2438140k total,        0k used,  2438140k free,  3597860k cached
 (much better!)

Start of perf top:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   PerfTop:   10318 irqs/sec  kernel:97.4%  exact:  0.0% [1000Hz cycles],  (all, 48 CPUs)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                    DSO
             _______ _____ ___________________________ ___________________________________________________________

            95444.00 59.3% __ticket_spin_lock          [kernel.kallsyms]
            12937.00  8.0% native_safe_halt            [kernel.kallsyms]
             6149.00  3.8% kvm_get_cs_db_l_bits        /lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm.ko
             5105.00  3.2% tg_load_down                [kernel.kallsyms]
             5088.00  3.2% svm_vcpu_run                /lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm-amd.ko
             4807.00  3.0% kvm_set_pfn_dirty           /lib/modules/2.6.38-7-server/kernel/arch/x86/kvm/kvm.ko
             2855.00  1.8% ktime_get                   [kernel.kallsyms]
             1535.00  1.0% find_busiest_group          [kernel.kallsyms]
             1386.00  0.9% find_next_bit               [kernel.kallsyms]


Start of perf report -g:

    55.26%            kvm  [kernel.kallsyms]     [k] __ticket_spin_lock
                      |
                      --- __ticket_spin_lock
                         |
                         |--94.68%-- _raw_spin_lock
                         |          |
                         |          |--97.55%-- double_rq_lock
                         |          |          load_balance
                         |          |          idle_balance
                         |          |          schedule
                         |          |          |
                         |          |          |--60.56%-- schedule_hrtimeout_range_clock
                         |          |          |          schedule_hrtimeout_range
                         |          |          |          poll_schedule_timeout
                         |          |          |          do_select
                         |          |          |          core_sys_select
                         |          |          |          sys_select
                         |          |          |          system_call_fastpath


Here is the perf.data from the unpatched (non-debug) kernel:
http://www.coseinc.com/woigbfwr32/perf.data

Here is the perf.data from the patched (non-debug) kernel:
http://www.coseinc.com/woigbfwr32/perf_patched.data

I think we're certainly in 'it's going to be useable' territory now,
but any further improvements or patches to test would of course be
gratefully received! Next step from my end is to test the guests under
load, unless there are any other suggestions.

I'm extremely impressed by the speed and professionalism of the
response to this problem, both from those on #kvm and the widening
circle of those on this email thread.

Many thanks!

Cheers,

ben
