Date:	Tue, 03 Sep 2013 11:14:54 -0400
From:	Waiman Long <waiman.long@...com>
To:	Ingo Molnar <mingo@...nel.org>
CC:	Al Viro <viro@...IV.linux.org.uk>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Jeff Layton <jlayton@...hat.com>,
	Miklos Szeredi <mszeredi@...e.cz>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Andi Kleen <andi@...stfloor.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless
 update of refcount

On 09/03/2013 02:01 AM, Ingo Molnar wrote:
> * Waiman Long<waiman.long@...com>  wrote:
>
>> Yes, that patch worked. It eliminated the lglock as a bottleneck in 
>> the AIM7 workload. The lg_global_lock did not show up in the perf 
>> profile, whereas the lg_local_lock was only 0.07%. 
> Just curious: what's the worst bottleneck now in the optimized kernel? :-)
>
> Thanks,
>
> 	Ingo
With the following patches on v3.11:
1. Linus's version of lockref patch
2. Al's lglock patch
3. My preliminary patch to convert prepend_path under RCU

The perf profile of the kernel portion of the short workload on an
80-core system looked like this:

     29.87%     reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
                   |--50.00%-- tty_ldisc_deref
                   |--49.01%-- tty_ldisc_try
                    --0.99%-- [...]

      7.55%   swapper  [kernel.kallsyms]  [k] intel_idle
      1.03%     reaim  [kernel.kallsyms]  [k] copy_user_generic_string
      0.91%     reaim  [kernel.kallsyms]  [k] _raw_spin_lock
                   |--15.88%-- __rcu_process_callbacks
                   |--6.55%-- load_balance
                   |--6.02%-- sem_lock
                   |--4.77%-- enqueue_to_backlog
                   |--4.21%-- task_rq_lock
                   |--3.97%-- process_backlog
                   |--3.35%-- unix_dgram_sendmsg
                   |--3.28%-- kmem_cache_free
                   |--3.16%-- tcp_v4_rcv
                   |--2.77%-- unix_stream_sendmsg
                   |--2.36%-- rcu_accelerate_cbs
                   |--2.02%-- do_wp_page
                   |--2.02%-- unix_create1
                   |--1.83%-- unix_peer_get
                   |--1.67%-- udp_lib_get_port
                   |--1.66%-- unix_stream_recvmsg
                   |--1.63%-- handle_pte_fault
                   |--1.63%-- udp_queue_rcv_skb
                   |--1.54%-- unix_release_sock
                   |--1.48%-- try_to_wake_up
                   |--1.37%-- do_anonymous_page
                   |--1.37%-- new_inode_pseudo
                   |--1.33%-- __d_lookup
                   |--1.20%-- free_one_page
                   |--1.11%-- __do_fault
                   |--1.06%-- scheduler_tick
                   |--0.90%-- __drain_alien_cache
                   |--0.81%-- inet_csk_get_port
                   |--0.76%-- sock_alloc
                   |--0.76%-- shmem_lock
                   |--0.75%-- __d_instantiate
                   |--0.70%-- __inet_hash_connect
                   |--0.69%-- __inet_hash_nolisten
                   |--0.68%-- ip_local_deliver_finish
                   |--0.64%-- inet_hash
                   |--0.64%-- kfree
                   |--0.60%-- d_path
                   |--0.58%-- __close_fd
                   |--0.51%-- evict
                    --11.76%-- [...]

      0.51%     reaim  [ip_tables]        [k] ipt_do_table
      0.46%     reaim  [kernel.kallsyms]  [k] __alloc_skb
      0.38%     reaim  [kernel.kallsyms]  [k] kfree
      0.36%     reaim  [kernel.kallsyms]  [k] kmem_cache_free
      0.34%     reaim  [kernel.kallsyms]  [k] system_call_after_swapg
      0.32%     reaim  [kernel.kallsyms]  [k] fsnotify
      0.32%     reaim  [kernel.kallsyms]  [k] ip_finish_output
      0.27%     reaim  [kernel.kallsyms]  [k] system_call

Other than the global tty_ldisc_lock, there is no other major
bottleneck. I am not that worried about the tty_ldisc_lock bottleneck,
as real-world applications probably won't make that many calls to
set the tty driver.
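
For anyone reading the call-graph numbers above: the indented percentages
under _raw_spin_lock_irqsave add up to 100%, so they appear to be shares of
the parent entry rather than of the whole profile. Under that assumption,
a rough sketch of the conversion to absolute shares is below. The helper
name absolute_share is made up for illustration; the figures are copied
from the profile.

```python
# Figures copied from the perf profile above.
# Assumption: child percentages are relative to the parent entry
# (they sum to 100% in the output).
parent_share = 29.87  # % of kernel samples in _raw_spin_lock_irqsave
callers = {
    "tty_ldisc_deref": 50.00,  # % of the parent entry
    "tty_ldisc_try": 49.01,    # % of the parent entry
}


def absolute_share(parent_pct, child_pct):
    """Convert a parent-relative percentage into a share of the whole profile."""
    return parent_pct * child_pct / 100.0


# Combined share of the profile attributable to the two tty_ldisc callers.
tty_total = sum(absolute_share(parent_share, pct) for pct in callers.values())

for name, pct in callers.items():
    print(f"{name}: {absolute_share(parent_share, pct):.2f}% of kernel samples")
print(f"tty_ldisc callers combined: {tty_total:.2f}% of kernel samples")
```

Read this way, nearly all of the _raw_spin_lock_irqsave time comes from the
two tty_ldisc callers, which is why tty_ldisc_lock stands out as the only
major remaining bottleneck.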

Regards,
Longman
