Message-ID: <20061204015323.GA28519@gnuppy.monkey.org>
Date: Sun, 3 Dec 2006 17:53:23 -0800
From: Bill Huey (hui) <billh@...ppy.monkey.org>
To: Ingo Molnar <mingo@...e.hu>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: [PATCH 0/4] lock stat for 2.6.19-rt1
Hello,
I'm pleased to announce the "lock stat" patch, which instruments all locks in
the kernel, tracking contention in the slow path of an rt_mutex in -rt
kernels before blocking and possibly priority inheritance boosting. This is
for 2.6.19-rt1. In the real-time patch, all locks such as mutexes, semaphores,
etc. are funneled through the rt_mutex, as are locks such as spinlocks, which
preserve the semantics of the BKL when hitting scheduler code.
On contention, it atomically increments a field pointed to by a variable
within the rt_mutex structure. This records the number of times the rt_mutex
is contended during the lifetime of that lock.
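
As a rough illustration, here is a minimal sketch of the idea (not the
attached patch itself; struct lock_stat and its field names are hypothetical):

#include <asm/atomic.h>

/*
 * Hypothetical per-call-site statistics object shared by every lock
 * initialized from the same place in the source.
 */
struct lock_stat {
	atomic_t n_contended;	/* times a lock from this site was contended */
	atomic_t n_initialized;	/* locks initialized from this site */
};

/*
 * Sketch of the hook in the rt_mutex slow path: before the task blocks
 * (and possibly PI-boosts the owner), bump the counter that the
 * rt_mutex points at.
 */
static inline void lock_stat_note_contention(struct lock_stat *ls)
{
	if (ls)
		atomic_inc(&ls->n_contended);
}
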
By using a hash constructed out of the __FILE__, __func__ and __LINE__ at the
call site of the actual lock initializer function (spin_lock_init() and
friends), a single backing object within a red-black-tree-based dictionary can
serve to record these statistics instead of explicitly tracking the life and
death of each lock. This is more meaningful than having the statistics
recorded on an individual lock instance: it's a higher order of expression of
how lock statistics are represented, since it's about the macro behavior of
the kernel.
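
To make the keying concrete, here is a hedged sketch reusing the hypothetical
struct lock_stat above; lock_stat_lookup_or_insert(), __my_spin_lock_init()
and the ->stat field are illustrative names, not the patch's actual interface:

/* Key identifying the initializer call site; one backing object per key. */
struct lock_stat_key {
	const char *file;
	const char *func;
	int line;
};

/* Illustrative wrapper: the real patch hooks spin_lock_init() and friends. */
#define my_spin_lock_init(lock) \
	__my_spin_lock_init((lock), __FILE__, __func__, __LINE__)

static void __my_spin_lock_init(spinlock_t *lock, const char *file,
				const char *func, int line)
{
	/*
	 * Assumed helper: search the red-black tree for (file, func,
	 * line) and insert a fresh, zeroed lock_stat if none exists.
	 */
	struct lock_stat *ls = lock_stat_lookup_or_insert(file, func, line);

	atomic_inc(&ls->n_initialized);
	lock->stat = ls;		/* assumed extra field in the -rt lock type */
	do_spin_lock_init(lock);	/* stand-in for the original init path */
}
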
I'll defer the other parts of the discussion to the comments in the patch
itself; I just want to get this patch out now rather than delay it further.
It's quite similar to how lockdep hooks into the kernel, but I wasn't aware of
how lockdep worked when I was writing this code; Peter Zijlstra pointed this
out after his review of the patch.
This is superior to lockmeter for a number of reasons. First, it targets the
-rt kernel (and we all know how I love the -rt kernel :) ), it is Linux
specific, it hooks closely into lockdep, and it has a friendly /proc interface
with the ability to reset stats during load-testing runs by simply writing to
the file /proc/lock_stat/contention.
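
A hedged sketch of what the reset behind that write could look like, reusing
the hypothetical names above (the walk uses the kernel's real rbtree helpers,
but the node layout is an assumption):

#include <linux/rbtree.h>

/* Assumed layout: each per-call-site record lives in the dictionary. */
struct lock_stat_entry {
	struct rb_node	node;
	struct lock_stat stat;
};

/*
 * On a write to /proc/lock_stat/contention, walk the whole dictionary
 * and zero the counters so a fresh load-testing run starts clean.
 */
static void lock_stat_reset_all(struct rb_root *root)
{
	struct rb_node *n;

	for (n = rb_first(root); n; n = rb_next(n)) {
		struct lock_stat_entry *e =
			rb_entry(n, struct lock_stat_entry, node);

		atomic_set(&e->stat.n_contended, 0);
	}
}
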
Sample output follows:
-------------------------------
[1, 1, 0] {finish_wait, -, 0}
[1, 1, 0] {lru_add_drain, -, 0}
[1, 1, 0] {rt_down_read, -, 0}
[1, 1, 0] {__sync_inodes, -, 0}
[1, 1, 0] {unix_create1, -, 0}
[1110, 1, 0] {kmem_cache_free, -, 0}
[11, 1, 0] {pdflush, -, 0}
[1, 123148, 0] {init_buffer_head, fs/buffer.c, 2968}
[1, 2, 0] {__svc_create, net/sunrpc/svc.c, 328}
[14, 11446, 0] {sock_lock_init, net/core/sock.c, 809}
[14373, 1, 0] {ide_intr, -, 0}
[14878, 13, 0] {create_workqueue_thread, kernel/workqueue.c, 347}
[15, 123148, 0] {init_buffer_head, fs/buffer.c, 2969}
[153, 1, 0] {__free_pages_ok, -, 0}
[17, 1, 0] {init_timers_cpu, kernel/timer.c, 1834}
[17533, 1, 0] {_atomic_dec_and_spin_lock, -, 0}
[203, 1, 0] {release_pages, -, 0}
[2040, 1, 0] {do_wait, -, 0}
[2, 1, 0] {cache_alloc_refill, -, 0}
[2, 1, 0] {sys_setsid, -, 0}
[23, 996648, 0] {inode_init_once, fs/inode.c, 199}
[242, 996648, 0] {inode_init_once, fs/inode.c, 197}
[251, 1, 0] {get_zone_pcp, -, 0}
[25, 1248, 0] {anon_vma_ctor, mm/rmap.c, 168}
[2725, 1, 0] {lock_kernel, -, 0}
[3, 1, 0] {lock_timer_base, -, 0}
[3, 1, 0] {serio_thread, -, 0}
[31, 81, 0] {kmem_list3_init, mm/slab.c, 411}
[32, 1, 0] {lru_cache_add_active, -, 0}
[3, 256, 0] {init, kernel/futex.c, 2776}
[33, 1, 0] {get_page_from_freelist, -, 0}
[35, 1, 0] {do_lookup, -, 0}
[37, 1, 0] {file_kill, -, 0}
[39, 1, 0] {__cache_free, -, 0}
[403, 204, 0] {sighand_ctor, kernel/fork.c, 1469}
[4, 16, 0] {xprt_create_transport, net/sunrpc/xprt.c, 930}
[4, 29369, 0] {mm_init, kernel/fork.c, 368}
[49, 2, 0] {journal_init_common, fs/jbd/journal.c, 666}
[5, 1, 0] {mm_init, -, 0}
[52, 1, 0] {lookup_mnt, -, 0}
[549, 0, 1349567] {, d_alloc_string, 1}
[5871, 2, 0] {journal_init_common, fs/jbd/journal.c, 667}
[59, 1, 0] {nv_probe, drivers/net/forcedeth.c, 4241}
[6, 11443, 0] {sock_init_data, net/core/sock.c, 1503}
[8, 2, 0] {journal_init_revoke, fs/jbd/revoke.c, 261}
[8264, 996648, 0] {inode_init_once, fs/inode.c, 196}
[8552, 996648, 0] {inode_init_once, fs/inode.c, 193}
[86, 1, 0] {exit_mmap, -, 0}
[86, 86258, 0] {init_waitqueue_head, kernel/wait.c, 15}
[8968, 1, 0] {iget_locked, -, 0}
[9, 1, 0] {tty_ldisc_try, -, 0}
[9, 1, 0] {uart_set_options, drivers/serial/serial_core.c, 1876}
@contention events = 86925
@found = 9
-------------------------------
The first column is the number of times that object was contended. The second
is the number of times this lock object was initialized. The third is for the
annotation scheme that directly attaches the lock object (spinlock, etc.)
inline with the function initializer to avoid the binary tree lookup.
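
For reference, a hedged sketch of how one such line could be emitted from a
per-call-site record (seq_printf() is the real seq_file helper; the struct,
the key member and the n_annotated counter are assumptions layered on the
sketches above):

#include <linux/seq_file.h>

/* Emit one "[contended, initialized, annotated] {func, file, line}" line. */
static void lock_stat_show_one(struct seq_file *m, struct lock_stat_entry *e)
{
	seq_printf(m, "[%d, %d, %d] {%s, %s, %d}\n",
		   atomic_read(&e->stat.n_contended),
		   atomic_read(&e->stat.n_initialized),
		   atomic_read(&e->n_annotated),	/* direct-attach path */
		   e->key.func, e->key.file, e->key.line);
}
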
This shows a contention issue with inode access, around what looks like a lock
protecting a tree structure (inode_init_once @ lines 193/196), as well as in
the lower layers of the IO system around the anticipatory scheduler and
interrupt handlers. Keep in mind this is -rt, so we're using interrupt threads
here.
This result is a combination of a series of normal "find" loads and the
machine being idle overnight on a 2x AMD processor setup. I'm sure that bigger
machines from the IBM folks and others would generate more interesting loads.
Thanks to all of the folks on #offtopic2 (Zwane Mwaikambo, nikita, Rik van Riel,
the ever increasingly bitter Con Kolivas, Bill Irwin, etc.) for talking me
through the early stages of this process and for general support.
I'm requesting review, comments and inclusion of this into the -rt series of
patches, as well as general help. I've CCed folks outside of the normal -rt
core who I thought might find this interesting or who have been friendly with
me about -rt in the past.
bill
View attachment "corefiles.diff" of type "text/plain" (22958 bytes)