Message-ID: <20090703025607.GK5480@parisc-linux.org>
Date: Thu, 2 Jul 2009 20:56:08 -0600
From: Matthew Wilcox <matthew@....cx>
To: linux-kernel@...r.kernel.org
Cc: "Styner, Douglas W" <douglas.w.styner@...el.com>,
Chinang Ma <chinang.ma@...el.com>,
"Prickett, Terry O" <terry.o.prickett@...el.com>,
Matthew Wilcox <matthew.r.wilcox@...el.com>
Subject: >10% performance degradation since 2.6.18
The team of database performance specialists that I work with have put
together a shiny new system with dual Nehalem processors and 192 SSDs.
The SSDs are in SAS enclosures which are connected to LSI 3801 SAS
controllers.

Unfortunately, 2.6.30's performance has fallen off a cliff compared to Red
Hat Enterprise Linux 5.2 (2.6.18-92). Because Nehalem support was added
after 2.6.18, bisection is somewhat of a pain, and because it's a great
big OLTP benchmark which takes hours to run, it's even more impractical.

We've included the top 30 functions below, but honestly, we're looking
at a 10% dip in performance, and halving the time spent in mpt_interrupt
and kmem_cache_alloc doesn't feel like it's going to be enough.

On the subject of kmem_cache_alloc, this run was using SLAB, not SLUB
or SLQB. I've attached the .configs for both kernels, in case they help.

Including the raw /proc/interrupts doesn't seem very helpful: lots of
columns (16 CPUs), mostly with zeroes in them. I've written a hacky
little perl script to summarise the contents of /proc/interrupts.
Mail me for it if you want it.
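
For readers who want a comparable summary without the perl script, here
is a minimal Python sketch of the same idea. It is not the script
mentioned above: the function name summarise_interrupts and the output
shape are assumptions made for illustration. It adds up the per-CPU
columns of /proc/interrupts so that each interrupt line becomes a single
total.

```python
def summarise_interrupts(text):
    """Collapse the per-CPU columns of /proc/interrupts into one total.

    Hypothetical helper for illustration; returns a list of
    (label, total, description) tuples, one per interrupt line.
    """
    lines = text.splitlines()
    # The header line names one column per CPU (CPU0, CPU1, ...).
    ncpus = len(lines[0].split())
    summary = []
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        # Take at most ncpus leading numeric fields; lines such as
        # ERR or MIS carry a single counter rather than one per CPU.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        summary.append((label.strip(), sum(counts), description))
    return summary
```

On a live system it could be fed with
summarise_interrupts(open("/proc/interrupts").read()) and the tuples
printed one per line.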
-----------------------
Linux OLTP Performance summary
Kernel#            Speedup(x)  Intr/s  CtxSw/s  us%  sys%  idle%  iowait%
2.6.18-92.el5-op   1.000       137524  183275   68   28    1      3
2.6.30             0.897       171211  152962   71   29    0      0
Server configurations:
NHM-EP 2.93GHz+turbo 2 sockets/8 cores/16 threads
72GB memory. 4 LSI 3801SAS + 2 QLA2300, 192 SSDs+ 28 spindles log
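
The relative changes implied by the performance summary can be worked
out with a few lines of Python. This is only a sketch: the figures are
copied from the summary table, the variable names are made up for
illustration, and the "interrupts per unit of work" ratio assumes that
Speedup(x) is a throughput ratio between the two kernels.

```python
# Figures copied from the performance summary table.
old = {"intr_per_s": 137524, "ctxsw_per_s": 183275}   # 2.6.18-92.el5-op
new = {"intr_per_s": 171211, "ctxsw_per_s": 152962}   # 2.6.30
speedup = 0.897                                        # 2.6.30 relative to 2.6.18


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


changes = {name: pct_change(old[name], new[name]) for name in old}
throughput_change = (speedup - 1.0) * 100.0

# Assumption: Speedup(x) is a throughput ratio, so dividing the interrupt
# rate by it gives interrupts per unit of benchmark work.
intr_per_work_ratio = (new["intr_per_s"] / speedup) / old["intr_per_s"]

for name, value in changes.items():
    print(f"{name}: {value:+.1f}%")
print(f"throughput: {throughput_change:+.1f}%")
print(f"interrupts per unit of work: x{intr_per_work_ratio:.2f}")
```

Under those assumptions, 2.6.30 takes about 24.5% more interrupts per
second and does about 16.5% fewer context switches per second while
delivering about 10.3% less throughput.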
Summary of /proc/interrupts for 2.6.18-92.el5-op:
114: 5474028 IO-APIC-level qla2xxx
122: 404 IO-APIC-level qla2xxx
130: 69084136 PCI-MSI-X eth1-Q0
138: 1 PCI-MSI-X eth1
146: 61405320 PCI-MSI-X eth0-Q0
154: 1 PCI-MSI-X eth0
162: 98883979 PCI-MSI ioc0
170: 100831391 PCI-MSI ioc1
178: 99384797 PCI-MSI ioc2
186: 89566728 PCI-MSI ioc3
NMI: 53394359
LOC: 84892159
Summary of /proc/interrupts for 2.6.30:
48: 5766139 IO-APIC-fasteoi qla2xxx
49: 143 IO-APIC-fasteoi qla2xxx
79: 19028 PCI-MSI-edge ahci
80: 13129177 PCI-MSI-edge eth1-tx-0
81: 12916368 PCI-MSI-edge eth1-tx-1
82: 10176805 PCI-MSI-edge eth1-tx-2
83: 12145055 PCI-MSI-edge eth1-tx-3
84: 13942598 PCI-MSI-edge eth1-rx-0
85: 23239552 PCI-MSI-edge eth1-rx-1
86: 24251265 PCI-MSI-edge eth1-rx-2
87: 12875519 PCI-MSI-edge eth1-rx-3
88: 1 PCI-MSI-edge eth1
89: 8880631 PCI-MSI-edge eth0-tx-0
90: 9251548 PCI-MSI-edge eth0-tx-1
91: 7306336 PCI-MSI-edge eth0-tx-2
92: 10463687 PCI-MSI-edge eth0-tx-3
93: 11147199 PCI-MSI-edge eth0-rx-0
94: 11155044 PCI-MSI-edge eth0-rx-1
95: 11155852 PCI-MSI-edge eth0-rx-2
96: 9921438 PCI-MSI-edge eth0-rx-3
97: 1 PCI-MSI-edge eth0
98: 96487143 PCI-MSI-edge ioc0
99: 98432533 PCI-MSI-edge ioc1
100: 96488192 PCI-MSI-edge ioc2
101: 87011174 PCI-MSI-edge ioc3
NMI: 57707812 Non-maskable interrupts
LOC: 73336211 Local timer interrupts
SPU: 0 Spurious interrupts
RES: 13506834 Rescheduling interrupts
CAL: 70455 Function call interrupts
TLB: 19341 TLB shootdowns
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.18-92.el5-op Cycles% 2.6.30
70.1409 <database> 67.0207 <database>
1.3556 mpt_interrupt 1.7029 mpt_interrupt
1.1622 __blockdev_direct_IO 1.1443 kmem_cache_alloc
0.8246 kmem_cache_free 0.8801 kmem_cache_free
0.7108 schedule 0.7774 __blockdev_direct_IO
0.6733 scsi_request_fn 0.7031 scsi_request_fn
0.6114 kmem_cache_alloc 0.5317 __schedule
0.4207 follow_hugetlb_page 0.3922 task_rq_lock
0.4062 list_del 0.3629 sd_prep_fn
0.3400 __switch_to 0.3504 list_del
0.3339 generic_make_request 0.3382 __sigsetjmp
0.3204 memmove 0.3270 __switch_to
0.3088 __sigsetjmp 0.3257 generic_make_request
0.2848 get_request 0.3116 kfree
0.2804 lock_timer_base 0.2895 memmove
0.2789 kfree 0.2803 try_to_wake_up
0.2736 scsi_get_command 0.2625 fget_light
0.2732 task_rq_lock 0.2579 generic_file_aio_read
0.2716 scsi_prep_fn 0.2530 mptscsih_io_done
0.2572 __end_that_request_first 0.2402 aio_complete
0.2567 fget_light 0.2382 mptscsih_qcmd
0.2531 submit_page_section 0.2342 fget
0.2428 mempool_alloc 0.2277 gup_huge_pmd
0.2428 __generic_file_aio_read 0.2264 submit_page_section
0.2368 touch_atime 0.2204 touch_atime
0.2270 __aio_get_req 0.2165 __list_add
0.2223 mptscsih_qcmd 0.2063 scsi_dispatch_cmd
0.2198 init_request_from_bio 0.2040 lock_timer_base
0.2191 fget 0.2036 irq_entries_start
0.2141 device_not_available 0.2036 plist_del
0.2125 try_to_wake_up 0.2004 elv_queue_empty
0.2065 mptscsih_io_done 0.2004 get_user_pages_fast
0.2059 math_state_restore 0.1997 copy_user_generic_string
0.2035 __errno_location 0.1925 kref_get
0.2022 find_vma 0.1905 scsi_finish_command
0.1967 _setjmp 0.1888 aio_rw_vect_retry
0.1966 kref_get 0.1882 __errno_location
0.1952 memset 0.1849 scsi_device_unbusy
0.1924 __list_add 0.1846 pick_next_highest_task_rt
0.1917 copy_user_generic 0.1826 memset_c
0.1907 acpi_os_read_port 0.1819 _setjmp
0.1842 elv_queue_empty 0.1816 ipc_lock
0.1809 scsi_dispatch_cmd 0.1809 mod_timer
0.1808 sd_init_command 0.1800 noop_queue_empty
0.1789 swiotlb_unmap_sg 0.1796 scsi_softirq_done
0.1766 rw_verify_area 0.1757 scsi_run_queue
-----------------------------------
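
To see which functions moved most between the two profiles, the two
columns can be joined by function name and the difference in Cycles%
computed. The sketch below does this for a handful of rows copied from
the table; the dictionary names are assumptions for illustration. Note
that Cycles% is a share of all sampled cycles on each kernel, so a
difference in share is only a rough guide to a difference in absolute
cost.

```python
# A few rows copied from the oprofile table (Cycles%).
cycles_2_6_18 = {
    "mpt_interrupt": 1.3556,
    "__blockdev_direct_IO": 1.1622,
    "kmem_cache_free": 0.8246,
    "scsi_request_fn": 0.6733,
    "kmem_cache_alloc": 0.6114,
}
cycles_2_6_30 = {
    "mpt_interrupt": 1.7029,
    "kmem_cache_alloc": 1.1443,
    "kmem_cache_free": 0.8801,
    "__blockdev_direct_IO": 0.7774,
    "scsi_request_fn": 0.7031,
}

# Only functions present in both profiles can be compared directly.
common = cycles_2_6_18.keys() & cycles_2_6_30.keys()
deltas = sorted(
    ((name, cycles_2_6_30[name] - cycles_2_6_18[name]) for name in common),
    key=lambda item: item[1],
    reverse=True,
)

for name, delta in deltas:
    print(f"{delta:+.4f}  {name}")
```

For these rows, kmem_cache_alloc and mpt_interrupt show the largest
increases in share, and __blockdev_direct_IO the largest decrease.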
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
View attachment "linux-2.6.18-92.el5-op.config" of type "text/plain" (62398 bytes)
View attachment "linux-2.6.30.config" of type "text/plain" (81660 bytes)