Message-ID: <20090703025607.GK5480@parisc-linux.org>
Date:	Thu, 2 Jul 2009 20:56:08 -0600
From:	Matthew Wilcox <matthew@....cx>
To:	linux-kernel@...r.kernel.org
Cc:	"Styner, Douglas W" <douglas.w.styner@...el.com>,
	Chinang Ma <chinang.ma@...el.com>,
	"Prickett, Terry O" <terry.o.prickett@...el.com>,
	Matthew Wilcox <matthew.r.wilcox@...el.com>
Subject: >10% performance degradation since 2.6.18


The team of database performance specialists that I work with have put
together a shiny new system with dual Nehalem processors and 192 SSDs.
The SSDs are in SAS enclosures which are connected to LSI 3801 SAS
controllers.

Unfortunately, 2.6.30's performance has fallen off a cliff compared to Red
Hat Enterprise 5.2 (2.6.18-92).  Because Nehalem support was added after
2.6.18, doing bisection is somewhat of a pain, and because it's a great
big OLTP benchmark which takes hours to run, it's even more impractical.

We've included the top 30 functions below, but honestly, we're looking
at a 10% dip in performance, and halving the time spent in mpt_interrupt
and kmem_cache_alloc doesn't feel like it's going to be enough.

On the subject of kmem_cache_alloc, this run was using SLAB, not SLUB
or SLQB.  I've attached the .configs for both kernels, in case they help.

Including the raw /proc/interrupts doesn't seem very helpful.  Lots of
columns (16 CPUs), mostly with zeroes in them.  I've written a hacky
little perl script to summarise the contents of /proc/interrupts.
Mail me for it if you want it.
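
For readers who want something similar without asking for the script,
here is a minimal sketch in Python.  It is not the perl script mentioned
above; the function name summarise_interrupts and the output layout are
assumptions made for illustration.  It sums the per-CPU columns of each
row so that one total per interrupt source remains.

```python
def summarise_interrupts(text):
    """Collapse the per-CPU columns of /proc/interrupts into one total per row.

    Returns a list of (label, total, description) tuples.
    Hypothetical helper, not the original perl script.
    """
    lines = text.splitlines()
    if not lines:
        return []
    # The header row names the CPUs (CPU0 CPU1 ...), one column each.
    ncpus = len(lines[0].split())
    summary = []
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        # Rows such as ERR: carry fewer counters than there are CPUs,
        # so stop at the first non-numeric field.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        summary.append((label.strip(), sum(counts), description))
    return summary


def format_summary(summary):
    """Render the totals in a compact 'label: total  description' layout."""
    return "\n".join(
        f"{label:>3}: {total:9d}  {description}".rstrip()
        for label, total, description in summary
    )
```

On a Linux box this could be driven with
format_summary(summarise_interrupts(open("/proc/interrupts").read())).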

-----------------------
Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%   iowait%
2.6.18-92.el5-op        1.000   137524  183275  68      28      1       3
2.6.30                  0.897   171211  152962  71      29      0       0
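
The relative changes between the two rows can be worked out with a few
lines of Python.  This is only a sketch over the figures in the table;
the dictionary keys and the pct_change helper are names chosen for
illustration.  The last figure assumes that Speedup(x) is proportional
to benchmark throughput, which the table does not state explicitly.

```python
# Figures copied from the summary table above.
baseline = {"speedup": 1.000, "intr_per_s": 137524, "ctxsw_per_s": 183275}
candidate = {"speedup": 0.897, "intr_per_s": 171211, "ctxsw_per_s": 152962}


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Per-metric change of 2.6.30 relative to 2.6.18-92.el5-op.
changes = {key: pct_change(baseline[key], candidate[key]) for key in baseline}

# Interrupts per unit of work, ASSUMING Speedup(x) tracks throughput.
intr_per_work_change = pct_change(
    baseline["intr_per_s"] / baseline["speedup"],
    candidate["intr_per_s"] / candidate["speedup"],
)

for key, value in changes.items():
    print(f"{key:12s} {value:+6.1f}%")
print(f"{'intr/work':12s} {intr_per_work_change:+6.1f}%")
```

So 2.6.30 takes roughly a quarter more interrupts per second while
doing about 10% less work and switching context about 17% less often.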

Server configurations:
NHM-EP 2.93GHz+turbo 2 sockets/8 cores/16 threads
72GB memory.  4 LSI 3801SAS + 2 QLA2300, 192 SSDs+ 28 spindles log

Summary of /proc/interrupts for 2.6.18-92.el5-op:

114:   5474028  IO-APIC-level  qla2xxx
122:       404  IO-APIC-level  qla2xxx
130:  69084136  PCI-MSI-X  eth1-Q0
138:         1  PCI-MSI-X  eth1
146:  61405320  PCI-MSI-X  eth0-Q0
154:         1  PCI-MSI-X  eth0
162:  98883979  PCI-MSI  ioc0
170: 100831391  PCI-MSI  ioc1
178:  99384797  PCI-MSI  ioc2
186:  89566728  PCI-MSI  ioc3
NMI:  53394359
LOC:  84892159

Summary of /proc/interrupts for 2.6.30:

 48:   5766139  IO-APIC-fasteoi   qla2xxx
 49:       143  IO-APIC-fasteoi   qla2xxx
 79:     19028  PCI-MSI-edge      ahci
 80:  13129177  PCI-MSI-edge      eth1-tx-0
 81:  12916368  PCI-MSI-edge      eth1-tx-1
 82:  10176805  PCI-MSI-edge      eth1-tx-2
 83:  12145055  PCI-MSI-edge      eth1-tx-3
 84:  13942598  PCI-MSI-edge      eth1-rx-0
 85:  23239552  PCI-MSI-edge      eth1-rx-1
 86:  24251265  PCI-MSI-edge      eth1-rx-2
 87:  12875519  PCI-MSI-edge      eth1-rx-3
 88:         1  PCI-MSI-edge      eth1
 89:   8880631  PCI-MSI-edge      eth0-tx-0
 90:   9251548  PCI-MSI-edge      eth0-tx-1
 91:   7306336  PCI-MSI-edge      eth0-tx-2
 92:  10463687  PCI-MSI-edge      eth0-tx-3
 93:  11147199  PCI-MSI-edge      eth0-rx-0
 94:  11155044  PCI-MSI-edge      eth0-rx-1
 95:  11155852  PCI-MSI-edge      eth0-rx-2
 96:   9921438  PCI-MSI-edge      eth0-rx-3
 97:         1  PCI-MSI-edge      eth0
 98:  96487143  PCI-MSI-edge      ioc0
 99:  98432533  PCI-MSI-edge      ioc1
100:  96488192  PCI-MSI-edge      ioc2
101:  87011174  PCI-MSI-edge      ioc3
NMI:  57707812  Non-maskable interrupts
LOC:  73336211  Local timer interrupts
SPU:         0  Spurious interrupts
RES:  13506834  Rescheduling interrupts
CAL:     70455  Function call interrupts
TLB:     19341  TLB shootdowns
TRM:         0  Thermal event interrupts
THR:         0  Threshold APIC interrupts


======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.18-92.el5-op           Cycles% 2.6.30
70.1409 <database>                 67.0207 <database>
1.3556 mpt_interrupt               1.7029 mpt_interrupt
1.1622 __blockdev_direct_IO        1.1443 kmem_cache_alloc
0.8246 kmem_cache_free             0.8801 kmem_cache_free
0.7108 schedule                    0.7774 __blockdev_direct_IO
0.6733 scsi_request_fn             0.7031 scsi_request_fn
0.6114 kmem_cache_alloc            0.5317 __schedule
0.4207 follow_hugetlb_page         0.3922 task_rq_lock
0.4062 list_del                    0.3629 sd_prep_fn
0.3400 __switch_to                 0.3504 list_del
0.3339 generic_make_request        0.3382 __sigsetjmp
0.3204 memmove                     0.3270 __switch_to
0.3088 __sigsetjmp                 0.3257 generic_make_request
0.2848 get_request                 0.3116 kfree
0.2804 lock_timer_base             0.2895 memmove
0.2789 kfree                       0.2803 try_to_wake_up
0.2736 scsi_get_command            0.2625 fget_light
0.2732 task_rq_lock                0.2579 generic_file_aio_read
0.2716 scsi_prep_fn                0.2530 mptscsih_io_done
0.2572 __end_that_request_first    0.2402 aio_complete
0.2567 fget_light                  0.2382 mptscsih_qcmd
0.2531 submit_page_section         0.2342 fget
0.2428 mempool_alloc               0.2277 gup_huge_pmd
0.2428 __generic_file_aio_read     0.2264 submit_page_section
0.2368 touch_atime                 0.2204 touch_atime
0.2270 __aio_get_req               0.2165 __list_add
0.2223 mptscsih_qcmd               0.2063 scsi_dispatch_cmd
0.2198 init_request_from_bio       0.2040 lock_timer_base
0.2191 fget                        0.2036 irq_entries_start
0.2141 device_not_available        0.2036 plist_del
0.2125 try_to_wake_up              0.2004 elv_queue_empty
0.2065 mptscsih_io_done            0.2004 get_user_pages_fast
0.2059 math_state_restore          0.1997 copy_user_generic_string
0.2035 __errno_location            0.1925 kref_get
0.2022 find_vma                    0.1905 scsi_finish_command
0.1967 _setjmp                     0.1888 aio_rw_vect_retry
0.1966 kref_get                    0.1882 __errno_location
0.1952 memset                      0.1849 scsi_device_unbusy
0.1924 __list_add                  0.1846 pick_next_highest_task_rt
0.1917 copy_user_generic           0.1826 memset_c
0.1907 acpi_os_read_port           0.1819 _setjmp
0.1842 elv_queue_empty             0.1816 ipc_lock
0.1809 scsi_dispatch_cmd           0.1809 mod_timer
0.1808 sd_init_command             0.1800 noop_queue_empty
0.1789 swiotlb_unmap_sg            0.1796 scsi_softirq_done
0.1766 rw_verify_area              0.1757 scsi_run_queue
-----------------------------------

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

View attachment "linux-2.6.18-92.el5-op.config" of type "text/plain" (62398 bytes)

View attachment "linux-2.6.30.config" of type "text/plain" (81660 bytes)
