Message-ID: <C6267C73.F240%anirban.chakraborty@qlogic.com>
Date:	Tue, 5 May 2009 23:29:53 -0700
From:	Anirban Chakraborty <anirban.chakraborty@...gic.com>
To:	"Styner, Douglas W" <douglas.w.styner@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:	"Tripathi, Sharad C" <sharad.c.tripathi@...el.com>,
	"arjan@...ux.intel.com" <arjan@...ux.intel.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@...el.com>,
	"Kleen, Andi" <andi.kleen@...el.com>,
	"Siddha, Suresh B" <suresh.b.siddha@...el.com>,
	"Ma, Chinang" <chinang.ma@...el.com>,
	"Wang, Peter Xihong" <peter.xihong.wang@...el.com>,
	"Nueckel, Hubert" <hubert.nueckel@...el.com>,
	"Recalde, Luis F" <luis.f.recalde@...el.com>,
	"Nelson, Doug" <doug.nelson@...el.com>,
	"Cheng, Wu-sun" <wu-sun.cheng@...el.com>,
	"Prickett, Terry O" <terry.o.prickett@...el.com>,
	"Shunmuganathan, Rajalakshmi" <rajalakshmi.shunmuganathan@...el.com>,
	"Garg, Anil K" <anil.k.garg@...el.com>,
	"Chilukuri, Harita" <harita.chilukuri@...el.com>,
	"chris.mason@...cle.com" <chris.mason@...cle.com>
Subject: Re: Mainline kernel OLTP performance update




On 5/4/09 8:54 AM, "Styner, Douglas W" <douglas.w.styner@...el.com> wrote:

> <this time with subject line>
> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc4).
> 
> The regression for 2.6.30-rc4 against the baseline, 2.6.24.2 is 2.15%
> (2.6.30-rc3 regression was 1.91%).  Oprofile reports 70.1204% user, 29.874%
> system.
> 
> Linux OLTP Performance summary
> Kernel#      Speedup(x)  Intr/s  CtxSw/s  us%  sys%  idle%  iowait%
> 2.6.24.2     1.000       22106   43709    75   24    0      0
> 2.6.30-rc4   0.978       30581   43034    75   25    0      0
> 
> Server configurations:
> Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
> 
> 
> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.24.2                   Cycles% 2.6.30-rc4
> 74.8578 <database>                 67.8732 <database>
> 1.0500 qla24xx_start_scsi          1.1162 qla24xx_start_scsi
> 0.8089 schedule                    0.9888 qla24xx_intr_handler
> 0.5864 kmem_cache_alloc            0.8776 __schedule
> 0.4989 __blockdev_direct_IO        0.7401 kmem_cache_alloc
> 0.4357 __sigsetjmp                 0.4914 read_hpet
> 0.4152 copy_user_generic_string    0.4792 __sigsetjmp
> 0.3953 qla24xx_intr_handler        0.4368 __blockdev_direct_IO
> 0.3850 memcpy                      0.3822 task_rq_lock
> 0.3596 scsi_request_fn             0.3781 __switch_to
> 0.3188 __switch_to                 0.3620 __list_add
> 0.2889 lock_timer_base             0.3377 rb_get_reader_page
> 0.2750 memmove                     0.3336 copy_user_generic_string
> 0.2519 task_rq_lock                0.3195 try_to_wake_up
> 0.2474 aio_complete                0.3114 scsi_request_fn
> 0.2460 scsi_alloc_sgtable          0.3114 ring_buffer_consume
> 0.2445 generic_make_request        0.2932 aio_complete
> 0.2263 qla2x00_process_completed_re 0.2730 lock_timer_base
> 0.2118 blk_queue_end_tag           0.2588 memset_c
> 0.2085 dio_bio_complete            0.2588 mod_timer
> 0.2021 e1000_xmit_frame            0.2447 generic_make_request
> 0.2006 __end_that_request_first    0.2426 qla2x00_process_completed_re
> 0.1954 generic_file_aio_read       0.2265 tcp_sendmsg
> 0.1949 kfree                       0.2184 memmove
> 0.1915 tcp_sendmsg                 0.2184 kfree
> 0.1901 try_to_wake_up              0.2103 scsi_device_unbusy
> 0.1895 kref_get                    0.2083 mempool_free
> 0.1864 __mod_timer                 0.1961 blk_queue_end_tag
> 0.1863 thread_return               0.1941 kmem_cache_free
> 0.1854 math_state_restore          0.1921 kref_get
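
For readers skimming the quoted summary: the 2.15% regression figure and the
Speedup(x) column are two views of the same throughput ratio. A minimal sketch
using the (rounded) numbers from the quoted table:

```python
# Relate the Speedup(x) column to the quoted regression percentage.
# Inputs are the rounded values from the summary table above.
baseline_speedup = 1.000  # 2.6.24.2 is the baseline by definition
rc4_speedup = 0.978       # 2.6.30-rc4 relative to the baseline

regression_pct = (baseline_speedup - rc4_speedup) / baseline_speedup * 100
print(f"{regression_pct:.1f}% regression")
```

This yields ~2.2%, consistent with the quoted 2.15% once rounding of the
speedup column is taken into account.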

I tried to replicate the scenario using Orion (a database load generator
from Oracle) with the following settings. The results do not show a
significant difference in cycles.

Setup:
Xeon Quad-core (7350), 4 sockets with 16GB memory, 1 qle2462 directly
connected to a SanBlaze target with 255 LUNs.

ORION VERSION 11.1.0.7.0
-run advanced -testname test -num_disks 255 -num_streamIO 16 -write 100
-type seq -matrix point -size_large 1 -num_small 0 -num_large 16 -simulate
raid0 -cache_size 0
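
Written out as a single invocation (the flags are exactly those listed above;
the binary name/path is an assumption and will vary per installation):

```shell
# Assumed invocation form for the Orion run described above; the binary
# name/path may differ per installation. Flags are verbatim from the text.
./orion -run advanced -testname test -num_disks 255 -num_streamIO 16 \
        -write 100 -type seq -matrix point -size_large 1 -num_small 0 \
        -num_large 16 -simulate raid0 -cache_size 0
```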
 
CPU: Core 2, speed 2933.45 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 80000
Counted L2_RQSTS events (number of L2 cache requests) with a unit mask of
0x41 (multiple flags) count 6000

2.6.30-rc4                                   2.6.24.7
12.4062 tg_shares_up                         11.4415 tg_shares_up
 6.6774 cache_free_debugcheck                 6.3950 check_poison_obj
 5.2861 kernel_text_address                   6.1896 pick_next_task_fair
 4.2201 kernel_map_pages                      4.4998 mwait_idle
 3.9626 __module_address                      3.1111 dequeue_entity
 3.7923 _raw_spin_lock                        2.8842 mwait_idle
 3.1965 kmem_cache_free                       2.2679 find_busiest_group
 3.1494 __module_text_address                 1.7949 _raw_spin_lock
 2.5449 find_busiest_group                    1.7488 qla24xx_start_scsi
 2.4670 mwait_idle                            1.5948 find_next_bit
 2.2321 qla24xx_start_scsi                    1.5433 memset_c
 2.1065 kernel_map_pages                      1.5265 find_busiest_group
 1.9261 is_module_text_address                1.4750 compat_blkdev_ioctl
 1.5905 _raw_spin_lock                        1.1865 _raw_spin_lock
 1.5206 find_next_bit                         1.0938 qla24xx_intr_handler
 1.2963 cache_alloc_debugcheck_after          0.9805 cache_free_debugcheck
 1.2785 memset.c                              0.9306 kernel_map_pages
 0.9918 __aio_put_req                         0.9104 kmem_cache_free
 0.9916 check_poison_obj                      0.9085 __setscheduler
 0.9413 qla24xx_intr_handler                  0.8982 sched_rt_handler
 0.9081 kmem_cache_alloc                      0.8847 kernel_text_address
 0.7647 cache_flusharray                      0.8634 run_rebalance_domains
 0.7213 trace_hardirqs_off                    0.8041 _raw_spin_lock
 0.6836 __change_page_attr_set_clr            0.7301 cache_alloc_debugcheck_after
 0.6450 aio_complete                          0.6905 __module_address
 0.6365 qla2x00_process_completed_request     0.6630 kmem_cache_alloc
 0.6330 delay_tsc                             0.6240 memset_c
 0.6248 blk_queue_end_tag                     0.5501 rwbase_run_test
 0.5568 delay_tsc                             0.5146 __module_text_address
 0.5279 trace_hardirqs_off                    0.5064 apic_timer_interrupt
 0.5215 scsi_softirq_done                     0.4919 cache_free_debugcheck
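
Eyeballing two ragged listings like the ones above is error-prone. A small
hypothetical helper (the parsing assumes plain "percent symbol" lines as
printed above; real opreport output may carry extra columns) can rank the
symbols by how much their cycle share moved between the two kernels:

```python
# Hypothetical helper: diff two "percent symbol" profile listings and rank
# symbols by the largest change in cycle share between two kernels.
def parse_report(text):
    """Parse lines like '12.4062 tg_shares_up' into {symbol: percent}."""
    out = {}
    for line in text.strip().splitlines():
        pct, sym = line.split(None, 1)
        out[sym.strip()] = float(pct)
    return out

def diff_reports(old, new):
    """Return (symbol, new% - old%) pairs, biggest absolute change first."""
    symbols = set(old) | set(new)
    deltas = {s: new.get(s, 0.0) - old.get(s, 0.0) for s in symbols}
    return sorted(deltas.items(), key=lambda kv: -abs(kv[1]))

# Tiny excerpt of the data above, for illustration.
old = parse_report("11.4415 tg_shares_up\n1.7949 _raw_spin_lock")
new = parse_report("12.4062 tg_shares_up\n3.7923 _raw_spin_lock")
for sym, delta in diff_reports(old, new):
    print(f"{sym:20s} {delta:+.4f}")
```

Note that a symbol appearing more than once in one listing (as _raw_spin_lock
does above, presumably once per image) would need image-qualified keys; this
sketch collapses duplicates.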

However, I do notice that the generated profiling report is not consistent
from run to run. I am not sure if I am missing something in my setup.
Sometimes I see the following type of error message while running opreport:
warning: [vdso] (tgid:30873 range:0x7fff6a9fe000-0x7fff6a9ff000) could not
be found.

I was wondering if your kernel config is quite different from mine. I have
attached my kernel config file.

Thanks,
Anirban



Download attachment "config" of type "application/octet-stream" (34560 bytes)
