Message-ID: <alpine.DEB.2.00.1003250942080.2670@router.home>
Date:	Thu, 25 Mar 2010 09:49:39 -0500 (CDT)
From:	Christoph Lameter <cl@...ux-foundation.org>
To:	Alex Shi <alex.shi@...el.com>
cc:	linux-kernel@...r.kernel.org, ling.ma@...el.com,
	"Zhang, Yanmin" <yanmin.zhang@...el.com>,
	"Chen, Tim C" <tim.c.chen@...el.com>,
	Pekka Enberg <penberg@...helsinki.fi>
Subject: Re: hackbench regression due to commit 9dfc6e68bfe6e

On Thu, 25 Mar 2010, Alex Shi wrote:

>     SLUB: Use this_cpu operations in slub
>
> Hackbench sets up hundreds of pairs of processes/threads, each pair
> consisting of a receiver and a sender. After all pairs are created and
> have allocated a few memory blocks (via malloc), hackbench has each
> sender send to its receiver the appointed number of times via a socket,
> then waits for all pairs to finish. The total sending/running time is
> the benchmark's result; less is better.
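
For reference, a minimal sketch of what one such sender/receiver pair
looks like (this is not the actual hackbench source; the loop count and
message size here are made up):

/* One hackbench-style pair: a sender pushes a fixed number of small
 * messages to a receiver over an AF_UNIX socket.  Every send/receive
 * allocates and frees socket buffers, which is where the slub
 * alloc/free traffic comes from. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

#define LOOPS		1000	/* "appointed times" per pair */
#define DATASIZE	100	/* bytes per message */

int main(void)
{
	int sv[2];
	char buf[DATASIZE];
	pid_t pid;
	int i;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		perror("socketpair");
		exit(1);
	}

	pid = fork();
	if (pid == 0) {			/* receiver */
		for (i = 0; i < LOOPS; i++)
			if (read(sv[0], buf, sizeof(buf)) <= 0)
				exit(1);
		exit(0);
	}

	memset(buf, 'x', sizeof(buf));	/* sender */
	for (i = 0; i < LOOPS; i++)
		if (write(sv[1], buf, sizeof(buf)) < 0) {
			perror("write");
			exit(1);
		}

	waitpid(pid, NULL, 0);		/* wait for the pair to finish */
	return 0;
}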

> The socket send/receive generates lots of slub allocs/frees. After
> running "hackbench 150 thread 1000", the slabinfo command shows the
> following slub counters increase hugely, from about 81412344 to
> 141412497.

The number of frees is different? From 81 mio to 141 mio? Are you sure it
was the same load?
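
If you rerun this, please gather the counters exactly the same way on
both kernels. The slabinfo tool is built from the kernel tree, roughly
like this (path may differ in your tree, and if I remember right the
Alloc/Free/%Fast columns are only populated with CONFIG_SLUB_STATS=y,
which itself adds work to the fast paths):

	gcc -o slabinfo Documentation/vm/slabinfo.c

and run it with the same options both times.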

> Name                   Objects      Alloc       Free   %Fast Fallb O
> :t-0001024                 870  141412497  141412132  94   1     0 3
> :t-0000256                1607  141225312  141224177  94   1     0 1
>
>
> Via the perf tool I collected the L1 data cache miss info for the command
> "./hackbench 150 thread 100":
>
> On 33-rc1, about 1303976612 L1 Dcache misses
>
> On 9dfc6, about 1360574760 L1 Dcache misses

I hope this is the same load?
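
If not, something along these lines, run identically on both kernels,
would make the numbers directly comparable (these are the generic perf
cache events; exact event names may differ with your perf version):

	perf stat -e L1-dcache-loads,L1-dcache-load-misses ./hackbench 150 thread 100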

What debugging options did you use? We are now using per cpu operations in
the hot paths. Enabling debugging for per cpu ops could decrease your
performance now. Have a look at a disassembly of kfree() to verify that
there is no instrumentation.
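
For example (just a suggestion, adjust to your build):

	objdump -dr mm/slub.o | less

and search for kfree, or

	gdb -batch -ex 'disassemble kfree' vmlinux

On x86-64 the per cpu accesses in the fast path should show up as plain
%gs prefixed instructions; calls out to debugging/tracking code there
would mean instrumentation is compiled in.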


