linux-kernel - Re: [PATCH v2 0/3] staging: zcache: xcfmalloc support

Date:	Thu, 15 Sep 2011 15:17:42 -0700
From:	Dave Hansen <dave@...ux.vnet.ibm.com>
To:	Seth Jennings <sjenning@...ux.vnet.ibm.com>
Cc:	Dan Magenheimer <dan.magenheimer@...cle.com>,
	Nitin Gupta <ngupta@...are.org>, Greg KH <greg@...ah.com>,
	gregkh@...e.de, devel@...verdev.osuosl.org,
	cascardo@...oscopio.com, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, brking@...ux.vnet.ibm.com,
	rcj@...ux.vnet.ibm.com
Subject: Re: [PATCH v2 0/3] staging: zcache: xcfmalloc support

On Thu, 2011-09-15 at 14:24 -0500, Seth Jennings wrote:
> How would you suggest that I measure xcfmalloc performance on a "very
> large set of workloads".  I guess another form of that question is: How
> did xvmalloc do this?

Well, it didn't have a competitor, so this probably wasn't done. :)

I'd like to see a microbenchmarky sort of thing.  Do a million (or 100
million, whatever) allocations, and time it for both allocators doing
the same thing.  You just need to do the *same* allocations for both.

It'd be interesting to see the shape of a graph if you did:

	for (i = 0; i < BIG_NUMBER; i++) {
		for (j = MIN_ALLOC; j < MAX_ALLOC; j += BLOCK_SIZE) {
			p = alloc(j);
			free(p);
		}
	}

... basically for both allocators.  Let's see how the graphs look.  You
could do it a lot of different ways: alloc all, then free all, or alloc
one free one, etc...  Maybe it will surprise us.  Maybe the page
allocator overhead will dominate _everything_, and we won't even see the
x*malloc() functions show up.
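
A minimal userspace sketch of the alloc-one/free-one variant of that loop
might look like the following.  Everything here is an assumption made for
illustration: the alloc_fn/free_fn wrapper types, the bench_alloc_free()
name, and the MIN_ALLOC/MAX_ALLOC/BLOCK_SIZE values are hypothetical, and
plain malloc()/free() stand in for whatever wrappers would be written
around the two allocators.  The only point is that both allocators get
driven through the *same* allocation sequence and timed the same way.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical allocator interface; each allocator would get a thin wrapper. */
typedef void *(*alloc_fn)(size_t size);
typedef void (*free_fn)(void *ptr);

/* Illustrative values only; the thread does not specify them. */
enum { MIN_ALLOC = 64, MAX_ALLOC = 4096, BLOCK_SIZE = 64 };

/*
 * Run the alloc-one/free-one pattern `rounds` times over every size in
 * [MIN_ALLOC, MAX_ALLOC).  Returns elapsed wall-clock seconds, or a
 * negative value if an allocation fails.
 */
static double bench_alloc_free(alloc_fn do_alloc, free_fn do_free, long rounds)
{
	struct timespec start, end;

	assert(do_alloc && do_free);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < rounds; i++) {
		for (size_t j = MIN_ALLOC; j < MAX_ALLOC; j += BLOCK_SIZE) {
			void *p = do_alloc(j);

			if (!p)
				return -1.0;
			/* Touch the block so the pair is not optimized away. */
			*(volatile char *)p = 0;
			do_free(p);
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling bench_alloc_free(malloc, free, BIG_NUMBER) once per allocator
wrapper and plotting the results against the size range would give one
point of comparison.  The alloc-all-then-free-all variant would need an
array of pointers and two passes instead of one.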

The other thing that's important is to think of cases like I described
that would cause either allocator to do extra splits/joins or be slow in
other ways.  I expect xcfmalloc() to be slowest when it is allocating
and has to break down a reserve page.  Say it does a bunch of ~3KB
allocations and has no pages on the freelists.  It will:

	1. scan each of the 64 freelist heads (512 bytes of cache)
	2. split a 4k page
	3. reinsert the 1k remainder

Next time, it will:

	1. scan, and find the 1k bit
	2. continue scanning, eventually touching each freelist...
	3. split a 4k page
	4. reinsert the 2k remainder

It'll end up doing a scan/split/reinsert in 3/4 of the cases, I think.
The case of the freelists being quite empty will also be quite common
during times the pool is expanding.  I think xvmalloc() will have some
of the same problems, but let's see if it does in practice.
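
The 3/4 figure can be checked with a toy model.  This is only a sketch
under assumptions that go beyond what the thread states: it assumes an
allocation may be satisfied partly from a leftover fragment and partly
from a freshly split page (as the steps above imply), that the freelists
hold at most one leftover fragment at a time, and that nothing is freed in
between.  The count_splits() name and the constants are made up for the
example.

```c
#include <assert.h>

#define PAGE_BYTES 4096u

/*
 * Model n back-to-back allocations of alloc_bytes each, starting with
 * empty freelists.  Returns how many of them had to split a fresh page
 * (and therefore scan, split, and reinsert a remainder).
 */
static unsigned count_splits(unsigned n, unsigned alloc_bytes)
{
	unsigned remainder = 0;	/* bytes left over from the last split */
	unsigned splits = 0;

	/* One page split must be able to cover any shortfall. */
	assert(alloc_bytes > 0 && alloc_bytes <= PAGE_BYTES);

	for (unsigned i = 0; i < n; i++) {
		unsigned need = alloc_bytes;
		unsigned take = remainder < need ? remainder : need;

		/* Use up whatever fragment is already on the freelists. */
		remainder -= take;
		need -= take;

		if (need > 0) {
			/* Still short: split a page, keep the rest. */
			remainder = PAGE_BYTES - need;
			splits++;
		}
	}
	return splits;
}
```

With 3072-byte allocations the leftover cycles through 1k, 2k, 3k, and 0,
so count_splits(4, 3072) is 3: only the fourth allocation is served
entirely from an existing fragment.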

-- Dave
