linux-kernel - Re: [PATCH 7/9] Make idr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080520052904.GA11326@tpepper-t42p.dolavim.us>
Date:	Mon, 19 May 2008 22:29:05 -0700
From:	Tim Pepper <lnxninja@...ux.vnet.ibm.com>
To:	Nadia Derbey <Nadia.Derbey@...l.net>
Cc:	Tim Pepper <lnxninja@...ux.vnet.ibm.com>, manfred@...orfullife.com,
	paulmck@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	efault@....de, akpm@...ux-foundation.org
Subject: Re: [PATCH 7/9] Make idr_remove rcu-safe

On Thu 15 May at 09:40:13 +0200 Nadia.Derbey@...l.net said:
> Tim Pepper wrote:
>> On Wed 07 May at 13:36:00 +0200 Nadia.Derbey@...l.net said:
>>
>>> [PATCH 07/09]
>>>
>>> This patch introduces the free_layer() routine: it is the one that actually
>>> frees memory after a grace period has elapsed.
>>>
>>> Index: linux-2.6.25-mm1/lib/idr.c
>>> ===================================================================
>>> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:43.000000000 +0200
>>> +++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 09:07:31.000000000 +0200
>>> @@ -424,15 +455,13 @@ void idr_remove_all(struct idr *idp)
>>>
>>> 		id += 1 << n;
>>> 		while (n < fls(id)) {
>>> -			if (p) {
>>> -				memset(p, 0, sizeof *p);
>>> -				move_to_free_list(idp, p);
>>> -			}
>>> +			if (p)
>>> +				free_layer(p);
>>> 			n += IDR_BITS;
>>> 			p = *--paa;
>>> 		}
>>> 	}
>>> -	idp->top = NULL;
>>> +	rcu_assign_pointer(idp->top, NULL);
>>> 	idp->layers = 0;
>>> }
>>> EXPORT_SYMBOL(idr_remove_all);
>>
>>
>> Does idr_remove_all() need an rcu_dereference() in the loop preceeding the
>> above, where it does:
>>
>>                 while (n > IDR_BITS && p) {
>>                         n -= IDR_BITS;
>>                         *paa++ = p;
>>      ---->              p = p->ary[(id >> n) & IDR_MASK];
>>                 }
>
> I assumed here that idr_remove_all() was called in the "typical cleanup 
> sequence" mentioned in the comment describing the routine.
> And actually, that's how it is called in drivers/char/drm.

Sure.  I guess I was thinking out loud that it's maybe somewhat implicit
that things must be serial at that point and I wasn't sure if it was meant
to be required or enforced, if it should be clarified with comments or
by explicitly adding the rcu calls in these couple additional places.

>> I've been having some machine issues, but hope to give this patch set a run
>> still on a 128way machine and mainline to provide some additional
>> datapoints.
>>
>
> That would be kind, indeed (hope I didn't break anything).

I've had a bunch of machine issues, but I did manage to do some testing.

I'd started looking at the regression after Anton Blanchard mentioned
seeing it via this simple bit of code:
    http://ozlabs.org/~anton/junkcode/lock_comparison.c
It essentially spawns a number of threads to match the cpu count, each
thread looping 10000 times, where each loop does some trivial semop()'s.
Each thread has its own semaphore it is semop()'ing so there's no
contention.

To get a little more detail I hacked Anton's code minimally to record
results for thread counts 1..n and also to optionally have just a single
semaphore on which all of these threads are contending.  I ran this on
a 64cpu machine (128 given SMT), but didn't make any effort to force
clean thread/cpu affinity.

The contended numbers don't look quite as consistent as they could at
the high end, but I think this is more quick/dirty test than patch.
Nevertheless the patch clearly outperforms mainline and despite the
noise actually looks to perform a more consistently than mainline
(see graphs).

Summary numbers from a run (with a bit more detail at the high thread
side as the numbers had more variation there and just showing the power
of two numbers for this run incorrectly implies a knee...I can do more
runs with averages/stats if people need more convincing):

threads          uncontended                     contended
                 semop loops                    semop loops

        2.6.26-rc2      +patchset        2.6.26-rc2      +patchset

1	  2243.94	  4436.25	   2063.18	   4386.78
2	  2954.11	  5445.12	  67885.16	   5906.72
4	  4367.45	  8125.67	  72820.32	  44263.86
8	  7440.00	  9842.66	  60184.17	  95677.58
16	 12959.44	 20323.97	 136482.42	 248143.80
32	 35252.71	 56334.28	 363884.09	 599773.31
48	 62811.15	102684.67	 515886.12	1714530.12
...
57	 81064.99	141434.33	 564874.69	2518078.75
58	 79486.08	145685.84	 693038.06	1868813.12
59	 83634.19	153087.80	1237576.25	2828301.25
60	 91581.07	158207.08	 797796.94	2970977.25
61	 89209.40	160529.38	1202135.38	2538114.50
62	 89008.45	167843.78	1305666.75	2274845.00
63	 97753.17	177470.12	 733957.31	 363952.62
64	102556.05	175923.56	1356988.88	 199527.83

(detail plots from this same run attached...)

Nadia, you're welcome to add either or both of these to the series if
you'd like:

Reviewed-by: Tim Pepper <lnxninja@...ux.vnet.ibm.com>
Tested-by: Tim Pepper <lnxninja@...ux.vnet.ibm.com>

-- 
Tim Pepper  <lnxninja@...ux.vnet.ibm.com>
IBM Linux Technology Center

Download attachment "2.6.26-rc2.contended.128_semops.png" of type "image/png" (13381 bytes)

Download attachment "2.6.26-rc2-patched.contended_semops.png" of type "image/png" (11563 bytes)

Download attachment "2.6.26-rc2.uncontended_semops.png" of type "image/png" (9257 bytes)

Download attachment "2.6.26-rc2-patched.uncontended_semops.png" of type "image/png" (11056 bytes)