Message-ID: <4846DC6B.6030802@sgi.com>
Date: Wed, 04 Jun 2008 11:18:19 -0700
From: Mike Travis <travis@....com>
To: Rusty Russell <rusty@...tcorp.com.au>
CC: Christoph Lameter <clameter@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
David Miller <davem@...emloft.net>,
Eric Dumazet <dada1@...mosbay.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [patch 04/41] cpu ops: Core piece for generic atomic per cpu
operations
> cpu_local_inc() does all this: it takes the name of a local_t var, and is
> expected to increment this cpu's version of that. You ripped this out and
> called it CPU_INC().
Hi,
I'm attempting to test both approaches and compare the generated object code in
order to understand the issues involved here. Here's my code:
	void test_cpu_inc(int *s)
	{
		__CPU_INC(s);
	}

	void test_local_inc(local_t *t)
	{
		__local_inc(THIS_CPU(t));
	}

	void test_cpu_local_inc(local_t *t)
	{
		__cpu_local_inc(t);
	}
But I don't see how I can use cpu_local_inc here, because the pointer to the
object is not &__get_cpu_var(l), which is what the macro expands to:
#define __cpu_local_inc(l) cpu_local_inc((l))
#define cpu_local_inc(l) cpu_local_wrap(local_inc(&__get_cpu_var((l))))
At a minimum, we would need a new local_t op to get the correct CPU_ALLOC'd
pointer value for the increment. These new local_t ops for CPU_ALLOC'd variables
could be implemented with the CPU_XXX primitives, or with just a base val_to_ptr
primitive that replaces __get_cpu_var().
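
To make the val_to_ptr idea concrete, here is a minimal userspace sketch. It is
only a model, not kernel code: every sim_* name, the fixed number of simulated
CPUs, and the slot-based allocator are assumptions made up for illustration.
The point is just that the allocation handle is the same on every CPU, and one
small primitive turns it into the current CPU's pointer before the increment.

```c
#include <assert.h>

#define SIM_NR_CPUS 4    /* hypothetical: number of simulated CPUs */
#define SIM_SLOTS   16   /* hypothetical: long-sized slots per CPU area */

/* One private area per simulated CPU; stands in for the real per-CPU areas. */
static long sim_area[SIM_NR_CPUS][SIM_SLOTS];
static int sim_next_slot;     /* next free slot, shared by all areas */
static int sim_current_cpu;   /* stands in for "the CPU we are running on" */

/* Hand out a handle that is valid in every CPU's area. */
static int sim_cpu_alloc(void)
{
    assert(sim_next_slot < SIM_SLOTS);
    return sim_next_slot++;
}

/* The val_to_ptr step: turn a handle into the current CPU's pointer. */
static long *sim_val_to_ptr(int handle)
{
    assert(sim_current_cpu >= 0 && sim_current_cpu < SIM_NR_CPUS);
    assert(handle >= 0 && handle < sim_next_slot);
    return &sim_area[sim_current_cpu][handle];
}

/* A local_inc-style op built on top of that one primitive. */
static void sim_local_inc(int handle)
{
    ++*sim_val_to_ptr(handle);
}
```

In this model, incrementing through the same handle on two different simulated
CPUs touches two different counters, which is the behavior the new local_t op
would need to provide for CPU_ALLOC'd variables.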
I did notice this in local.h:
* X86_64: This could be done better if we moved the per cpu data directly
* after GS.
... which it now is, so true per_cpu variables could be optimized better as well.
Also, the above cpu_local_wrap(...) adds:
	#define cpu_local_wrap(l)		\
	({					\
		preempt_disable();		\
		(l);				\
		preempt_enable();		\
	})					\
... and I can't find a variant that skips the preempt_disable()/preempt_enable()
pair.
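
As a rough illustration of what such a pair of variants could look like, here
is a small userspace model. None of this is existing kernel API: the sim_*
names and the counters are hypothetical, and the raw variant only asserts that
the caller has already disabled preemption.

```c
#include <assert.h>

static int sim_preempt_count;    /* models the preemption-disable depth */
static int sim_preempt_pairs;    /* completed disable/enable pairs */

static void sim_preempt_disable(void)
{
    sim_preempt_count++;
}

static void sim_preempt_enable(void)
{
    assert(sim_preempt_count > 0);
    sim_preempt_count--;
    sim_preempt_pairs++;
}

/* Preemption-safe form: same shape as the quoted cpu_local_wrap(). */
#define sim_cpu_local_wrap(stmt)            \
    do {                                    \
        sim_preempt_disable();              \
        stmt;                               \
        sim_preempt_enable();               \
    } while (0)

/* Hypothetical raw form: the caller must already hold preemption off. */
#define sim_cpu_local_wrap_raw(stmt)        \
    do {                                    \
        assert(sim_preempt_count > 0);      \
        stmt;                               \
    } while (0)
```

With the raw form, a caller that already runs with preemption disabled can
batch several per-cpu updates under one disable/enable pair instead of paying
for a pair around every operation.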
Here is the generated object code:
0000000000000000 <test_cpu_inc>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 08             sub    $0x8,%rsp
   8:   48 89 7d f8             mov    %rdi,0xfffffffffffffff8(%rbp)
   c:   65 48 ff 45 f8          incq   %gs:0xfffffffffffffff8(%rbp)
  11:   c9                      leaveq
  12:   c3                      retq

0000000000000013 <test_local_inc>:
  13:   55                      push   %rbp
  14:   65 48 8b 05 00 00 00    mov    %gs:0(%rip),%rax        # 1c <test_local_inc+0x9>
  1b:   00
  1c:   48 89 e5                mov    %rsp,%rbp
  1f:   48 ff 04 07             incq   (%rdi,%rax,1)
  23:   c9                      leaveq
  24:   c3                      retq
With a new local_t op, test_local_inc could probably be optimized down to
the same instructions as test_cpu_inc.
One other distinction is that CPU_INC increments a variable of arbitrary size,
while local_inc requires a local_t variable, so local_inc may not be usable
in all cases.
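
A small userspace sketch of that difference, under loud assumptions: SIM_INC,
sim_fixed_t and sim_fixed_inc are hypothetical stand-ins, not the real CPU_INC
or local_t, and nothing here is per-cpu or atomic. It only shows that a macro
adapts to the width of whatever it is pointed at, while a typed op accepts
exactly one wrapper type.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width wrapper type, playing the role of local_t in this model. */
typedef struct {
    long v;
} sim_fixed_t;

/* Typed op: only accepts a sim_fixed_t, like local_inc with local_t. */
static void sim_fixed_inc(sim_fixed_t *l)
{
    l->v++;
}

/* Macro op: the pointee type, and so the operand size, comes from the caller. */
#define SIM_INC(ptr) ((void)(++*(ptr)))
```

Here SIM_INC works on a uint8_t, an int or a uint64_t alike, whereas passing
anything other than a sim_fixed_t pointer to sim_fixed_inc is a type error.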
Thanks,
Mike