Message-ID: <ff13bc9a1002180250t5c9acb54ib8609b4e752520c5@mail.gmail.com>
Date: Thu, 18 Feb 2010 11:50:15 +0100
From: Luca Barbieri <luca@...a-barbieri.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...e.hu, hpa@...or.com, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 09/10] x86-32: use SSE for atomic64_read/set if available
On Thu, Feb 18, 2010 at 11:25 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Wed, 2010-02-17 at 12:42 +0100, Luca Barbieri wrote:
>> +DEFINE_PER_CPU_ALIGNED(struct sse_atomic64_percpu, sse_atomic64_percpu);
>> +
>> +/* using the fpu/mmx looks infeasible due to the need to save the FPU environment, which is very slow
>> + * SSE2 is slightly slower on Core 2 and less compatible, so avoid it for now
>> + */
>> +long long sse_atomic64_read_cx8call(long long dummy, const atomic64_t *v)
>> +{
>> + long long res;
>> + unsigned long cr0 = 0;
>> + struct thread_info *me = current_thread_info();
>> + preempt_disable();
>> + if (!(me->status & TS_USEDFPU)) {
>> + cr0 = read_cr0();
>> + if (cr0 & X86_CR0_TS)
>> + clts();
>> + }
>> + asm volatile(
>> + "movlps %%xmm0, " __percpu_arg(0) "\n\t"
>> + "movlps %3, %%xmm0\n\t"
>> + "movlps %%xmm0, " __percpu_arg(1) "\n\t"
>> + "movlps " __percpu_arg(0) ", %%xmm0\n\t"
>> + : "+m" (per_cpu__sse_atomic64_percpu.xmm0_low), "=m" (per_cpu__sse_atomic64_percpu.low), "=m" (per_cpu__sse_atomic64_percpu.high)
>> + : "m" (v->counter));
>> + if (cr0 & X86_CR0_TS)
>> + write_cr0(cr0);
>> + res = (long long)(unsigned)percpu_read(sse_atomic64_percpu.low) | ((long long)(unsigned)percpu_read(sse_atomic64_percpu.high) << 32);
>> + preempt_enable();
>> + return res;
>> +}
>> +EXPORT_SYMBOL(sse_atomic64_read_cx8call);
>
> Care to explain how this is IRQ and NMI safe?

Unfortunately it isn't: an IRQ or NMI that calls this function between
the stores to the per-CPU variables and the reads back from them would
overwrite them. It needs to be fixed to align the stack and use that
instead (__attribute__((force_align_arg_pointer)) should do the job).

Sorry for this. I initially used the stack and later changed it to
guarantee alignment, without rechecking the IRQ/NMI safety.

If we use the stack instead of per-CPU variables, all IRQs and NMIs
preserve CR0 and the SSE registers, so this would be safe, right?

kernel_fpu_begin/end cannot be used in interrupts, so that shouldn't
be a concern.
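
For illustration only, here is a rough userspace sketch of the
stack-based variant, not the actual fix. The function name
sse_read64_stack and the scratch struct are hypothetical, the CR0.TS
and preemption handling from the patch is left out because it has no
userspace equivalent, and the non-SSE fallback via __atomic_load_n is
an assumption added so the sketch builds elsewhere. It requests 8-byte
alignment on the local itself rather than relying on
force_align_arg_pointer.

```c
#include <stdint.h>

/*
 * Hypothetical userspace analogue of the stack-based read.
 * Each invocation gets its own scratch slots on the stack, so a nested
 * invocation (in the kernel: from an IRQ or NMI) cannot overwrite the
 * slots of the interrupted one, unlike shared per-CPU variables.
 */
static inline uint64_t sse_read64_stack(const uint64_t *v)
{
#if defined(__SSE__) && (defined(__i386__) || defined(__x86_64__))
	struct {
		uint64_t saved_xmm0_low;	/* caller's low half of xmm0 */
		uint64_t value;			/* the 64-bit value read from *v */
	} scratch __attribute__((aligned(8)));

	__asm__ volatile(
		"movlps %%xmm0, %0\n\t"	/* save low 64 bits of xmm0 on the stack */
		"movlps %2, %%xmm0\n\t"	/* single 64-bit load from *v */
		"movlps %%xmm0, %1\n\t"	/* spill the loaded value to the stack */
		"movlps %0, %%xmm0\n\t"	/* restore low 64 bits of xmm0 */
		: "=m" (scratch.saved_xmm0_low), "=m" (scratch.value)
		: "m" (*v));
	return scratch.value;
#else
	/* Assumed fallback for targets without SSE; not part of the patch. */
	return __atomic_load_n(v, __ATOMIC_RELAXED);
#endif
}
```

A caller would simply do "val = sse_read64_stack(&counter);" on an
8-byte aligned counter. movlps only touches the low 64 bits of xmm0,
so saving and restoring that half is enough to leave the register
unchanged.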