Message-ID: <4B7C91F8.50509@redhat.com>
Date: Wed, 17 Feb 2010 15:03:52 -1000
From: Zachary Amsden <zamsden@...hat.com>
To: "H. Peter Anvin" <hpa@...or.com>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
Avi Kivity <avi@...hat.com>
Subject: Re: [PATCH] x86 rwsem optimization extreme
>
> On 02/17/2010 02:10 PM, Linus Torvalds wrote:
>
>> The cost of 'adc' may happen to be identical in this case, but I suspect
>> you didn't test on UP, where the 'lock' prefix goes away. An unlocked
>> 'add' tends to be faster than an unlocked 'adc'.
>>
>> (It's possible that some micro-architectures don't care, since it's a
>> memory op, and they can see that 'C' is set. But it's a fragile assumption
>> that it would always be ok).
>>
>>
> FWIW, I don't know of any microarchitecture where adc is slower than
> add, *as long as* the setup time for the CF flag is already used up.
> However, as I already commented, I don't think this is worth it. This
> inline appears to only be instantiated once, and as such, it takes a
> whopping six bytes across the entire kernel.
>
>
Without the locks,

    stc; adc %rdx, (%rax)

vs.

    add %rdx, (%rax)

shows no statistical difference on Intel.
On AMD, the first form is about twice as expensive.
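
For anyone who wants to poke at the two unlocked forms, here is a minimal
sketch. It is not the code from the patch: the helper names mem_add() and
mem_stc_adc() are made up for illustration, and it assumes GCC-style inline
asm on x86-64 (with a plain C fallback elsewhere). Note that the stc; adc
form adds the operand plus one, since CF is forced to 1 first.

```c
#include <stdint.h>

/* Hypothetical helper: the plain form, add %reg, (mem). */
static inline void mem_add(uint64_t *p, uint64_t v)
{
#if defined(__x86_64__)
    __asm__ volatile("addq %1, %0" : "+m"(*p) : "r"(v) : "cc");
#else
    *p += v;            /* portable fallback, same arithmetic */
#endif
}

/* Hypothetical helper: stc; adc %reg, (mem). Sets CF, then adds v + CF,
 * so the memory operand ends up increased by v + 1. */
static inline void mem_stc_adc(uint64_t *p, uint64_t v)
{
#if defined(__x86_64__)
    __asm__ volatile("stc\n\tadcq %1, %0" : "+m"(*p) : "r"(v) : "cc");
#else
    *p += v + 1;        /* portable fallback, same arithmetic */
#endif
}
```

Calling mem_add(&x, 6) and mem_stc_adc(&y, 5) on equal starting values leaves
x and y equal. Timing the two in a loop is left out on purpose; the numbers
depend heavily on the microarchitecture.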
Of course this is all completely useless as things stand, but it would
matter if the locks were inline (which is actually an askable question
now). There was just so much awesomeness going on with the 64-bit rwsem
constructs that I felt I had to add even more awesomeness to the plate.
For some definition of awesomeness.
Zach