Message-ID: <alpine.LFD.2.00.1002171743200.4141@localhost.localdomain>
Date: Wed, 17 Feb 2010 17:53:50 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "H. Peter Anvin" <hpa@...or.com>
cc: Zachary Amsden <zamsden@...hat.com>, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
Avi Kivity <avi@...hat.com>
Subject: Re: [PATCH] x86 rwsem optimization extreme
On Wed, 17 Feb 2010, H. Peter Anvin wrote:
>
> FWIW, I don't know of any microarchitecture where adc is slower than
> add, *as long as* the setup time for the CF flag is already used up.
Oh, I think there are lots.
Look at just about any x86 latency/throughput table, and you'll see:
- adc latencies are typically much higher than a single cycle
But you are right that this is likely not an issue on any out-of-order
chip, since the 'stc' will schedule perfectly.
- but the adc _throughput_ figure (cycles per instruction) is also typically much higher, which indicates
that even if you do flag renaming, the 'adc' quite likely only
schedules in a single ALU unit.
For example, on a Pentium, adc/sbb can only go in the U pipe, and I think
the same is true of 'stc'. Now, nobody likely cares about Pentiums any
more, but the point is, 'adc' does often have constraints that a regular
'add' does not, and there's an example where a 'stc+adc' pair would at the
very least have to be scheduled with an instruction in between.
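To make the data dependency concrete, here is a minimal sketch in portable C of what the two sequences compute. It models only the arithmetic and the carry flag, not pipeline timing, and the names (alu_result, model_add, model_adc, model_stc_adc) are hypothetical helpers made up for illustration, not kernel code. The point it shows is that 'adc' takes CF as an extra input, so it has to wait for whatever last wrote the flag (here the 'stc'), while a plain 'add' has no such input.

```c
#include <stdint.h>

/* Result of a 32-bit ALU operation: the value plus the carry flag (CF).
 * CF is modeled as a plain field here, not as real hardware state. */
typedef struct {
    uint32_t value;
    unsigned cf;
} alu_result;

/* add: dest + src. CF is an output only. */
static alu_result model_add(uint32_t dest, uint32_t src)
{
    uint64_t wide = (uint64_t)dest + src;
    alu_result r = { (uint32_t)wide, (unsigned)(wide >> 32) };
    return r;
}

/* adc: dest + src + CF. CF is both an input and an output, which is
 * the extra dependency discussed above. */
static alu_result model_adc(uint32_t dest, uint32_t src, unsigned cf_in)
{
    uint64_t wide = (uint64_t)dest + src + (cf_in & 1u);
    alu_result r = { (uint32_t)wide, (unsigned)(wide >> 32) };
    return r;
}

/* stc; adc $imm, dest: set CF, then add with carry. */
static alu_result model_stc_adc(uint32_t dest, uint32_t imm)
{
    unsigned cf = 1;                 /* stc */
    return model_adc(dest, imm, cf); /* adc consumes the CF written by stc */
}
```

In this model, model_stc_adc(x, imm).value matches model_add(x, imm + 1).value modulo 2^32, so the two forms differ in how they schedule, not in the value they produce.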
Linus