Message-id: <4B6F2D59.1070508@majjas.com>
Date: Sun, 07 Feb 2010 16:15:05 -0500
From: Michael Breuer <mbreuer@...jas.com>
To: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc: Mike Galbraith <efault@....de>
Subject: Re: x86 - cpu_relax - why nop vs. pause?
On 02/07/2010 03:08 PM, Michael Breuer wrote:
> On 2/7/2010 1:14 PM, Mike Galbraith wrote:
> , and this got me thinking... and testing... I think there's an
> optimization issue with gcc:
>
> First of all - a bit of background on how I got here:
>
> After reading the Intel documentation, I tried replacing rep;nop with
> pause (in theory exactly what's shown above). The system hung on booting.
> I then tried replacing only the nop with pause (rep;pause) and the system
> booted. Using the above example, the opcode becomes f3 f3 90 vs f3 90
> (rep nop).
>
> Given the above compiler test case, this seemed odd, to say the least.
> So I played a bit more with gcc. It seems that the optimizer (-O3)
> handles the *three* cases differently (objdump output below).
>
> Base code for all three cases (the only change is the asm volatile line,
> as shown for each case):
>
> static inline void pause(void)
> {
> asm volatile("pause" ::: "memory");
> }
>
> void main(void)
> {
> pause();
> }
>
> Case1 - asm volatile("pause" ::: "memory");
> 0000000000400480 <main>:
> 400480: f3 90 pause
> 400482: c3 retq
> 400483: 90 nop
>
> Case2 - asm volatile("rep;nop" ::: "memory") Note: this didn't inline!
>
> 0000000000400474 <pause>:
> 400474: 55 push %rbp
> 400475: 48 89 e5 mov %rsp,%rbp
> 400478: f3 90 pause
> 40047a: c9 leaveq
> 40047b: c3 retq
>
> 000000000040047c <main>:
> 40047c: 55 push %rbp
> 40047d: 48 89 e5 mov %rsp,%rbp
> 400480: e8 ef ff ff ff callq 400474 <pause>
> 400485: c9 leaveq
> 400486: c3 retq
> 400487: 90 nop
> 400488: 90 nop
> 400489: 90 nop
> 40048a: 90 nop
> 40048b: 90 nop
> 40048c: 90 nop
> 40048d: 90 nop
> 40048e: 90 nop
> 40048f: 90 nop
>
> Case3 - asm volatile("rep;pause" ::: "memory")
> 0000000000400480 <main>:
> 400480: f3 f3 90 pause
> 400483: c3 retq
> 400484: 90 nop
> _______
> Note the difference between opcodes case 1 and case 3, and the mess
> made by the compiler in case 2.
>
> As to benchmarks - I've checked a few things, no formal or lasting
> stuff... but striking at first glance:
>
> 1) At idle, perf top shows time spent in _raw_spin_lock dropping from
> ~35% to ~25%.
> 2) Running a media transcode (single core - handbrakecli): frame rate
> increased by about 5-10%.
> 3) During file-intensive operations (#2, above, or copying large files
> - ext4 on software raid6) - latencytop shows a decrease in the time to
> write a page to disk from about 120ms to about 90ms.
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Disregard case 2 - I was missing -O3 when I built it. With -O3 or -O2,
rep;nop and pause produce identical code. The interesting case is
rep;pause, which assembles differently (f3 f3 90) and seems more efficient.
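
For context, here is a minimal sketch of the kind of busy-wait loop where
the relax hint matters. This is not the kernel's code: the names
cpu_relax_sketch and spin_until_set are made up for illustration, and the
non-x86 branch is an assumed fallback that is only a compiler barrier. Only
the "pause" asm string is taken from the test case above.

```c
/* Hypothetical helper names; only the asm string comes from this thread. */
#if defined(__x86_64__) || defined(__i386__)
/* "pause" assembles to f3 90, the same bytes as "rep; nop". */
static inline void cpu_relax_sketch(void)
{
    __asm__ volatile("pause" ::: "memory");
}
#else
/* Assumed fallback for other architectures: compiler barrier only. */
static inline void cpu_relax_sketch(void)
{
    __asm__ volatile("" ::: "memory");
}
#endif

/*
 * Spin until *flag becomes nonzero or max_spins iterations have run.
 * Returns the number of relax iterations that were executed.
 */
static unsigned int spin_until_set(const volatile int *flag,
                                   unsigned int max_spins)
{
    unsigned int spins = 0;

    while (!*flag && spins < max_spins) {
        cpu_relax_sketch();
        spins++;
    }
    return spins;
}
```

A caller would pass the address of a flag that another thread sets, plus an
upper bound on iterations. The "memory" clobber keeps the compiler from
hoisting the flag read out of the loop, independent of which relax
instruction is used.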