Message-ID: <12996.1329699216@neuling.org>
Date: Mon, 20 Feb 2012 11:53:36 +1100
From: Michael Neuling <mikey@...ling.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
benh@...nel.crashing.org, anton@...ba.org
Subject: Re: [PATCH 0/2] More i387 state save/restore work
Linus,
> Ok, this is a series of two patches that continue my i387 state
> save/restore series, but aren't necessarily worth it for Linux-3.3.
We have similar lazy save/restore code on powerpc here:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-December/087422.html
With your test, it looks like you're getting about a 10% performance
boost. For VSX registers on powerpc we got about 8% with a similar
micro-benchmark. We were a little disappointed it took such a
tailored/synthetic micro-benchmark to get such modest performance
improvements.
> That said, the first one is a bug-fix - but it's an old bug, and I'm not
> sure it can actually be triggered. The failure path for the FP state
> preload is bogus - and always was. But I'm not sure it really *can* fail.
>
> The first one has another small bugfix in it too, and I think that one may
> be new to the rewritten FP state preloading - it doesn't update the
> fpu_counter, so once it starts preloading, it never stops.
>
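The preload bug described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel code. It assumes three things. First, preloading is decided by comparing a per-task counter against a threshold (5 here). Second, the counter is otherwise bumped only by the FPU-use trap, which never fires while the state is already preloaded. Third, the fix bumps the counter on every preloaded switch, so that an 8-bit counter eventually wraps to 0 and preloading stops. The names `preload_task`, `PRELOAD_THRESHOLD` and `switches_without_preload_after_start()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of counter-driven FPU preloading (not kernel code). */
#define PRELOAD_THRESHOLD 5   /* assumed threshold */

struct preload_task {
    unsigned char fpu_counter;   /* assumed 8-bit, so 255 + 1 wraps to 0 */
};

/* Decide at switch-in whether to load the FPU state eagerly. */
static bool preload_on_switch_in(const struct preload_task *t)
{
    return t->fpu_counter > PRELOAD_THRESHOLD;
}

/*
 * Simulate n switch-ins of a task that uses the FPU in every time slice.
 * Returns how many switch-ins did NOT preload after preloading first began;
 * 0 means "once it starts preloading, it never stops".
 */
static int switches_without_preload_after_start(int n, bool bump_on_preload)
{
    assert(n >= 0);
    struct preload_task t = { .fpu_counter = 0 };
    bool started = false;
    int stops = 0;

    for (int i = 0; i < n; i++) {
        if (preload_on_switch_in(&t)) {
            started = true;
            /* State is already loaded, so no FPU trap fires. Without an
             * explicit update here the counter is frozen forever. */
            if (bump_on_preload)
                t.fpu_counter++;   /* wraps 255 -> 0, ending this preload run */
        } else {
            if (started)
                stops++;
            /* First FPU use traps; the trap handler bumps the counter. */
            t.fpu_counter++;
        }
    }
    return stops;
}
```

With `n = 1000`, the frozen-counter variant returns 0, while the bumping variant returns 18. In that variant, preloading pauses for six switch-ins out of every 256, during which real FPU use has to earn it back.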
> I wrote a silly FPU task switch testing program, which basically starts
> two processes pinned to the same CPU, and then uses sched_yield() in both
> to switch back-and-forth between them. *One* of the processes uses the FPU
> between every yield, the other does not. It runs for two seconds, and
> counts how many loops it gets through.
> With that test, I get:
>
> - Plain 3.3-rc4:
>
> [torvalds@i5 ~]$ uname -r
> 3.3.0-rc4
> [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
> 2216090 loops in 2 seconds
> 2216922 loops in 2 seconds
> 2217148 loops in 2 seconds
> 2232191 loops in 2 seconds
> 2186203 loops in 2 seconds
> 2231614 loops in 2 seconds
>
> - With the first patch that fixes the FPU preloading to eventually stop:
>
> [torvalds@i5 ~]$ uname -r
> 3.3.0-rc4-00001-g704ed737bd3c
> [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
> 2306667 loops in 2 seconds
> 2295760 loops in 2 seconds
> 2295494 loops in 2 seconds
> 2296282 loops in 2 seconds
> 2282229 loops in 2 seconds
> 2301842 loops in 2 seconds
>
> - With the second patch that does the lazy preloading
>
> [torvalds@i5 ~]$ uname -r
> 3.3.0-rc4-00002-g022899d937f9
> [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
> 2466973 loops in 2 seconds
> 2456168 loops in 2 seconds
> 2449863 loops in 2 seconds
> 2461588 loops in 2 seconds
> 2478256 loops in 2 seconds
> 2476844 loops in 2 seconds
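The "about 10%" figure can be checked against the six runs per kernel quoted above. This small C helper averages each set and reports the relative gain. The array and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define RUNS 6

/* Loop counts per 2-second run, copied from the results quoted above. */
static const long runs_plain_rc4[RUNS] = {2216090, 2216922, 2217148, 2232191, 2186203, 2231614};
static const long runs_patch1[RUNS]    = {2306667, 2295760, 2295494, 2296282, 2282229, 2301842};
static const long runs_patch2[RUNS]    = {2466973, 2456168, 2449863, 2461588, 2478256, 2476844};

/* Arithmetic mean of n samples. */
static double mean_loops(const long *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)v[i];
    return sum / (double)n;
}

/* Relative gain of `after` over `before`, in percent. */
static double pct_gain(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

By this arithmetic, the first patch is worth about 3.6% over plain 3.3-rc4. The lazy-restore patch adds about 7.3% on top of it, and the two together give about 11.2%.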
Does "2476844 loops in 2 seconds" imply 2476844 context switches in 2
sec? With Anton's context_switch [1] benchmark, we don't even hit 100K
context switches per sec.
Do you have this test program anywhere?
Mikey
1. http://ozlabs.org/~anton/junkcode/context_switch.c
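Linus's program itself is not included in this message. The following is a minimal sketch of the test as described: two processes pinned to one CPU, both calling `sched_yield()`, with only one of them touching floating-point state between yields, and loops counted over a fixed interval. It is not taken from the original. It assumes Linux with glibc (`sched_setaffinity()`, `sched_getcpu()`) and assumes the FPU-using process is the one that counts. The `run_yield_test()` name and the `YIELD_TEST_MAIN` build macro are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Pin to the current CPU, fork a child that only yields (no FP work), then
 * count how many "touch FP state, yield" loops the parent completes in
 * `seconds`. Returns the loop count, or -1 on error.
 */
static long run_yield_test(double seconds)
{
    assert(seconds > 0.0);

    int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;

    pid_t child = fork();          /* the child inherits the affinity mask */
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            sched_yield();         /* this process does no FP work at all */
    }

    volatile double x = 1.0;       /* volatile keeps the FP work from being optimized out */
    long loops = 0;
    double end = now_seconds() + seconds;
    while (now_seconds() < end) {
        x = x * 0.5 + 1.0;         /* dirty the FP/SSE register state */
        sched_yield();
        loops++;
    }

    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return loops;
}

#ifdef YIELD_TEST_MAIN
int main(void)
{
    long loops = run_yield_test(2.0);
    if (loops < 0) {
        perror("run_yield_test");
        return 1;
    }
    printf("%ld loops in 2 seconds\n", loops);
    return 0;
}
#endif
```

Built with `-DYIELD_TEST_MAIN`, it prints one line in the same format as the results quoted above. If the scheduler strictly alternates between the two pinned processes, each counted loop corresponds to two context switches (to the yielding child and back). That is an assumption worth checking when comparing against a context-switch benchmark.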
> so these things do make some difference. But it is also interesting to see
> from profiles just how expensive setting CR0.TS is (the write to CR0 is
> very expensive indeed), so even when you avoid the FP state restore
> lazily, just setting TS in between task switches is still a big cost of
> FPU save/restore.
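The lazy-restore idea can be modeled in a few lines: skip the restore when the incoming task's FPU state is known to still be in this CPU's registers. This sketch assumes that is tracked with a per-CPU "owner" pointer plus a per-task "last CPU" field. The names `lazy_task`, `fpu_owner`, `NR_LAZY_CPUS` and `lazy_switch_in()` are hypothetical. The sketch also deliberately ignores the CR0.TS handling discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_LAZY_CPUS 4   /* hypothetical CPU count for the model */

struct lazy_task {
    int last_cpu;   /* CPU that last loaded this task's FPU state, -1 if none */
    int restores;   /* number of (expensive) restores performed */
};

/* Task whose FPU state each CPU's registers currently hold, NULL if none. */
static struct lazy_task *fpu_owner[NR_LAZY_CPUS];

/*
 * Called when an FPU-using task is switched in on `cpu`. Tasks that never
 * touch the FPU don't call this, so they leave fpu_owner[cpu] unchanged.
 */
static void lazy_switch_in(struct lazy_task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_LAZY_CPUS);

    /* Both checks are needed: the owner check catches another task having
     * loaded its state here; the last_cpu check catches this task having
     * run (and changed its state) on another CPU in the meantime. */
    bool still_loaded = fpu_owner[cpu] == t && t->last_cpu == cpu;
    if (!still_loaded) {
        t->restores++;          /* stands in for the real state restore */
        fpu_owner[cpu] = t;
        t->last_cpu = cpu;
    }
}
```

In Linus's test only one task uses the FPU. After the first restore, the owner therefore never changes, and every later switch-in skips the restore. A second FPU user on the same CPU, or a migration to another CPU and back, forces a reload.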
>
>
> Linus Torvalds (2):
> i387: use 'restore_fpu_checking()' directly in task switching code
> i387: support lazy restore of FPU state
>
> arch/x86/include/asm/i387.h | 48 +++++++++++++++++++++++++++----------
> arch/x86/include/asm/processor.h | 3 +-
> arch/x86/kernel/cpu/common.c | 2 +
> arch/x86/kernel/process_32.c | 2 +-
> arch/x86/kernel/process_64.c | 2 +-
> arch/x86/kernel/traps.c | 40 ++++++-------------------------
> 6 files changed, 49 insertions(+), 48 deletions(-)
>
> Comments? I feel confident enough about these that I think they might even
> work in 3.3, especially the first one. But I want people to look at
> them.
>
> Linus
>
> --
> 1.7.9.188.g12766.dirty
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>