Message-ID: <3E5A0FA7E9CA944F9D5414FEC6C712200771A972@ORSMSX105.amr.corp.intel.com>
Date: Fri, 25 May 2012 02:47:22 +0000
From: "Yu, Fenghua" <fenghua.yu@...el.com>
To: David Miller <davem@...emloft.net>
CC: "mingo@...e.hu" <mingo@...e.hu>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"hpa@...or.com" <hpa@...or.com>,
"andi@...stfloor.org" <andi@...stfloor.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH] x86/copy_user_generic: Optimize copy_user_generic with
CPU erms feature
> From: David Miller [mailto:davem@...emloft.net]
> Sent: Thursday, May 24, 2012 6:50 PM
> From: "Fenghua Yu" <fenghua.yu@...el.com>
> Date: Thu, 24 May 2012 18:19:45 -0700
>
> > According to Intel 64 and IA-32 SDM and Optimization Reference
> > Manual, beginning with Ivy Bridge, REP string operation using MOVSB
> > and STOSB can provide both flexible and high-performance REP string
> > operations in cases like memory copy. Enhancement availability is
> > indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/STOSB).
>
> How does the cpu do overlap detection?
>
> If the cpu does overlap detection on sub-pagesize bits, performance
> will unnecessarily suffer under such circumstances.
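For reference, the feature bit named in the quoted patch description, CPUID.7.0.EBX[9], can be read from user space roughly as follows. This is only a sketch and not the kernel's own detection code: it assumes a GCC or Clang toolchain that ships <cpuid.h>, and the names ebx_has_erms() and cpu_has_erms() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper header, x86 only */
#endif

/* ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9. */
#define ERMS_EBX_BIT 9u

/* Hypothetical helper: test the ERMS bit in an EBX value from CPUID.(EAX=7,ECX=0). */
int ebx_has_erms(uint32_t ebx)
{
	return (ebx >> ERMS_EBX_BIT) & 1u;
}

/* Hypothetical helper: 1 if the running CPU reports ERMS, 0 otherwise
 * (also 0 on non-x86 builds, where CPUID does not exist). */
int cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	/* __get_cpuid_count() returns 0 when leaf 7 is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return ebx_has_erms(ebx);
#else
	return 0;
#endif
}
```

A quick sanity check of the bit test is assert(ebx_has_erms(1u << 9)); the result of cpu_has_erms() naturally depends on the machine it runs on.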
Are you talking about memory overlap between source and destination? There is no overlap between these two areas in the copy_user case, because one area is in user space and the other is in kernel space.
In the overlapping case, it is software that detects the overlap and selects a backward copy. I don't see any performance degradation for backward rep movsb in my measurements.
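To illustrate what "software detects the overlap" means, here is a minimal sketch in plain C. It is not the kernel's memmove and not part of the patch; copy_overlap_safe() is a made-up name, and the byte loops only stand in for a forward (DF=0) or backward (DF=1) rep movsb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte-wise copy that tolerates overlapping buffers. */
void *copy_overlap_safe(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/*
	 * A forward copy clobbers unread source bytes only when dst lies
	 * inside [src, src + n). The unsigned subtraction wraps to a large
	 * value when dst is below src, so a single compare covers both
	 * safe cases.
	 */
	if ((uintptr_t)d - (uintptr_t)s >= n) {
		/* No harmful overlap: copy forward. */
		for (size_t i = 0; i < n; i++)
			d[i] = s[i];
	} else {
		/* dst overlaps the tail of src: copy backward. */
		for (size_t i = n; i > 0; i--)
			d[i - 1] = s[i - 1];
	}
	return dst;
}
```

For example, with char buf[] = "abcdefgh", calling copy_overlap_safe(buf + 2, buf, 4) takes the backward path and leaves "ababcdgh" in buf.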
Thanks.
-Fenghua