linux-kernel - RE: [PATCH RFC] [x86] Optimize copy-page by reducing impact from HW prefetch

Message-ID: <C10D3FB0CD45994C8A51FEC1227CE22F27124E9914@shsmsx502.ccr.corp.intel.com>
Date:	Fri, 1 Jul 2011 16:10:37 +0800
From:	"Ma, Ling" <ling.ma@...el.com>
To:	"Ma, Ling" <ling.ma@...el.com>, Ingo Molnar <mingo@...e.hu>,
	Andi Kleen <andi@...stfloor.org>
CC:	"hpa@...or.com" <hpa@...or.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH RFC] [x86] Optimize copy-page by reducing impact from HW
 prefetch

I forgot to append the experiment data:

1. We copy 4096 bytes 32 times on SNB (Sandy Bridge) and take the minimum execution time.
Hot-cache case:
  copy_page      copy_page_c
  482 cycles     350 cycles
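
A minimal user-space sketch of the hot-cache measurement: run a copy routine 32 times on the same 4096-byte buffers and keep the fastest run. It is an illustration, not the harness used for the numbers above. It assumes a POSIX system and times with `clock_gettime(CLOCK_MONOTONIC)` in nanoseconds instead of counting cycles. `copy_fn`, `memcpy_page` and `min_copy_ns` are hypothetical names, with `memcpy_page` standing in for the kernel's copy_page/copy_page_c.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096
#define RUNS       32

/* Signature of the routine under test (hypothetical stand-in for
 * copy_page / copy_page_c, which live in the kernel). */
typedef void (*copy_fn)(void *dst, const void *src);

/* Placeholder routine: a plain libc copy of one page. */
static void memcpy_page(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_BYTES);
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run fn RUNS times on the same (hot) buffers and return the fastest
 * run; taking the minimum filters out interrupts and other noise. */
static uint64_t min_copy_ns(copy_fn fn, void *dst, const void *src)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < RUNS; i++) {
        uint64_t t0 = now_ns();
        fn(dst, src);
        uint64_t t1 = now_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Call it as `min_copy_ns(memcpy_page, dst, src)` with two page-sized buffers. The first iteration also warms the caches, so the minimum reflects the hot-cache case.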

2. The same routine as in the hot-cache case, but before each run we copy 512 KB of other data to push the original data out of L1 and L2.
Cold-cache case:
  copy_page (with prefetch)   copy_page (without prefetch)   copy_page_c
  853-873 cycles              1037-1051 cycles               959-976 cycles
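
The cold-cache variant differs only in copying 512 KB of unrelated data before each timed run. Again this is a hedged sketch under the same assumptions (POSIX timer, nanoseconds, hypothetical names `memcpy_page`, `evict_caches` and `min_cold_copy_ns`). It also assumes L1 plus L2 are smaller than 512 KB, as on SNB, and it makes no attempt to evict the last-level cache.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES  4096
#define EVICT_BYTES (512 * 1024)  /* the "512k" from the experiment */
#define RUNS        32

typedef void (*copy_fn)(void *dst, const void *src);

/* Placeholder for the routine under test. */
static void memcpy_page(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_BYTES);
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Stream EVICT_BYTES of unrelated data through the cache hierarchy so the
 * page under test is pushed out of L1 and L2 (it may stay in L3). */
static void evict_caches(unsigned char *scratch_dst,
                         const unsigned char *scratch_src)
{
    memcpy(scratch_dst, scratch_src, EVICT_BYTES);
    /* Compiler barrier: stop the otherwise-unused copy being optimized out. */
    __asm__ __volatile__("" : : "r"(scratch_dst) : "memory");
}

/* Like the hot-cache harness, but evicts before every timed run.
 * Returns 0 and stores the fastest run in *best_ns, or -1 on OOM. */
static int min_cold_copy_ns(copy_fn fn, void *dst, const void *src,
                            uint64_t *best_ns)
{
    unsigned char *a = malloc(EVICT_BYTES);
    unsigned char *b = malloc(EVICT_BYTES);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    memset(a, 0x5a, EVICT_BYTES);

    uint64_t best = UINT64_MAX;
    for (int i = 0; i < RUNS; i++) {
        evict_caches(b, a);
        uint64_t t0 = now_ns();
        fn(dst, src);
        uint64_t t1 = now_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    free(a);
    free(b);
    *best_ns = best;
    return 0;
}
```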

Thanks
Ling

> -----Original Message-----
> From: Ma, Ling
> Sent: Tuesday, June 28, 2011 11:24 PM
> To: 'Ingo Molnar'; Andi Kleen
> Cc: hpa@...or.com; tglx@...utronix.de; linux-kernel@...r.kernel.org
> Subject: RE: [PATCH RFC] [x86] Optimize copy-page by reducing impact
> from HW prefetch
> 
> Hi Ingo
> 
> > Ling, mind double checking which one is the faster/better one on SNB,
> > in cold-cache and hot-cache situations, copy_page or copy_page_c?
> copy_page_c.
> On hot cache, copy_page_c on SNB combines data into 128-bit writes
> (the processor is limited to 128 bits/cycle for writes) after its
> startup latency, so it is faster than copy_page, which provides
> 64 bits/cycle for writes.
> 
> On cold cache, copy_page_c doesn't use prefetch, whereas copy_page
> uses prefetch according to the copy size, so copy_page is better.
> 
> Thanks
> Ling
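
The two code shapes contrasted in the quoted reply can be sketched in user-space C as below. This is an assumption-laden illustration, not the kernel's assembly. `copy_page_words_prefetch` (hypothetical name) mimics a copy_page-style loop of 64-bit loads and stores with a software prefetch; the 256-byte `PREFETCH_AHEAD` distance is a hypothetical choice. `copy_page_string` (also hypothetical) issues `rep movsq`, which as far as we know is what copy_page_c used in kernels of that era, and falls back to `memcpy` off x86-64. Both expect 8-byte-aligned page buffers and assume GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES     4096
#define PREFETCH_AHEAD 256  /* hypothetical prefetch distance in bytes */

/* copy_page-like shape: 64-bit loads/stores, one 64-byte cache line per
 * iteration, with a software prefetch a fixed distance ahead of the
 * source. Each individual store writes at most 8 bytes. */
static void copy_page_words_prefetch(void *dst, const void *src)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    const size_t words = PAGE_BYTES / sizeof(uint64_t);

    for (size_t i = 0; i < words; i += 8) {
        size_t off = i * sizeof(uint64_t);
        /* Only prefetch addresses that lie inside the source page. */
        if (off + PREFETCH_AHEAD < PAGE_BYTES)
            __builtin_prefetch((const char *)src + off + PREFETCH_AHEAD, 0, 3);
        for (size_t k = 0; k < 8; k++)
            d[i + k] = s[i + k];
    }
}

/* copy_page_c-like shape: a single string instruction copies the page;
 * how wide the internal stores are is left to the CPU. */
static void copy_page_string(void *dst, const void *src)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    size_t count = PAGE_BYTES / 8;
    __asm__ __volatile__("rep movsq"
                         : "+D"(dst), "+S"(src), "+c"(count)
                         :
                         : "memory");
#else
    memcpy(dst, src, PAGE_BYTES);
#endif
}
```

Whether the CPU merges `rep movsq` into wider internal stores is a microarchitectural detail this C code cannot show; it only makes the difference in instruction shape concrete. An optimizing compiler may also vectorize the word loop, so it is not a faithful stand-in for the hand-written assembly when benchmarking.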
