Message-ID: <20110623070448.GA25707@elte.hu>
Date:	Thu, 23 Jun 2011 09:04:48 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	ling.ma@...el.com, hpa@...or.com, tglx@...utronix.de,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] [x86] Optimize copy-page by reducing impact from HW
 prefetch


* Andi Kleen <andi@...stfloor.org> wrote:

> ling.ma@...el.com writes:
>
> > impact(DCU prefetcher), and simplify original code. The 
> > performance is improved about 15% on core2, 36% on snb 
> > respectively. (We use our micro-benchmark, and will do further 
> > test according to your requirement)
> 
> This doesn't make a lot of sense because neither Core-2 nor SNB uses 
> the code path you patched. They both use the rep ; movs path.

Ling, mind double checking which one is the faster/better one on SNB, 
in cold-cache and hot-cache situations, copy_page or copy_page_c?

Also, while looking at this file please fix the countless pieces of 
style excrements it has before modifying it:

 - non-Linux comment style (and two comments where a single 
   comment block would do):

  /* Don't use streaming store because it's better when the target
     ends up in cache. */
            
  /* Could vary the prefetch distance based on SMP/UP */

 - (there are other non-standard comment blocks in this file as well)

 - The copy_page/copy_page_c naming is needlessly obfuscated; it 
   should be copy_page, copy_page_norep or so - the _c postfix has no
   obvious meaning.

 - all #includes should be at the top

 - please standardize it on the 'instrn %x, %y' pattern that we 
   generally use in arch/x86/, not the 'instrn %x,%y' pattern.

and do this cleanup patch first and the speedup on top of it, and 
keep the two in two separate patches so that the modification to the 
assembly code can be reviewed more easily.
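
As a rough illustration of the comment and operand conventions asked 
for above, something like the sketch below. The comment text is the 
two quoted comments merged into one block; the two instructions are 
made up purely to show the spacing and are not taken from the actual 
copy loop:

```asm
/*
 * Don't use streaming store because it's better when the target
 * ends up in cache.
 *
 * Could vary the prefetch distance based on SMP/UP.
 */

	/* Hypothetical instructions, for illustration only: */
	movq	(%rsi), %rax		/* 'instrn %x, %y': space after the comma */
	movq	%rax, (%rdi)
```

The multi-line form with a leading ' * ' on each line is the usual 
kernel comment style, and it keeps the two related remarks in one 
place.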

Thanks,

	Ingo