Date:	Wed, 28 Oct 2009 14:09:26 +0800
From:	"Ma, Ling" <ling.ma@...el.com>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: FW: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S
 by avoid memory miss predication.

Hi Ingo,
Are there any other test cases we need to run, or do you have any comments?

Best Regards
Ma Ling

________________________________________
From: Ma, Ling 
Sent: 26 October 2009 16:26
To: 'mingo@...e.hu'
Cc: 'hpa@...or.com'; 'tglx@...utronix.de'; 'linux-kernel@...r.kernel.org'
Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by avoid memory miss predication.


We generated a new report for another case, with the source (rsi) at offset 0x45010 and the destination (rdi) at offset 0x34020,
using 'perf stat --repeat 10 ./static_rsi_45010_rdi_34020_old/new'.
 
The test program I wrote:
 for (i = 64; i < 4096 * 4; i++)
      do_memcpy(src, dst, i);
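The mail shows only the inner loop. Below is a minimal, self-contained user-space sketch of such a sweep, written under stated assumptions: do_memcpy is treated as a hypothetical wrapper around the copy routine under test (libc memcpy stands in for it here, not the patched memcpy_64.S), and src and dst are assumed to sit at offsets 0x45010 and 0x34020 inside one page-aligned buffer. SRC_OFFSET, DST_OFFSET, BUF_SIZE and run_copy_sweep are illustrative names, not taken from the original test program.

```c
#define _POSIX_C_SOURCE 200112L /* for posix_memalign */
#include <stdlib.h>
#include <string.h>

/* Assumed layout: src and dst are fixed offsets into one buffer. */
#define SRC_OFFSET 0x45010UL
#define DST_OFFSET 0x34020UL
#define MIN_LEN    64
#define MAX_LEN    (4096 * 4)
/* Large enough that the src region never runs past the buffer end;
 * the dst region ends well below SRC_OFFSET, so they never overlap. */
#define BUF_SIZE   (SRC_OFFSET + MAX_LEN)

/* Hypothetical wrapper with the argument order used in the mail
 * (src first, then dst). Plain memcpy stands in for the routine under
 * test; noinline (a GCC/Clang extension) keeps each call visible. */
static __attribute__((noinline)) void do_memcpy(void *src, void *dst,
                                                size_t len)
{
    memcpy(dst, src, len);
}

/* Runs the sweep from the mail once. Returns the number of copies
 * performed, or 0 if the buffer could not be allocated. */
static size_t run_copy_sweep(void)
{
    unsigned char *buf = NULL;
    size_t calls = 0;

    /* Page-aligned, so the low 12 bits of src/dst equal the offsets. */
    if (posix_memalign((void **)&buf, 4096, BUF_SIZE) != 0)
        return 0;
    memset(buf, 0xA5, BUF_SIZE);

    unsigned char *src = buf + SRC_OFFSET;
    unsigned char *dst = buf + DST_OFFSET;

    /* Same bounds as the mail: lengths 64 .. 4096*4 - 1 bytes. */
    for (size_t i = MIN_LEN; i < MAX_LEN; i++) {
        do_memcpy(src, dst, i);
        calls++;
    }

    free(buf);
    return calls;
}
```

Calling run_copy_sweep() from a main() and running the binary under perf stat --repeat 10 follows the same measurement pattern as the report above, but timings for libc memcpy say nothing about the kernel routine itself.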
 
 
Before the patch:
Performance counter stats for './static_rsi_45010_rdi_34020_old' (10 runs):          
                                                                                     
  54014.766012  task-clock-msecs         #      0.999 CPUs    ( +-   0.016% )        
             80  context-switches         #      0.000 M/sec   ( +-   7.894% )        
              0  CPU-migrations           #      0.000 M/sec   ( +-  66.667% )        
          4429  page-faults              #      0.000 M/sec   ( +-   0.002% )        
 136855571663  cycles                   #   2533.670 M/sec   ( +-   0.016% )        
  44524796868  instructions             #      0.325 IPC     ( +-   0.008% )        
        771000  cache-references         #      0.014 M/sec   ( +-  10.397% )        
        541785  cache-misses             #      0.010 M/sec   ( +-   4.203% )        
                                                                                     
  54.062799203  seconds time elapsed   ( +-   0.021% )                               
                                                                                     
After the patch:
Performance counter stats for './static_rsi_45010_rdi_34020_new' (10 runs):          
                                                                                     
   7570.357661  task-clock-msecs         #      0.999 CPUs    ( +-   0.350% )        
            13  context-switches         #      0.000 M/sec   ( +-   9.320% )        
             0  CPU-migrations           #      0.000 M/sec   ( +-     nan% )        
         4429  page-faults              #      0.001 M/sec   ( +-   0.004% )        
 19180782064  cycles                   #   2533.669 M/sec   ( +-   0.349% )        
 44462001104  instructions             #      2.318 IPC     ( +-   0.001% )        
       383673  cache-references         #      0.051 M/sec   ( +-   4.112% )        
       317436  cache-misses             #      0.042 M/sec   ( +-   1.607% )        
                                                                                     
   7.581541785  seconds time elapsed   ( +-   0.343% )      
                          
With the patch, elapsed time drops from 54.06 s to 7.58 s, a 7.13x improvement (54.062799203 / 7.581541785 = 7.13).
If you need any other test reports, please let me know.
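The 7.13x figure is simply the ratio of the two elapsed times. The counters also suggest where it comes from: the instruction counts are nearly identical (44524796868 vs. 44462001104), so the gain shows up as IPC rising from 0.325 to 2.318. The short C sketch below recomputes these ratios from the quoted perf stat numbers; speedup, ipc and print_summary are illustrative helper names, not part of perf.

```c
#include <stdio.h>

/* Counter values copied from the two perf stat reports above. */
static const double old_seconds      = 54.062799203;
static const double new_seconds      = 7.581541785;
static const double old_cycles       = 136855571663.0;
static const double new_cycles       = 19180782064.0;
static const double old_instructions = 44524796868.0;
static const double new_instructions = 44462001104.0;

/* Ratio of elapsed times: how many times faster the new run is. */
static double speedup(double before, double after)
{
    return before / after;
}

/* Instructions per cycle, as shown in perf's "IPC" column. */
static double ipc(double instructions, double cycles)
{
    return instructions / cycles;
}

static void print_summary(void)
{
    printf("speedup (elapsed): %.2fx\n", speedup(old_seconds, new_seconds));
    printf("IPC before: %.3f, after: %.3f\n",
           ipc(old_instructions, old_cycles),
           ipc(new_instructions, new_cycles));
    /* Close to 1.0: both binaries execute almost the same instructions. */
    printf("instruction count ratio: %.4f\n",
           old_instructions / new_instructions);
}
```

Calling print_summary() prints a 7.13x speedup and IPC values of 0.325 and 2.318, matching the perf output.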
 
Thanks
Ma Ling

