Message-ID: <f69849430704301213l5283b949y53c48c36087969b5@mail.gmail.com>
Date: Mon, 30 Apr 2007 12:13:07 -0700
From: "kernel coder" <lhrkernelcoder@...il.com>
To: linux-kernel@...r.kernel.org
Subject: AMD dual core Opteron optimization
hi,
I'm trying to write some optimized code for the AMD dual-core Opteron
processor, but I'm getting nowhere. I've installed Fedora 5 with a
2.6-series Linux kernel and a 4-series GCC.
The following few lines of code are consuming close to 100 cycles. Yes,
this is not the forum for such questions, but I think the people on the
linux-kernel and GCC lists are the best ones to answer them. I'm really
getting frustrated and feel helpless, which is why I've posted the
question here.
/*******************************************/
/* these variables will be used for the RDTSC instruction */
uint64_t before, overhead, clocks;
/* the ReadTSC function is given below */
before = ReadTSC();
before = ReadTSC();
before = ReadTSC();
overhead = ReadTSC() - before;
printf(" ReadTSC overhead is %lu ", overhead);
unsigned int test;
unsigned long buffer[128];
buffer[12] = 8;
buffer[13] = 0;
buffer[23] = 6;
/* starting cycles */
before = ReadTSC();
/************************** Start of targeted code ******************************/
test = buffer[12] | buffer[13] | buffer[23];
switch (test)
{
case 12:
    asm("jmp proc_1");
case 13:
    asm("jmp proc_2");
case 14:
    asm("jmp proc_3");
case 15:
    asm("jmp proc_4");
default:
    asm("jmp proc_5");
}
asm("proc_5:");
/************************** End of targeted code ******************************/
/* current cycles */
clocks = ReadTSC();
clocks = clocks - before;
printf("\n cycles consumed %lu \n", clocks - overhead);
/**********************************/
The overhead generally varies from 360 to 395 cycles, and sometimes it
drops to around 270 cycles.
The cycles consumed by the targeted code vary from 20 to 100.
Theoretically, I think it should take fewer than 20 cycles, so why so
many? The measurement fluctuates between 20 and 100 cycles and
sometimes even crosses 100.
Sometimes the cycles consumed by the targeted code come out far lower
than the RDTSC instruction overhead itself.
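For reference, here is a minimal sketch (not part of my test program) of
repeating the measurement many times and keeping the smallest delta,
which should filter out interrupts and other noise; it assumes the same
ReadTSC() helper given below, and the loop count of 1000 is arbitrary.
/*****************************************/
uint64_t best = (uint64_t)-1;    /* smallest delta seen so far */
uint64_t t0, delta;
int i;
for (i = 0; i < 1000; i++) {
    t0 = ReadTSC();
    /* targeted code goes here */
    delta = ReadTSC() - t0;
    if (delta < best)
        best = delta;
}
printf("minimum cycles (including RDTSC overhead): %lu\n",
       (unsigned long)best);
/*****************************************/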
Is there a better way to write the above code? I even used a prefetch
instruction before the targeted code to make sure the buffer is in the
L1 cache, but with no success.
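For comparison, here is a sketch of dispatching through a table of
function pointers instead of the switch plus asm jmp, so the compiler
keeps full control of the flow; it assumes proc_1 through proc_5 are
ordinary C functions, which is not how my code above is written.
/*****************************************/
typedef void (*proc_fn)(void);

void proc_1(void); void proc_2(void); void proc_3(void);
void proc_4(void); void proc_5(void);

static proc_fn dispatch[4] = { proc_1, proc_2, proc_3, proc_4 };

/* ... */
test = buffer[12] | buffer[13] | buffer[23];
if (test - 12u < 4u)
    dispatch[test - 12]();   /* cases 12, 13, 14, 15 */
else
    proc_5();                /* default case */
/*****************************************/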
The code for ReadTSC() follows. Please also tell me whether this is a
correct way to measure cycles.
/*****************************************/
typedef long long __int64;

__int64 ReadTSC() {
    int res[2];                      // store 64 bit result here
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
    // Inline assembly in AT&T syntax
#if defined (_LP64)                  // 64 bit mode
    __asm__ __volatile__ (           // serialize (save rbx)
        "xorl %%eax,%%eax \n push %%rbx \n cpuid \n"
        ::: "%rax", "%rcx", "%rdx");
    __asm__ __volatile__ (           // read TSC, store edx:eax in res
        "rdtsc\n"
        : "=a" (res[0]), "=d" (res[1]) );
    __asm__ __volatile__ (           // serialize again
        "xorl %%eax,%%eax \n cpuid \n pop %%rbx \n"
        ::: "%rax", "%rcx", "%rdx");
#else                                // 32 bit mode
    __asm__ __volatile__ (           // serialize (save ebx)
        "xorl %%eax,%%eax \n pushl %%ebx \n cpuid \n"
        ::: "%eax", "%ecx", "%edx");
    __asm__ __volatile__ (           // read TSC, store edx:eax in res
        "rdtsc\n"
        : "=a" (res[0]), "=d" (res[1]) );
    __asm__ __volatile__ (           // serialize again
        "xorl %%eax,%%eax \n cpuid \n popl %%ebx \n"
        ::: "%eax", "%ecx", "%edx");
#endif
#else
    // Inline assembly in MASM syntax
    __asm {
        xor eax, eax
        cpuid                        // serialize
        rdtsc                        // read TSC
        mov dword ptr res, eax       // store low dword in res[0]
        mov dword ptr res+4, edx     // store high dword in res[1]
        xor eax, eax
        cpuid                        // serialize again
    };
#endif  // __GNUC__
    return *(__int64*)res;           // return result
}
/*****************************************************************/
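For comparison, here is a sketch of a more self-contained 64-bit variant
that keeps the cpuid serialization and the rdtsc in a single asm
statement and lets GCC save rbx through the clobber list, instead of
pushing and popping it across separate statements; this is only a
sketch, not the code my numbers above were measured with.
/*****************************************/
#include <stdint.h>

/* 64-bit (x86-64) only */
static inline uint64_t read_tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ (
        "xorl %%eax, %%eax \n\t"
        "cpuid             \n\t"   /* serialize */
        "rdtsc"                    /* TSC into edx:eax */
        : "=a" (lo), "=d" (hi)
        :
        : "%rbx", "%rcx");
    return ((uint64_t)hi << 32) | lo;
}
/*****************************************/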
thanks,
shahzad