Date:	Sat, 23 Aug 2008 22:25:31 -0600 (MDT)
From:	jmerkey@...fmountaingroup.com
To:	"Linus Torvalds" <torvalds@...ux-foundation.org>
Cc:	"Jeremy Fitzhardinge" <jeremy@...p.org>,
	"Nick Piggin" <nickpiggin@...oo.com.au>,
	jmerkey@...fmountaingroup.com,
	"Stefan Richter" <stefanr@...6.in-berlin.de>,
	paulmck@...ux.vnet.ibm.com,
	"Peter Zijlstra" <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, "David Howells" <dhowells@...hat.com>
Subject: Re: [ANNOUNCE] mdb: Merkey's Linux Kernel Debugger  2.6.27-rc4 
     released


Results from Analysis of GCC volatile/memory barriers

Using volatile produces the intended results in files that have shared
data elements.  In some cases, however, global data that is not referenced
outside the file and has not also been declared volatile is treated as
static and optimized into local variables.

If volatile is avoided entirely, the compiler appears to make correct
assumptions about whether a reference is in fact a global memory reference.
My conclusion is that gcc's code generation appears correct and in fact
does a better job than Microsoft's implementation of shared data
management on SMP systems.

If you use volatile and turn on optimization at the same time, the compiler
will take you at your word and potentially optimize global references into
local variables.  This is in fact better than MS cl, which will ALWAYS
optimize global data into locals within a single file when volatile is not
used on the SMP data.

If you choose to use volatile, then you had better use it on every variable
you need shared between processors.  Otherwise just leave it out entirely,
and gcc does appear to figure out the code references properly (though some
of them are quite odd).

While this may be counter-intuitive, it makes sense.  When you use
volatile, you are telling the compiler that anything not declared volatile
within a given file is fair game for local optimization if you turn on
optimization at the same time.
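
One way to avoid depending on which declarations carry volatile is to
leave the arrays plain and force the reload only at the access that needs
it.  The following is a minimal userspace sketch of that idea, not the mdb
code.  The helper names load_state(), store_state() and wait_for_release()
are hypothetical, the loop is bounded so it can be run on one thread, and
the numeric state values are assumptions read off the disassembly later in
this message (2 stored for suspend, 3 and 7 compared for resume and
switch).  Note that a volatile access only constrains the compiler; it
does not by itself give ordering or atomicity between processors.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCESSORS    8   /* assumed: loop bound seen in the listings */
#define PROCESSOR_SUSPEND 2   /* assumed: values read off the disassembly */
#define PROCESSOR_RESUME  3
#define PROCESSOR_SWITCH  7

/* Plain (non-volatile) global, as in the "no volatile" variant. */
static unsigned long ProcessorState[MAX_PROCESSORS];

/* Hypothetical helper: force a real load from memory at this access only. */
static inline unsigned long load_state(size_t cpu)
{
    assert(cpu < MAX_PROCESSORS);
    return *(const volatile unsigned long *)&ProcessorState[cpu];
}

/* Hypothetical helper: force a real store to memory at this access only. */
static inline void store_state(size_t cpu, unsigned long value)
{
    assert(cpu < MAX_PROCESSORS);
    *(volatile unsigned long *)&ProcessorState[cpu] = value;
}

/*
 * Bounded version of the suspend loop.  Returns the state that ended the
 * wait, or 0 if max_spins polls passed without a resume/switch request.
 */
static unsigned long wait_for_release(size_t cpu, unsigned long max_spins)
{
    while (max_spins--) {
        unsigned long s = load_state(cpu);   /* reloaded on every pass */

        if (s == PROCESSOR_RESUME || s == PROCESSOR_SWITCH)
            return s;
    }
    return 0;
}
```

Because the volatile qualifier is applied at the point of use, the result
does not change if some other variable in the same file is or is not
declared volatile.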


Analysis

Code generation for two atomic_t variables, one an array element and the
other standalone.  Macros in the kernel includes and their interactions
with the compiler may be the basis of some of these cases.

    atomic_inc(&debuggerActive);
    atomic_inc(&debuggerProcessors[processor]);

    55a0:	f0 ff 05 00 00 00 00 	lock incl 0x0
    55a7:	8d 2c 8d 00 00 00 00 	lea    0x0(,%ecx,4),%ebp
    55ae:	8d 85 00 00 00 00    	lea    0x0(%ebp),%eax
    55b4:	89 04 24             	mov    %eax,(%esp)
    55b7:	f0 ff 85 00 00 00 00 	lock incl 0x0(%ebp)

Although the emitted assembly is essentially correct, it's odd: two
identical data types, one emitted as a global fixup and the other as
a relative fixup indirected from the stack frame.  This works since
the fixup (the substitute for the 0x0) input by the loader is a negative
offset relative to the entire 32-bit address space, i.e. lock incl
[ebp-f800XXXX], but it's still an odd way to treat an atomic variable.
I would have expected an absolute address fixup record here, rather than
something treated as data referenced from the stack frame.
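
One possible reading of the listing, offered as an assumption rather than
a verified explanation: the lea at 55a7 loads %ecx*4 into %ebp, so %ebp may
be holding the scaled array index rather than a frame pointer, and the 0x0
displacement in "lock incl 0x0(%ebp)" would then be the fixup for the
array base.  The sketch below shows the same address arithmetic in C.  It
uses C11 atomics as a stand-in for the kernel's atomic_t, and
enter_debugger(), element_offset() and address_matches() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCESSORS 8   /* assumed: loop bound seen in the listings */

static atomic_int debuggerActive;                      /* standalone counter */
static atomic_int debuggerProcessors[MAX_PROCESSORS];  /* per-CPU counters */

/* Mirrors the two atomic_inc() calls, using C11 atomics, not atomic_t. */
static void enter_debugger(size_t processor)
{
    assert(processor < MAX_PROCESSORS);
    atomic_fetch_add(&debuggerActive, 1);                 /* fixed address */
    atomic_fetch_add(&debuggerProcessors[processor], 1);  /* base + index */
}

/* Byte offset of one element from the start of the array. */
static size_t element_offset(size_t processor)
{
    return processor * sizeof debuggerProcessors[0];
}

/* Nonzero if the element address equals array base plus scaled index. */
static int address_matches(size_t processor)
{
    uintptr_t base = (uintptr_t)debuggerProcessors;
    uintptr_t elem = (uintptr_t)&debuggerProcessors[processor];

    return elem == base + element_offset(processor);
}
```

Under that reading, the standalone counter needs only the symbol address,
while the array element needs symbol address plus a run-time offset, which
would account for the two different addressing forms.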

Code section with mixed volatile declarations

volatile unsigned long ProcessorHold[MAX_PROCESSORS];
unsigned long ProcessorState[MAX_PROCESSORS];

         case 2: /* nmi */
            if (ProcessorHold[processor])  /* hold processor */
            {
               ProcessorHold[processor] = 0;
	       ProcessorState[processor] = PROCESSOR_SUSPEND;

               /* processor suspend loop */
               atomic_inc(&nmiProcessors[processor]);
	       while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
	              (ProcessorState[processor] != PROCESSOR_SWITCH))
               {
	          if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
	              (ProcessorState[processor] == PROCESSOR_SWITCH))
                     break;

                  touch_nmi_watchdog();
                  cpu_relax();
               }
               atomic_dec(&nmiProcessors[processor]);

Code output from section with mixed volatile declarations

    56ec:	83 3c b5 00 00 00 00 	cmpl   $0x0,0x0(,%esi,4)
    56f3:	00
    56f4:	74 1b                	je     5711 <debugger_entry+0x17e>
    56f6:	c7 04 b5 00 00 00 00 	movl   $0x0,0x0(,%esi,4)
    56fd:	00 00 00 00
    5701:	f0 ff 85 00 00 00 00 	lock incl 0x0(%ebp)
    5708:	e8 fc ff ff ff       	call   5709 <debugger_entry+0x176>
    570d:	f3 90                	pause
    570f:	eb f7                	jmp    5708 <debugger_entry+0x175>

    // THIS APPEARS BROKEN - THE COMPILER IS TREATING A GLOBAL ARRAY
    // AS LOCAL DATA
    5711:	89 f1                	mov    %esi,%ecx
    5713:	89 da                	mov    %ebx,%edx
    5715:	b8 02 00 00 00       	mov    $0x2,%eax
    571a:	eb 06                	jmp    5722 <debugger_entry+0x18f>
    571c:	89 f1                	mov    %esi,%ecx
    571e:	89 da                	mov    %ebx,%edx
    5720:	89 f8                	mov    %edi,%eax
    5722:	e8 fc ff ff ff       	call   5723 <debugger_entry+0x190>
    5727:	83 3c b5 00 00 00 00 	cmpl   $0x0,0x0(,%esi,4)
    572e:	00
    572f:	75 c5                	jne    56f6 <debugger_entry+0x163>
    5731:	e8 fc ff ff ff       	call   5732 <debugger_entry+0x19f>
    5736:	e8 fc ff ff ff       	call   5737 <debugger_entry+0x1a4>
    573b:	85 c0                	test   %eax,%eax
    573d:	74 0f                	je     574e <debugger_entry+0x1bb>
    573f:	89 f0                	mov    %esi,%eax
    5741:	c1 e0 07             	shl    $0x7,%eax
    5744:	05 00 00 00 00       	add    $0x0,%eax
    5749:	e8 fc ff ff ff       	call   574a <debugger_entry+0x1b7>
    574e:	c7 04 b5 00 00 00 00 	movl   $0x0,0x0(,%esi,4)
    5755:	00 00 00 00
    5759:	8b 04 24             	mov    (%esp),%eax
    575c:	c7 04 b5 00 00 00 00 	movl   $0x1,0x0(,%esi,4)
    5763:	01 00 00 00
    5767:	f0 ff 08             	lock decl (%eax)



Code section without ANY volatile declarations (CODE GENERATION CORRECT)

unsigned long ProcessorHold[MAX_PROCESSORS];
unsigned long ProcessorState[MAX_PROCESSORS];

         case 2: /* nmi */
            if (ProcessorHold[processor])  /* hold processor */
            {
               ProcessorHold[processor] = 0;
	       ProcessorState[processor] = PROCESSOR_SUSPEND;

               /* processor suspend loop */
               atomic_inc(&nmiProcessors[processor]);
	       while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
	              (ProcessorState[processor] != PROCESSOR_SWITCH))
               {
	          if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
	              (ProcessorState[processor] == PROCESSOR_SWITCH))
                     break;

                  touch_nmi_watchdog();
                  cpu_relax();
               }
               atomic_dec(&nmiProcessors[processor]);

Code output from section without ANY volatile declarations

    56f2:	83 3c bd 00 00 00 00 	cmpl   $0x0,0x0(,%edi,4)
    56f9:	00
    56fa:	74 5f                	je     575b <debugger_entry+0x1c8>
    56fc:	c7 04 bd 00 00 00 00 	movl   $0x0,0x0(,%edi,4)
    5703:	00 00 00 00
    5707:	8d b5 00 00 00 00    	lea    0x0(%ebp),%esi
    570d:	c7 04 bd 00 00 00 00 	movl   $0x2,0x0(,%edi,4)
    5714:	02 00 00 00
    5718:	f0 ff 85 00 00 00 00 	lock incl 0x0(%ebp)
    571f:	eb 11                	jmp    5732 <debugger_entry+0x19f>
    5721:	83 f8 03             	cmp    $0x3,%eax
    5724:	74 1d                	je     5743 <debugger_entry+0x1b0>
    5726:	83 f8 07             	cmp    $0x7,%eax
    5729:	74 18                	je     5743 <debugger_entry+0x1b0>
    572b:	e8 fc ff ff ff       	call   572c <debugger_entry+0x199>
    5730:	f3 90                	pause
    5732:	8b 04 bd 00 00 00 00 	mov    0x0(,%edi,4),%eax
    5739:	83 f8 03             	cmp    $0x3,%eax
    573c:	74 05                	je     5743 <debugger_entry+0x1b0>
    573e:	83 f8 07             	cmp    $0x7,%eax
    5741:	75 de                	jne    5721 <debugger_entry+0x18e>
    5743:	f0 ff 0e             	lock decl (%esi)
    5746:	83 3c bd 00 00 00 00 	cmpl   $0x7,0x0(,%edi,4)



Code from section with volatile declarations (CODE GENERATION CORRECT)

volatile unsigned long ProcessorHold[MAX_PROCESSORS];
volatile unsigned long ProcessorState[MAX_PROCESSORS];

         case 2: /* nmi */
            if (ProcessorHold[processor])  /* hold processor */
            {
               ProcessorHold[processor] = 0;
	       ProcessorState[processor] = PROCESSOR_SUSPEND;

               /* processor suspend loop */
               atomic_inc(&nmiProcessors[processor]);
	       while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
	              (ProcessorState[processor] != PROCESSOR_SWITCH))
               {
	          if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
	              (ProcessorState[processor] == PROCESSOR_SWITCH))
                     break;

                  touch_nmi_watchdog();
                  cpu_relax();
               }
               atomic_dec(&nmiProcessors[processor]);


Code output from section with volatile declarations

    5896:	8b 04 9d 00 00 00 00 	mov    0x0(,%ebx,4),%eax
    589d:	85 c0                	test   %eax,%eax
    589f:	74 73                	je     5914 <debugger_entry+0x1f7>
    58a1:	c7 04 9d 00 00 00 00 	movl   $0x0,0x0(,%ebx,4)
    58a8:	00 00 00 00
    58ac:	8d bd 00 00 00 00    	lea    0x0(%ebp),%edi
    58b2:	c7 04 9d 00 00 00 00 	movl   $0x2,0x0(,%ebx,4)
    58b9:	02 00 00 00
    58bd:	f0 ff 85 00 00 00 00 	lock incl 0x0(%ebp)
    58c4:	eb 1f                	jmp    58e5 <debugger_entry+0x1c8>
    58c6:	8b 04 9d 00 00 00 00 	mov    0x0(,%ebx,4),%eax
    58cd:	83 f8 03             	cmp    $0x3,%eax
    58d0:	74 2b                	je     58fd <debugger_entry+0x1e0>
    58d2:	8b 04 9d 00 00 00 00 	mov    0x0(,%ebx,4),%eax
    58d9:	83 f8 07             	cmp    $0x7,%eax
    58dc:	74 1f                	je     58fd <debugger_entry+0x1e0>
    58de:	e8 fc ff ff ff       	call   58df <debugger_entry+0x1c2>
    58e3:	f3 90                	pause
    58e5:	8b 04 9d 00 00 00 00 	mov    0x0(,%ebx,4),%eax
    58ec:	83 f8 03             	cmp    $0x3,%eax
    58ef:	74 0c                	je     58fd <debugger_entry+0x1e0>
    58f1:	8b 04 9d 00 00 00 00 	mov    0x0(,%ebx,4),%eax
    58f8:	83 f8 07             	cmp    $0x7,%eax
    58fb:	75 c9                	jne    58c6 <debugger_entry+0x1a9>
    58fd:	f0 ff 0f             	lock decl (%edi)
    5900:	8b 04 9d 00 00 00 00 	mov    0x0(,%ebx,4),%eax



Code from sections without volatile declaration using wmb()/rmb()
                                     (CODE GENERATION CORRECT)

   for (i=0; i < MAX_PROCESSORS; i++)
   {
      if (ProcessorState[i] != PROCESSOR_HOLD)
      {
         wmb();
         ProcessorState[i] = PROCESSOR_RESUME;
      }
   }

unsigned long ProcessorHold[MAX_PROCESSORS];
unsigned long ProcessorState[MAX_PROCESSORS];

         case 2: /* nmi */
            if (ProcessorHold[processor])  /* hold processor */
            {
               ProcessorHold[processor] = 0;
	       ProcessorState[processor] = PROCESSOR_SUSPEND;

               /* processor suspend loop */
               atomic_inc(&nmiProcessors[processor]);
	       while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
	              (ProcessorState[processor] != PROCESSOR_SWITCH))
               {
                  rmb();
	          if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
	              (ProcessorState[processor] == PROCESSOR_SWITCH))
                     break;

                  touch_nmi_watchdog();
                  cpu_relax();
               }
               atomic_dec(&nmiProcessors[processor]);

Code output from sections without volatile declaration using wmb()/rmb()

    56fa:	83 3c b5 00 00 00 00 	cmpl   $0x0,0x0(,%esi,4)
    5701:	00
    5702:	74 6b                	je     576f <debugger_entry+0x1d7>
    5704:	c7 04 b5 00 00 00 00 	movl   $0x0,0x0(,%esi,4)
    570b:	00 00 00 00
    570f:	8d bd 00 00 00 00    	lea    0x0(%ebp),%edi
    5715:	c7 04 b5 00 00 00 00 	movl   $0x2,0x0(,%esi,4)
    571c:	02 00 00 00
    5720:	f0 ff 85 00 00 00 00 	lock incl 0x0(%ebp)
    5727:	eb 1d                	jmp    5746 <debugger_entry+0x1ae>
    5729:	f0 83 04 24 00       	lock addl $0x0,(%esp)
    572e:	8b 04 b5 00 00 00 00 	mov    0x0(,%esi,4),%eax
    5735:	83 f8 03             	cmp    $0x3,%eax
    5738:	74 1d                	je     5757 <debugger_entry+0x1bf>
    573a:	83 f8 07             	cmp    $0x7,%eax
    573d:	74 18                	je     5757 <debugger_entry+0x1bf>
    573f:	e8 fc ff ff ff       	call   5740 <debugger_entry+0x1a8>
    5744:	f3 90                	pause
    5746:	8b 04 b5 00 00 00 00 	mov    0x0(,%esi,4),%eax
    574d:	83 f8 03             	cmp    $0x3,%eax
    5750:	74 05                	je     5757 <debugger_entry+0x1bf>
    5752:	83 f8 07             	cmp    $0x7,%eax
    5755:	75 d2                	jne    5729 <debugger_entry+0x191>
    5757:	f0 ff 0f             	lock decl (%edi)
    575a:	83 3c b5 00 00 00 00 	cmpl   $0x7,0x0(,%esi,4)
    5761:	07
    5762:	75 21                	jne    5785 <debugger_entry+0x1ed>
    5764:	89 f1                	mov    %esi,%ecx

000001e1 <FreeProcessorsExclSelf>:
     1e1:	31 c0                	xor    %eax,%eax
     1e3:	83 3c 85 00 00 00 00 	cmpl   $0x8,0x0(,%eax,4)
     1ea:	08
     1eb:	74 10                	je     1fd <FreeProcessorsExclSelf+0x1c>
     1ed:	f0 83 04 24 00       	lock addl $0x0,(%esp)
     1f2:	c7 04 85 00 00 00 00 	movl   $0x3,0x0(,%eax,4)
     1f9:	03 00 00 00
     1fd:	40                   	inc    %eax
     1fe:	83 f8 08             	cmp    $0x8,%eax
     201:	75 e0                	jne    1e3 <FreeProcessorsExclSelf+0x2>
     203:	c3                   	ret



Code from sections without volatile declaration using barrier()
                                       (CODE GENERATION CORRECT)

   for (i=0; i < MAX_PROCESSORS; i++)
   {
      if (ProcessorState[i] != PROCESSOR_HOLD)
      {
         barrier();
         ProcessorState[i] = PROCESSOR_RESUME;
      }
   }

unsigned long ProcessorHold[MAX_PROCESSORS];
unsigned long ProcessorState[MAX_PROCESSORS];

         case 2: /* nmi */
            if (ProcessorHold[processor])  /* hold processor */
            {
               ProcessorHold[processor] = 0;
	       ProcessorState[processor] = PROCESSOR_SUSPEND;

               /* processor suspend loop */
               atomic_inc(&nmiProcessors[processor]);
	       while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
	              (ProcessorState[processor] != PROCESSOR_SWITCH))
               {
                  barrier();
	          if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
	              (ProcessorState[processor] == PROCESSOR_SWITCH))
                     break;

                  touch_nmi_watchdog();
                  cpu_relax();
               }
               atomic_dec(&nmiProcessors[processor]);

Code output from sections without volatile declaration using barrier()

    56f5:	83 3c b5 00 00 00 00 	cmpl   $0x0,0x0(,%esi,4)
    56fc:	00
    56fd:	74 66                	je     5765 <debugger_entry+0x1d2>
    56ff:	c7 04 b5 00 00 00 00 	movl   $0x0,0x0(,%esi,4)
    5706:	00 00 00 00
    570a:	8d bd 00 00 00 00    	lea    0x0(%ebp),%edi
    5710:	c7 04 b5 00 00 00 00 	movl   $0x2,0x0(,%esi,4)
    5717:	02 00 00 00
    571b:	f0 ff 85 00 00 00 00 	lock incl 0x0(%ebp)
    5722:	eb 18                	jmp    573c <debugger_entry+0x1a9>
    5724:	8b 04 b5 00 00 00 00 	mov    0x0(,%esi,4),%eax
    572b:	83 f8 03             	cmp    $0x3,%eax
    572e:	74 1d                	je     574d <debugger_entry+0x1ba>
    5730:	83 f8 07             	cmp    $0x7,%eax
    5733:	74 18                	je     574d <debugger_entry+0x1ba>
    5735:	e8 fc ff ff ff       	call   5736 <debugger_entry+0x1a3>
    573a:	f3 90                	pause
    573c:	8b 04 b5 00 00 00 00 	mov    0x0(,%esi,4),%eax
    5743:	83 f8 03             	cmp    $0x3,%eax
    5746:	74 05                	je     574d <debugger_entry+0x1ba>
    5748:	83 f8 07             	cmp    $0x7,%eax
    574b:	75 d7                	jne    5724 <debugger_entry+0x191>
    574d:	f0 ff 0f             	lock decl (%edi)
    5750:	83 3c b5 00 00 00 00 	cmpl   $0x7,0x0(,%esi,4)
    5757:	07

000001e1 <FreeProcessorsExclSelf>:
     1e1:	31 c0                	xor    %eax,%eax
     1e3:	83 3c 85 00 00 00 00 	cmpl   $0x8,0x0(,%eax,4)
     1ea:	08
     1eb:	74 0b                	je     1f8 <FreeProcessorsExclSelf+0x17>
     1ed:	c7 04 85 00 00 00 00 	movl   $0x3,0x0(,%eax,4)
     1f4:	03 00 00 00
     1f8:	40                   	inc    %eax
     1f9:	83 f8 08             	cmp    $0x8,%eax
     1fc:	75 e5                	jne    1e3 <FreeProcessorsExclSelf+0x2>
     1fe:	c3                   	ret
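
The two FreeProcessorsExclSelf listings differ in one instruction: the
wmb() build emits "lock addl $0x0,(%esp)" at 1ed, while the barrier()
build emits nothing at that point.  The sketch below restates that
difference in userspace C.  It is not the kernel's implementation of
either macro: compiler_barrier() and hardware_fence() are stand-in names,
built on GCC's empty asm with a "memory" clobber and the
__sync_synchronize() builtin.  The function takes the state array as a
parameter so it can be exercised on its own, and the values 3 and 8 for
resume and hold are assumptions read off the listings.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCESSORS   8   /* assumed: loop bound seen in the listings */
#define PROCESSOR_RESUME 3   /* assumed: values read off the disassembly */
#define PROCESSOR_HOLD   8

/* Compiler-only barrier: stops reordering and caching by the compiler,
 * emits no machine instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Full hardware fence via a GCC builtin; stand-in for wmb()/rmb() here. */
#define hardware_fence() __sync_synchronize()

/*
 * Release every processor that is not held.  Returns how many entries
 * were set to PROCESSOR_RESUME.
 */
static size_t free_processors(unsigned long *state, size_t n,
                              int use_hardware_fence)
{
    size_t released = 0;

    assert(state != NULL);
    for (size_t i = 0; i < n; i++) {
        if (state[i] != PROCESSOR_HOLD) {
            if (use_hardware_fence)
                hardware_fence();     /* a fence instruction is emitted */
            else
                compiler_barrier();   /* nothing is emitted */
            state[i] = PROCESSOR_RESUME;
            released++;
        }
    }
    return released;
}
```

Both paths store the same values; the choice only decides whether the
processor is also told to order its memory operations around the store.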


Jeff

