Date:	Fri, 10 Jul 2009 15:47:35 +0800
From:	Amerigo Wang <xiyou.wangcong@...il.com>
To:	Randy Dunlap <randy.dunlap@...cle.com>
Cc:	lkml <linux-kernel@...r.kernel.org>,
	torvalds <torvalds@...ux-foundation.org>,
	WANG Cong <amwang@...hat.com>
Subject: Re: [PATCH 1/2] Doc: update Documentation/exception.txt

On Wed, Jul 08, 2009 at 03:02:18PM -0700, Randy Dunlap wrote:
>From: Amerigo Wang <amwang@...hat.com>
>Subject: [RESEND Patch 1/2] Doc: update Documentation/exception.txt
>
>Update Documentation/exception.txt.
>Remove trailing whitespaces in it.
>
>Signed-off-by: WANG Cong <amwang@...hat.com>
>Signed-off-by: Randy Dunlap <randy.dunlap@...cle.com>


Thanks for resending, Randy.

ping Linus...


>---
> Documentation/exception.txt |  202 +++++++++++++++++-----------------
> 1 file changed, 101 insertions(+), 101 deletions(-)
>
>--- linux-2.6.31-rc1-git8.orig/Documentation/exception.txt
>+++ linux-2.6.31-rc1-git8/Documentation/exception.txt
>@@ -1,123 +1,123 @@
>-     Kernel level exception handling in Linux 2.1.8
>+     Kernel level exception handling in Linux
>   Commentary by Joerg Pommnitz <joerg@...eigh.ibm.com>
> 
>-When a process runs in kernel mode, it often has to access user 
>-mode memory whose address has been passed by an untrusted program. 
>+When a process runs in kernel mode, it often has to access user
>+mode memory whose address has been passed by an untrusted program.
> To protect itself the kernel has to verify this address.
> 
>-In older versions of Linux this was done with the 
>-int verify_area(int type, const void * addr, unsigned long size) 
>+In older versions of Linux this was done with the
>+int verify_area(int type, const void * addr, unsigned long size)
> function (which has since been replaced by access_ok()).
> 
>-This function verified that the memory area starting at address 
>+This function verified that the memory area starting at address
> 'addr' and of size 'size' was accessible for the operation specified
>-in type (read or write). To do this, verify_read had to look up the 
>-virtual memory area (vma) that contained the address addr. In the 
>-normal case (correctly working program), this test was successful. 
>+in type (read or write). To do this, verify_read had to look up the
>+virtual memory area (vma) that contained the address addr. In the
>+normal case (correctly working program), this test was successful.
> It only failed for a few buggy programs. In some kernel profiling
> tests, this normally unneeded verification used up a considerable
> amount of time.
> 
>-To overcome this situation, Linus decided to let the virtual memory 
>+To overcome this situation, Linus decided to let the virtual memory
> hardware present in every Linux-capable CPU handle this test.
> 
> How does this work?
> 
>-Whenever the kernel tries to access an address that is currently not 
>-accessible, the CPU generates a page fault exception and calls the 
>-page fault handler 
>+Whenever the kernel tries to access an address that is currently not
>+accessible, the CPU generates a page fault exception and calls the
>+page fault handler
> 
> void do_page_fault(struct pt_regs *regs, unsigned long error_code)
> 
>-in arch/i386/mm/fault.c. The parameters on the stack are set up by 
>-the low level assembly glue in arch/i386/kernel/entry.S. The parameter
>-regs is a pointer to the saved registers on the stack, error_code 
>+in arch/x86/mm/fault.c. The parameters on the stack are set up by
>+the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter
>+regs is a pointer to the saved registers on the stack, error_code
> contains a reason code for the exception.
> 
>-do_page_fault first obtains the unaccessible address from the CPU 
>-control register CR2. If the address is within the virtual address 
>-space of the process, the fault probably occurred, because the page 
>-was not swapped in, write protected or something similar. However, 
>-we are interested in the other case: the address is not valid, there 
>-is no vma that contains this address. In this case, the kernel jumps 
>-to the bad_area label. 
>-
>-There it uses the address of the instruction that caused the exception 
>-(i.e. regs->eip) to find an address where the execution can continue 
>-(fixup). If this search is successful, the fault handler modifies the 
>-return address (again regs->eip) and returns. The execution will 
>+do_page_fault first obtains the unaccessible address from the CPU
>+control register CR2. If the address is within the virtual address
>+space of the process, the fault probably occurred, because the page
>+was not swapped in, write protected or something similar. However,
>+we are interested in the other case: the address is not valid, there
>+is no vma that contains this address. In this case, the kernel jumps
>+to the bad_area label.
>+
>+There it uses the address of the instruction that caused the exception
>+(i.e. regs->eip) to find an address where the execution can continue
>+(fixup). If this search is successful, the fault handler modifies the
>+return address (again regs->eip) and returns. The execution will
> continue at the address in fixup.
> 
> Where does fixup point to?
> 
>-Since we jump to the contents of fixup, fixup obviously points 
>-to executable code. This code is hidden inside the user access macros. 
>-I have picked the get_user macro defined in include/asm/uaccess.h as an
>-example. The definition is somewhat hard to follow, so let's peek at 
>+Since we jump to the contents of fixup, fixup obviously points
>+to executable code. This code is hidden inside the user access macros.
>+I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h
>+as an example. The definition is somewhat hard to follow, so let's peek at
> the code generated by the preprocessor and the compiler. I selected
>-the get_user call in drivers/char/console.c for a detailed examination.
>+the get_user call in drivers/char/sysrq.c for a detailed examination.
> 
>-The original code in console.c line 1405:
>+The original code in sysrq.c line 587:
>         get_user(c, buf);
> 
> The preprocessor output (edited to become somewhat readable):
> 
> (
>-  {        
>-    long __gu_err = - 14 , __gu_val = 0;        
>-    const __typeof__(*( (  buf ) )) *__gu_addr = ((buf));        
>-    if (((((0 + current_set[0])->tss.segment) == 0x18 )  || 
>-       (((sizeof(*(buf))) <= 0xC0000000UL) && 
>-       ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))        
>+  {
>+    long __gu_err = - 14 , __gu_val = 0;
>+    const __typeof__(*( (  buf ) )) *__gu_addr = ((buf));
>+    if (((((0 + current_set[0])->tss.segment) == 0x18 )  ||
>+       (((sizeof(*(buf))) <= 0xC0000000UL) &&
>+       ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
>       do {
>-        __gu_err  = 0;        
>-        switch ((sizeof(*(buf)))) {        
>-          case 1: 
>-            __asm__ __volatile__(        
>-              "1:      mov" "b" " %2,%" "b" "1\n"        
>-              "2:\n"        
>-              ".section .fixup,\"ax\"\n"        
>-              "3:      movl %3,%0\n"        
>-              "        xor" "b" " %" "b" "1,%" "b" "1\n"        
>-              "        jmp 2b\n"        
>-              ".section __ex_table,\"a\"\n"        
>-              "        .align 4\n"        
>-              "        .long 1b,3b\n"        
>+        __gu_err  = 0;
>+        switch ((sizeof(*(buf)))) {
>+          case 1:
>+            __asm__ __volatile__(
>+              "1:      mov" "b" " %2,%" "b" "1\n"
>+              "2:\n"
>+              ".section .fixup,\"ax\"\n"
>+              "3:      movl %3,%0\n"
>+              "        xor" "b" " %" "b" "1,%" "b" "1\n"
>+              "        jmp 2b\n"
>+              ".section __ex_table,\"a\"\n"
>+              "        .align 4\n"
>+              "        .long 1b,3b\n"
>               ".text"        : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
>-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  )) ; 
>-              break;        
>-          case 2: 
>+                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  )) ;
>+              break;
>+          case 2:
>             __asm__ __volatile__(
>-              "1:      mov" "w" " %2,%" "w" "1\n"        
>-              "2:\n"        
>-              ".section .fixup,\"ax\"\n"        
>-              "3:      movl %3,%0\n"        
>-              "        xor" "w" " %" "w" "1,%" "w" "1\n"        
>-              "        jmp 2b\n"        
>-              ".section __ex_table,\"a\"\n"        
>-              "        .align 4\n"        
>-              "        .long 1b,3b\n"        
>+              "1:      mov" "w" " %2,%" "w" "1\n"
>+              "2:\n"
>+              ".section .fixup,\"ax\"\n"
>+              "3:      movl %3,%0\n"
>+              "        xor" "w" " %" "w" "1,%" "w" "1\n"
>+              "        jmp 2b\n"
>+              ".section __ex_table,\"a\"\n"
>+              "        .align 4\n"
>+              "        .long 1b,3b\n"
>               ".text"        : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
>-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  )); 
>-              break;        
>-          case 4: 
>-            __asm__ __volatile__(        
>-              "1:      mov" "l" " %2,%" "" "1\n"        
>-              "2:\n"        
>-              ".section .fixup,\"ax\"\n"        
>-              "3:      movl %3,%0\n"        
>-              "        xor" "l" " %" "" "1,%" "" "1\n"        
>-              "        jmp 2b\n"        
>-              ".section __ex_table,\"a\"\n"        
>-              "        .align 4\n"        "        .long 1b,3b\n"        
>+                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  ));
>+              break;
>+          case 4:
>+            __asm__ __volatile__(
>+              "1:      mov" "l" " %2,%" "" "1\n"
>+              "2:\n"
>+              ".section .fixup,\"ax\"\n"
>+              "3:      movl %3,%0\n"
>+              "        xor" "l" " %" "" "1,%" "" "1\n"
>+              "        jmp 2b\n"
>+              ".section __ex_table,\"a\"\n"
>+              "        .align 4\n"        "        .long 1b,3b\n"
>               ".text"        : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
>-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(__gu_err)); 
>-              break;        
>-          default: 
>-            (__gu_val) = __get_user_bad();        
>-        }        
>-      } while (0) ;        
>-    ((c)) = (__typeof__(*((buf))))__gu_val;        
>+                            (   __gu_addr   )) ), "i"(- 14 ), "0"(__gu_err));
>+              break;
>+          default:
>+            (__gu_val) = __get_user_bad();
>+        }
>+      } while (0) ;
>+    ((c)) = (__typeof__(*((buf))))__gu_val;
>     __gu_err;
>   }
> );
>@@ -127,12 +127,12 @@ see what code gcc generates:
> 
>  >         xorl %edx,%edx
>  >         movl current_set,%eax
>- >         cmpl $24,788(%eax)        
>- >         je .L1424        
>+ >         cmpl $24,788(%eax)
>+ >         je .L1424
>  >         cmpl $-1073741825,64(%esp)
>- >         ja .L1423                
>+ >         ja .L1423
>  > .L1424:
>- >         movl %edx,%eax                        
>+ >         movl %edx,%eax
>  >         movl 64(%esp),%ebx
>  > #APP
>  > 1:      movb (%ebx),%dl                /* this is the actual user access */
>@@ -149,17 +149,17 @@ see what code gcc generates:
>  > .L1423:
>  >         movzbl %dl,%esi
> 
>-The optimizer does a good job and gives us something we can actually 
>-understand. Can we? The actual user access is quite obvious. Thanks 
>-to the unified address space we can just access the address in user 
>+The optimizer does a good job and gives us something we can actually
>+understand. Can we? The actual user access is quite obvious. Thanks
>+to the unified address space we can just access the address in user
> memory. But what does the .section stuff do?????
> 
> To understand this we have to look at the final kernel:
> 
>  > objdump --section-headers vmlinux
>- > 
>+ >
>  > vmlinux:     file format elf32-i386
>- > 
>+ >
>  > Sections:
>  > Idx Name          Size      VMA       LMA       File off  Algn
>  >   0 .text         00098f40  c0100000  c0100000  00001000  2**4
>@@ -198,18 +198,18 @@ final kernel executable:
> 
> The whole user memory access is reduced to 10 x86 machine instructions.
> The instructions bracketed in the .section directives are no longer
>-in the normal execution path. They are located in a different section 
>+in the normal execution path. They are located in a different section
> of the executable file:
> 
>  > objdump --disassemble --section=.fixup vmlinux
>- > 
>+ >
>  > c0199ff5 <.fixup+10b5> movl   $0xfffffff2,%eax
>  > c0199ffa <.fixup+10ba> xorb   %dl,%dl
>  > c0199ffc <.fixup+10bc> jmp    c017e7a7 <do_con_write+e3>
> 
> And finally:
>  > objdump --full-contents --section=__ex_table vmlinux
>- > 
>+ >
>  >  c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0  ................
>  >  c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0  ................
>  >  c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0  ................
>@@ -235,8 +235,8 @@ sections in the ELF object file. So the 
> ended up in the .fixup section of the object file and the addresses
>         .long 1b,3b
> ended up in the __ex_table section of the object file. 1b and 3b
>-are local labels. The local label 1b (1b stands for next label 1 
>-backward) is the address of the instruction that might fault, i.e. 
>+are local labels. The local label 1b (1b stands for next label 1
>+backward) is the address of the instruction that might fault, i.e.
> in our case the address of the label 1 is c017e7a5:
> the original assembly code: > 1:      movb (%ebx),%dl
> and linked in vmlinux     : > c017e7a5 <do_con_write+e1> movb   (%ebx),%dl
>@@ -254,7 +254,7 @@ The assembly code
> becomes the value pair
>  >  c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5  ................
>                                ^this is ^this is
>-                               1b       3b 
>+                               1b       3b
> c017e7a5,c0199ff5 in the exception table of the kernel.
> 
> So, what actually happens if a fault from kernel mode with no suitable
>@@ -266,9 +266,9 @@ vma occurs?
> 3.) CPU calls do_page_fault
> 4.) do page fault calls search_exception_table (regs->eip == c017e7a5);
> 5.) search_exception_table looks up the address c017e7a5 in the
>-    exception table (i.e. the contents of the ELF section __ex_table) 
>+    exception table (i.e. the contents of the ELF section __ex_table)
>     and returns the address of the associated fault handle code c0199ff5.
>-6.) do_page_fault modifies its own return address to point to the fault 
>+6.) do_page_fault modifies its own return address to point to the fault
>     handle code and returns.
> 7.) execution continues in the fault handling code.
> 8.) 8a) EAX becomes -EFAULT (== -14)
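Steps 4 to 6 above can be sketched in user-space C. This is a hypothetical model, not the kernel's code. `struct exception_table_entry` mirrors the (1b, 3b) address pairs that `.long 1b,3b` emits. `struct fake_regs` stands in for `struct pt_regs` and keeps only the saved `eip`. `search_table()` and `fixup_fault()` are made-up names. The sketch assumes the table is sorted by faulting-instruction address so that a binary search can be used. The two entries come from the `__ex_table` dump line at c01aa7d4.

```c
#include <stddef.h>
#include <stdint.h>

/* One __ex_table entry: address of the instruction that may fault
 * (label 1b) and address of its fixup code (label 3b). */
struct exception_table_entry {
    uint32_t insn;
    uint32_t fixup;
};

/* Hypothetical stand-in for struct pt_regs: only the saved instruction pointer. */
struct fake_regs {
    uint32_t eip;
};

/* The two pairs from the dump line at c01aa7d4, sorted by insn. */
static const struct exception_table_entry example_table[] = {
    { 0xc017c2f6u, 0xc0199fe9u },
    { 0xc017e7a5u, 0xc0199ff5u },
};

/* Binary search over a table sorted by insn; NULL if addr has no entry. */
static const struct exception_table_entry *
search_table(const struct exception_table_entry *tbl, size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tbl[mid].insn == addr)
            return &tbl[mid];
        if (tbl[mid].insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Model of the bad_area path: if the faulting instruction has a fixup,
 * overwrite the saved return address with it and return 1; otherwise
 * leave regs untouched and return 0 (the fault cannot be fixed up). */
static int fixup_fault(struct fake_regs *regs,
                       const struct exception_table_entry *tbl, size_t n)
{
    const struct exception_table_entry *e = search_table(tbl, n, regs->eip);

    if (e == NULL)
        return 0;
    regs->eip = e->fixup;
    return 1;
}
```

If regs.eip is set to c017e7a5 (the faulting `movb (%ebx),%dl`), fixup_fault() returns 1 and leaves regs.eip at c0199ff5. That is the `movl $0xfffffff2,%eax` fixup, and 0xfffffff2 is -14, or -EFAULT. An address with no table entry returns 0 and leaves regs.eip unchanged.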
>--
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@...r.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/
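The address test in the quoted preprocessor output (the part after the `||`) can be written as a small C function. This is a sketch that assumes the old i386 layout the example uses, where user space ends at 0xC0000000. `range_ok()` and `USER_LIMIT` are hypothetical names, and the `tss.segment == 0x18` shortcut is left out. The function checks `size` first so that `USER_LIMIT - size` cannot wrap around.

```c
#include <stdint.h>

/* Hypothetical name: first address above user space in the example's layout. */
#define USER_LIMIT 0xC0000000u

/* Nonzero if the size-byte object at addr lies entirely below USER_LIMIT.
 * Testing size first guarantees that USER_LIMIT - size cannot wrap around. */
static int range_ok(uint32_t addr, uint32_t size)
{
    return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}
```

For a one-byte access, the second test reduces to `addr <= 0xBFFFFFFF`. Read as a signed 32-bit value, 0xBFFFFFFF is -1073741825. That is the constant in the `cmpl $-1073741825,64(%esp)` / `ja .L1423` pair of the gcc output. Because `ja` is an unsigned comparison, it jumps past the user access exactly when the address is above 0xBFFFFFFF.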
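The quoted `__ex_table` dump lists raw bytes in address order. This is why the pair c017e7a5,c0199ff5 appears there as `a5e717c0 f59f19c0`: i386 is little-endian, so each four-byte group has to be read back to front. The decoding can be sketched in C as shown below. `le32_from_dump()`, `decode_row()` and `struct ex_pair` are hypothetical names. The sketch assumes the dump row starts on an eight-byte entry boundary, as the pairing in the text implies.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: one decoded exception-table entry. */
struct ex_pair {
    uint32_t insn;   /* address of the instruction that may fault (1b) */
    uint32_t fixup;  /* address of the fixup code (3b) */
};

/* Convert one 8-hex-digit group from `objdump --full-contents` into the
 * 32-bit value it encodes on a little-endian CPU. The group lists bytes
 * in address order, so the first byte is the least significant one. */
static uint32_t le32_from_dump(const char group[8])
{
    uint32_t value = 0;

    for (int i = 0; i < 4; i++) {
        char byte_text[3] = { group[2 * i], group[2 * i + 1], '\0' };
        uint32_t byte = (uint32_t)strtoul(byte_text, NULL, 16);

        value |= byte << (8 * i);
    }
    return value;
}

/* Decode the four data groups of one dump row into two table entries. */
static void decode_row(const char *groups[4], struct ex_pair out[2])
{
    for (int k = 0; k < 2; k++) {
        out[k].insn = le32_from_dump(groups[2 * k]);
        out[k].fixup = le32_from_dump(groups[2 * k + 1]);
    }
}
```

Given the row at c01aa7d4, decode_row() yields the pairs (c017c2f6, c0199fe9) and (c017e7a5, c0199ff5). These match the values marked "1b" and "3b" in the text.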
