Message-ID: <817ecb6f1003061144r3defbe0eg74e23e0986f8e42d@mail.gmail.com>
Date:	Sat, 6 Mar 2010 14:44:47 -0500
From:	Siarhei Liakh <sliakh.lkml@...il.com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	"H. Peter Anvin" <hpa@...or.com>, mingo@...hat.com,
	jmorris@...ei.org, linux-kernel@...r.kernel.org,
	arjan@...ux.intel.com, tglx@...utronix.de, jiang@...ncsu.edu,
	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/mm] x86, mm: NX protection for kernel data

On Mon, Feb 22, 2010 at 12:21 PM, Ingo Molnar <mingo@...e.hu> wrote:
>
> * H. Peter Anvin <hpa@...or.com> wrote:
>
>> On 02/22/2010 03:01 AM, Ingo Molnar wrote:
>> >>
>> >>> Commit-ID:  01ab31371da90a795b774d87edf2c21bb3a64dda
>> >>> Gitweb:     http://git.kernel.org/tip/01ab31371da90a795b774d87edf2c21bb3a64dda
>> >>> Author:     Siarhei Liakh <sliakh.lkml@...il.com>
>> >>> AuthorDate: Sun, 31 Jan 2010 18:27:55 -0500
>> >>> Committer:  H. Peter Anvin <hpa@...or.com>
>> >>> CommitDate: Wed, 17 Feb 2010 10:11:24 -0800
>> >>>
>> >>> x86, mm: NX protection for kernel data
>> >>>
>> >>> This patch expands functionality of CONFIG_DEBUG_RODATA to set main
>> >>> (static) kernel data area as NX.
>> >>
>> >> -tip testing is seeing boot hangs along the lines of:
>> >>
>> >> [   15.568108] EXT3-fs (sda1): recovery complete
>> >> [   15.573064] EXT3-fs (sda1): mounted filesystem with ordered data mode
>> >> [   15.580313] VFS: Mounted root (ext3 filesystem) readonly on device 8:1.
>> >> [   15.584021] async_waiting @ 1
>> >> [   15.588008] async_continuing @ 1 after 0 usec
>> >> [   15.592163] Freeing unused kernel memory: 540k freed
>> >> [   15.600126] NX-protecting the kernel data: c15ab000, 2919 pages
>> >>
>> >> which i suspect could be due to the commit above.
>> >
>> > Yep, that's confirmed now, applying these 3 reverts makes it boot fine:
>> >
>> > 833e0ca: Revert "x86, mm: NX protection for kernel data"
>> > ce4b6b4: Revert "x86: RO/NX protection for loadable kernel modules"
>> > e357312: Revert "module: fix () used as prototype in include/linux/module.h"
>> >
>>
>> Given that e357312 is a () -> (void) prototype fix, it hardly seems
>> likely to be at fault.  The RO/NX stuff, on the other hand, makes a lot
>> of sense.
>
> Yes, i reverted e357312 because it was a dependent change.

I was able to narrow the issue down to spinlock debugging. More
specifically, DEBUG_SPINLOCK=y seems to be somehow incompatible with
the kernel's RW data being NX.

Crash/nocrash config diff:
============================================
diff -uNr config.tip.crash config.tip.nocrash
--- config.tip.crash	2010-03-05 22:43:01.000000000 -0500
+++ config.tip.nocrash	2010-03-06 01:38:00.000000000 -0500
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux kernel version: 2.6.33
-# Fri Mar  5 22:22:10 2010
+# Sat Mar  6 01:22:32 2010
 #
 # CONFIG_64BIT is not set
 CONFIG_X86_32=y
@@ -219,27 +219,27 @@
 # CONFIG_INLINE_SPIN_LOCK_BH is not set
 # CONFIG_INLINE_SPIN_LOCK_IRQ is not set
 # CONFIG_INLINE_SPIN_LOCK_IRQSAVE is not set
-# CONFIG_INLINE_SPIN_UNLOCK is not set
+CONFIG_INLINE_SPIN_UNLOCK=y
 # CONFIG_INLINE_SPIN_UNLOCK_BH is not set
-# CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set
+CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
 # CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE is not set
 # CONFIG_INLINE_READ_TRYLOCK is not set
 # CONFIG_INLINE_READ_LOCK is not set
 # CONFIG_INLINE_READ_LOCK_BH is not set
 # CONFIG_INLINE_READ_LOCK_IRQ is not set
 # CONFIG_INLINE_READ_LOCK_IRQSAVE is not set
-# CONFIG_INLINE_READ_UNLOCK is not set
+CONFIG_INLINE_READ_UNLOCK=y
 # CONFIG_INLINE_READ_UNLOCK_BH is not set
-# CONFIG_INLINE_READ_UNLOCK_IRQ is not set
+CONFIG_INLINE_READ_UNLOCK_IRQ=y
 # CONFIG_INLINE_READ_UNLOCK_IRQRESTORE is not set
 # CONFIG_INLINE_WRITE_TRYLOCK is not set
 # CONFIG_INLINE_WRITE_LOCK is not set
 # CONFIG_INLINE_WRITE_LOCK_BH is not set
 # CONFIG_INLINE_WRITE_LOCK_IRQ is not set
 # CONFIG_INLINE_WRITE_LOCK_IRQSAVE is not set
-# CONFIG_INLINE_WRITE_UNLOCK is not set
+CONFIG_INLINE_WRITE_UNLOCK=y
 # CONFIG_INLINE_WRITE_UNLOCK_BH is not set
-# CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set
+CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
 # CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE is not set
 # CONFIG_MUTEX_SPIN_ON_OWNER is not set
 CONFIG_FREEZER=y
@@ -331,7 +331,7 @@
 CONFIG_FLAT_NODE_MEM_MAP=y
 CONFIG_SPARSEMEM_STATIC=y
 CONFIG_PAGEFLAGS_EXTENDED=y
-CONFIG_SPLIT_PTLOCK_CPUS=999999
+CONFIG_SPLIT_PTLOCK_CPUS=4
 CONFIG_PHYS_ADDR_T_64BIT=y
 CONFIG_ZONE_DMA_FLAG=1
 CONFIG_BOUNCE=y
@@ -2808,16 +2808,12 @@
 CONFIG_DEBUG_RT_MUTEXES=y
 CONFIG_DEBUG_PI_LIST=y
 CONFIG_RT_MUTEX_TESTER=y
-CONFIG_DEBUG_SPINLOCK=y
+# CONFIG_DEBUG_SPINLOCK is not set
 CONFIG_DEBUG_MUTEXES=y
-CONFIG_DEBUG_LOCK_ALLOC=y
-CONFIG_PROVE_LOCKING=y
-# CONFIG_PROVE_RCU is not set
-CONFIG_LOCKDEP=y
+# CONFIG_DEBUG_LOCK_ALLOC is not set
+# CONFIG_PROVE_LOCKING is not set
 # CONFIG_LOCK_STAT is not set
-CONFIG_DEBUG_LOCKDEP=y
-CONFIG_TRACE_IRQFLAGS=y
-CONFIG_DEBUG_SPINLOCK_SLEEP=y
+# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
 CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
 CONFIG_STACKTRACE=y
 # CONFIG_DEBUG_KOBJECT is not set
============================================
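
For anyone repeating this comparison on their own configs, the following is a
minimal sketch (not part of the original report) that lists only the options
whose values differ between two .config files. The function names and the
short inline samples are illustrative assumptions; the samples reuse a few
lines from the diff above rather than the full attached configs.

```python
import re

def parse_config(text):
    """Map each CONFIG_ option to its value; '# ... is not set' maps to 'n'."""
    opts = {}
    for line in text.splitlines():
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
    return opts

def changed_options(old_text, new_text):
    """Return {option: (old_value, new_value)} for options that differ."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }

# Hypothetical excerpts standing in for config.tip.crash / config.tip.nocrash.
crash = """CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_SPLIT_PTLOCK_CPUS=999999
# CONFIG_INLINE_SPIN_UNLOCK is not set
"""
nocrash = """# CONFIG_DEBUG_SPINLOCK is not set
CONFIG_DEBUG_MUTEXES=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_INLINE_SPIN_UNLOCK=y
"""

for name, (before, after) in changed_options(crash, nocrash).items():
    print(f"{name}: {before} -> {after}")
```

Options present in only one file show up with None on the other side, which
makes dependent options that Kconfig dropped automatically easy to spot.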

Kernel crash dump:
============================================
[    2.844000] EXT3-fs (sda1): warning: maximal mount count reached, running e2fsck is recommended
[    2.848000] EXT3-fs (sda1): using internal journal
[    2.849556] EXT3-fs (sda1): recovery complete
[    2.852000] EXT3-fs (sda1): mounted filesystem with ordered data mode
[    2.854168] VFS: Mounted root (ext3 filesystem) on device 8:1.
[    2.856000] Freeing unused kernel memory (init): 540k freed
[    2.857056] NX-protecting the kernel data: 0xc15b3000 - 0xc1834000, 641 pages
[    2.860328] do_page_fault - entry
[    2.862554] do_page_fault: 0xc17ebdb8
[    2.864000] do_page_fault - kernel space
[    2.864000] do_page_fault - about to call bad_area_nosemaphore()
[    2.864000] BUG: unable to handle kernel paging request at c17ebdb8
[    2.864000] IP: [<c12609f7>] do_raw_spin_unlock+0x5e/0x71
[    2.864000] *pdpt = 00000000018c0001 *pde = 80000000016001e1
[    2.864000] Oops: 0003 [#1] SMP
[    2.864000] last sysfs file:
[    2.864000] Modules linked in:
[    2.864000]
[    2.864000] Pid: 1, comm: swapper Not tainted 2.6.33-tip+ #41 /
[    2.864000] EIP: 0060:[<c12609f7>] EFLAGS: 00010046 CPU: 0
[    2.864000] EIP is at do_raw_spin_unlock+0x5e/0x71
[    2.864000] EAX: 00000000 EBX: c17ebdac ECX: 00000001 EDX: 00000c0b
[    2.864000] ESI: 00000246 EDI: c18c0058 EBP: f780fe14 ESP: f780fe10
[    2.864000]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[    2.864000] Process swapper (pid: 1, ti=f780f000 task=f7826000 task.ti=f780f000)
[    2.864000] Stack:
[    2.864000]  c17ebdac f780fe24 c15ad3f2 00000000 00000000 f780ff18 c1017a57 00000000
[    2.864000] <0> 016001e3 00000000 016001e3 f77a8004 00000001 00000000 00000163 80000000
[    2.864000] <0> 00000000 ffffffff ffffffff 80000000 000001e1 80000000 00000000 80000000
[    2.864000] Call Trace:
[    2.864000]  [<c15ad3f2>] ? _raw_spin_unlock_irqrestore+0x20/0x3c
[    2.864000]  [<c1017a57>] ? __change_page_attr_set_clr+0x65c/0x945
[    2.864000]  [<c1092245>] ? vm_unmap_aliases+0x17b/0x186
[    2.864000]  [<c15b3000>] ? _etext+0x0/0x24
[    2.864000]  [<c1017eb4>] ? change_page_attr_set_clr+0x174/0x312
[    2.864000]  [<c15b3000>] ? _etext+0x0/0x24
[    2.864000]  [<c10182d1>] ? set_memory_nx+0x2d/0x32
[    2.864000]  [<c10163ab>] ? mark_nxdata_nx+0x37/0x41
[    2.864000]  [<c15b3000>] ? _etext+0x0/0x24
[    2.864000]  [<c1834000>] ? i386_start_kernel+0x0/0xaa
[    2.864000]  [<c101649d>] ? free_initmem+0x1c/0x1e
[    2.864000]  [<c1001148>] ? init_post+0xd/0x121
[    2.864000]  [<c1834401>] ? kernel_init+0x1d5/0x1df
[    2.864000]  [<c183422c>] ? kernel_init+0x0/0x1df
[    2.864000]  [<c1002e66>] ? kernel_thread_helper+0x6/0x10
[    2.864000] Code: 54 8b c1 39 43 0c 74 0c ba 74 e1 73 c1 89 d8 e8 31 ff ff ff 64 a1 d8 6b 8b c1 39 43 08 74 0c ba 80 e1 73 c1 89 d8 e8 1a ff ff ff <c7> 43 0c ff ff ff ff c7 43 08 ff ff ff ff fe 03 5b 5d c3 55 89
[    2.864000] EIP: [<c12609f7>] do_raw_spin_unlock+0x5e/0x71 SS:ESP 0068:f780fe10
[    2.864000] CR2: 00000000c17ebdb8
[    2.864000] ---[ end trace 0d94f53e9dfe82f9 ]---
[    2.948071] swapper used greatest stack depth: 1804 bytes left
[    2.952000] Kernel panic - not syncing: Attempted to kill init!
============================================
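
As a cross-check on the oops header, here is a small sketch (an editorial
addition, not something the original report ran) that decodes the
"Oops: 0003" error code and the "*pde" value from the dump. It assumes the
standard x86 page-fault error-code bits and the PAE page-directory-entry
flag layout; the function names are made up for illustration.

```python
def decode_pf_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x01),            # fault on a present page
        "write": bool(code & 0x02),              # access was a write
        "user": bool(code & 0x04),               # access came from user mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in an entry
        "instruction_fetch": bool(code & 0x10),  # access was an instruction fetch
    }

def decode_pae_pde(pde):
    """Decode selected flag bits of a 64-bit PAE page-directory entry."""
    return {
        "present": bool(pde & (1 << 0)),
        "writable": bool(pde & (1 << 1)),
        "user": bool(pde & (1 << 2)),
        "large_page": bool(pde & (1 << 7)),
        "global": bool(pde & (1 << 8)),
        "nx": bool(pde & (1 << 63)),
    }

# Values copied from the crash dump above.
print(decode_pf_error(0x0003))
print(decode_pae_pde(0x80000000016001E1))
```

With these inputs the decoder reports a kernel-mode write to a present page
with no instruction fetch, and a large-page PDE that has NX set and the
writable bit clear.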

Looking up c17ebdb8 in System.map points to a location inside pgd_lock:
============================================
$ grep c17ebd System.map
c17ebd68 d bios_check_work
c17ebda8 d highmem_pages
c17ebdac D pgd_lock
c17ebdc8 D pgd_list
c17ebdd0 D show_unhandled_signals
c17ebdd4 d cpa_lock
c17ebdf0 d memtype_lock
============================================
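
The lookup above can also be done mechanically. The sketch below (an
illustrative addition; the helper names are assumptions) resolves an address
to the nearest preceding symbol plus an offset, using the System.map excerpt
shown above as its only input.

```python
import bisect

# Excerpt copied from the grep output above.
SYSTEM_MAP = """\
c17ebd68 d bios_check_work
c17ebda8 d highmem_pages
c17ebdac D pgd_lock
c17ebdc8 D pgd_list
c17ebdd0 D show_unhandled_signals
c17ebdd4 d cpa_lock
c17ebdf0 d memtype_lock
"""

def load_symbols(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        addr, _kind, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return (name, offset) of the last symbol starting at or before addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start

name, offset = resolve(load_symbols(SYSTEM_MAP), 0xC17EBDB8)
print(f"{name}+{offset:#x}")
```

The result is pgd_lock+0xc. That offset agrees with EBX (c17ebdac) in the
register dump and with the marked bytes in the Code: line, c7 43 0c ff ff ff
ff, which encode a 32-bit store of 0xffffffff to 0xc(%ebx).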

I've looked at the lock debugging code and could not find any place that
looks like an attempt to execute data. This leads me to think that
calling set_memory_nx from kernel_init somehow confuses the lock
debugging subsystem, or that set_memory_nx does not change page
attributes in a safe manner (for example, when a lock is stored inside
the page whose attributes are being changed).
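
The precondition for the second possibility can be checked with simple
arithmetic. This sketch (an editorial illustration, not part of the original
analysis) tests whether two locks from the System.map excerpt fall inside the
range printed by the "NX-protecting the kernel data" line. The 4 KiB page
size and the choice of these two locks are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed 4 KiB base pages

# Range from the "NX-protecting the kernel data" boot message.
NX_START, NX_END = 0xC15B3000, 0xC1834000

# Lock addresses copied from the System.map excerpt.
LOCKS = {"pgd_lock": 0xC17EBDAC, "cpa_lock": 0xC17EBDD4}

def in_nx_range(addr, start=NX_START, end=NX_END):
    """True if addr lies in the half-open range [start, end)."""
    return start <= addr < end

pages = (NX_END - NX_START) // PAGE_SIZE
print(f"pages in range: {pages}")
for name, addr in LOCKS.items():
    print(f"{name} at {addr:#x}: inside range = {in_nx_range(addr)}")
```

The page count comes out to 641, matching the boot message, and both locks
lie inside the range whose attributes are being changed. This only shows
that the situation described above can arise; it does not by itself say
which code path is at fault.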

Any suggestions on how I should proceed in troubleshooting this issue?

Thank you.

View attachment "config.tip.crash" of type "text/x-apport" (70743 bytes)

Download attachment "config.tip.nocrash" of type "application/octet-stream" (70622 bytes)
