Message-ID: <817ecb6f1003061144r3defbe0eg74e23e0986f8e42d@mail.gmail.com>
Date: Sat, 6 Mar 2010 14:44:47 -0500
From: Siarhei Liakh <sliakh.lkml@...il.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: "H. Peter Anvin" <hpa@...or.com>, mingo@...hat.com,
jmorris@...ei.org, linux-kernel@...r.kernel.org,
arjan@...ux.intel.com, tglx@...utronix.de, jiang@...ncsu.edu,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/mm] x86, mm: NX protection for kernel data
On Mon, Feb 22, 2010 at 12:21 PM, Ingo Molnar <mingo@...e.hu> wrote:
>
> * H. Peter Anvin <hpa@...or.com> wrote:
>
>> On 02/22/2010 03:01 AM, Ingo Molnar wrote:
>> >>
>> >>> Commit-ID: 01ab31371da90a795b774d87edf2c21bb3a64dda
>> >>> Gitweb: http://git.kernel.org/tip/01ab31371da90a795b774d87edf2c21bb3a64dda
>> >>> Author: Siarhei Liakh <sliakh.lkml@...il.com>
>> >>> AuthorDate: Sun, 31 Jan 2010 18:27:55 -0500
>> >>> Committer: H. Peter Anvin <hpa@...or.com>
>> >>> CommitDate: Wed, 17 Feb 2010 10:11:24 -0800
>> >>>
>> >>> x86, mm: NX protection for kernel data
>> >>>
>> >>> This patch expands functionality of CONFIG_DEBUG_RODATA to set main
>> >>> (static) kernel data area as NX.
>> >>
>> >> -tip testing is seeing boot hangs along the lines of:
>> >>
>> >> [ 15.568108] EXT3-fs (sda1): recovery complete
>> >> [ 15.573064] EXT3-fs (sda1): mounted filesystem with ordered data mode
>> >> [ 15.580313] VFS: Mounted root (ext3 filesystem) readonly on device 8:1.
>> >> [ 15.584021] async_waiting @ 1
>> >> [ 15.588008] async_continuing @ 1 after 0 usec
>> >> [ 15.592163] Freeing unused kernel memory: 540k freed
>> >> [ 15.600126] NX-protecting the kernel data: c15ab000, 2919 pages
>> >>
>> >> which i suspect could be due to the commit above.
>> >
>> > Yep, that's confirmed now, applying these 3 reverts makes it boot fine:
>> >
>> > 833e0ca: Revert "x86, mm: NX protection for kernel data"
>> > ce4b6b4: Revert "x86: RO/NX protection for loadable kernel modules"
>> > e357312: Revert "module: fix () used as prototype in include/linux/module.h"
>> >
>>
>> Given that e357312 is a () -> (void) prototype fix, it hardly seems
>> likely to be at fault. The RO/NX stuff, on the other hand, makes a lot
>> of sense.
>
> Yes, i reverted e357312 because it was a dependent change.

I was able to narrow the issue down to spinlock debugging. More
specifically, CONFIG_DEBUG_SPINLOCK=y seems to be somehow incompatible
with the kernel's RW data being NX.
Crash/nocrash config diff:
============================================
diff -uNr config.tip.crash config.tip.nocrash
--- config.tip.crash 2010-03-05 22:43:01.000000000 -0500
+++ config.tip.nocrash 2010-03-06 01:38:00.000000000 -0500
@@ -1,7 +1,7 @@
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.33
-# Fri Mar 5 22:22:10 2010
+# Sat Mar 6 01:22:32 2010
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
@@ -219,27 +219,27 @@
# CONFIG_INLINE_SPIN_LOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK_IRQ is not set
# CONFIG_INLINE_SPIN_LOCK_IRQSAVE is not set
-# CONFIG_INLINE_SPIN_UNLOCK is not set
+CONFIG_INLINE_SPIN_UNLOCK=y
# CONFIG_INLINE_SPIN_UNLOCK_BH is not set
-# CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set
+CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
# CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_READ_TRYLOCK is not set
# CONFIG_INLINE_READ_LOCK is not set
# CONFIG_INLINE_READ_LOCK_BH is not set
# CONFIG_INLINE_READ_LOCK_IRQ is not set
# CONFIG_INLINE_READ_LOCK_IRQSAVE is not set
-# CONFIG_INLINE_READ_UNLOCK is not set
+CONFIG_INLINE_READ_UNLOCK=y
# CONFIG_INLINE_READ_UNLOCK_BH is not set
-# CONFIG_INLINE_READ_UNLOCK_IRQ is not set
+CONFIG_INLINE_READ_UNLOCK_IRQ=y
# CONFIG_INLINE_READ_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_WRITE_TRYLOCK is not set
# CONFIG_INLINE_WRITE_LOCK is not set
# CONFIG_INLINE_WRITE_LOCK_BH is not set
# CONFIG_INLINE_WRITE_LOCK_IRQ is not set
# CONFIG_INLINE_WRITE_LOCK_IRQSAVE is not set
-# CONFIG_INLINE_WRITE_UNLOCK is not set
+CONFIG_INLINE_WRITE_UNLOCK=y
# CONFIG_INLINE_WRITE_UNLOCK_BH is not set
-# CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set
+CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
# CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE is not set
# CONFIG_MUTEX_SPIN_ON_OWNER is not set
CONFIG_FREEZER=y
@@ -331,7 +331,7 @@
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_PAGEFLAGS_EXTENDED=y
-CONFIG_SPLIT_PTLOCK_CPUS=999999
+CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
@@ -2808,16 +2808,12 @@
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_RT_MUTEX_TESTER=y
-CONFIG_DEBUG_SPINLOCK=y
+# CONFIG_DEBUG_SPINLOCK is not set
CONFIG_DEBUG_MUTEXES=y
-CONFIG_DEBUG_LOCK_ALLOC=y
-CONFIG_PROVE_LOCKING=y
-# CONFIG_PROVE_RCU is not set
-CONFIG_LOCKDEP=y
+# CONFIG_DEBUG_LOCK_ALLOC is not set
+# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
-CONFIG_DEBUG_LOCKDEP=y
-CONFIG_TRACE_IRQFLAGS=y
-CONFIG_DEBUG_SPINLOCK_SLEEP=y
+# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
============================================
Kernel crash dump:
============================================
[ 2.844000] EXT3-fs (sda1): warning: maximal mount count reached, running e2fsck is recommended
[ 2.848000] EXT3-fs (sda1): using internal journal
[ 2.849556] EXT3-fs (sda1): recovery complete
[ 2.852000] EXT3-fs (sda1): mounted filesystem with ordered data mode
[ 2.854168] VFS: Mounted root (ext3 filesystem) on device 8:1.
[ 2.856000] Freeing unused kernel memory (init): 540k freed
[ 2.857056] NX-protecting the kernel data: 0xc15b3000 - 0xc1834000, 641 pages
[ 2.860328] do_page_fault - entry
[ 2.862554] do_page_fault: 0xc17ebdb8
[ 2.864000] do_page_fault - kernel space
[ 2.864000] do_page_fault - about to call bad_area_nosemaphore()
[ 2.864000] BUG: unable to handle kernel paging request at c17ebdb8
[ 2.864000] IP: [<c12609f7>] do_raw_spin_unlock+0x5e/0x71
[ 2.864000] *pdpt = 00000000018c0001 *pde = 80000000016001e1
[ 2.864000] Oops: 0003 [#1] SMP
[ 2.864000] last sysfs file:
[ 2.864000] Modules linked in:
[ 2.864000]
[ 2.864000] Pid: 1, comm: swapper Not tainted 2.6.33-tip+ #41 /
[ 2.864000] EIP: 0060:[<c12609f7>] EFLAGS: 00010046 CPU: 0
[ 2.864000] EIP is at do_raw_spin_unlock+0x5e/0x71
[ 2.864000] EAX: 00000000 EBX: c17ebdac ECX: 00000001 EDX: 00000c0b
[ 2.864000] ESI: 00000246 EDI: c18c0058 EBP: f780fe14 ESP: f780fe10
[ 2.864000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2.864000] Process swapper (pid: 1, ti=f780f000 task=f7826000 task.ti=f780f000)
[ 2.864000] Stack:
[ 2.864000] c17ebdac f780fe24 c15ad3f2 00000000 00000000 f780ff18 c1017a57 00000000
[ 2.864000] <0> 016001e3 00000000 016001e3 f77a8004 00000001 00000000 00000163 80000000
[ 2.864000] <0> 00000000 ffffffff ffffffff 80000000 000001e1 80000000 00000000 80000000
[ 2.864000] Call Trace:
[ 2.864000] [<c15ad3f2>] ? _raw_spin_unlock_irqrestore+0x20/0x3c
[ 2.864000] [<c1017a57>] ? __change_page_attr_set_clr+0x65c/0x945
[ 2.864000] [<c1092245>] ? vm_unmap_aliases+0x17b/0x186
[ 2.864000] [<c15b3000>] ? _etext+0x0/0x24
[ 2.864000] [<c1017eb4>] ? change_page_attr_set_clr+0x174/0x312
[ 2.864000] [<c15b3000>] ? _etext+0x0/0x24
[ 2.864000] [<c10182d1>] ? set_memory_nx+0x2d/0x32
[ 2.864000] [<c10163ab>] ? mark_nxdata_nx+0x37/0x41
[ 2.864000] [<c15b3000>] ? _etext+0x0/0x24
[ 2.864000] [<c1834000>] ? i386_start_kernel+0x0/0xaa
[ 2.864000] [<c101649d>] ? free_initmem+0x1c/0x1e
[ 2.864000] [<c1001148>] ? init_post+0xd/0x121
[ 2.864000] [<c1834401>] ? kernel_init+0x1d5/0x1df
[ 2.864000] [<c183422c>] ? kernel_init+0x0/0x1df
[ 2.864000] [<c1002e66>] ? kernel_thread_helper+0x6/0x10
[ 2.864000] Code: 54 8b c1 39 43 0c 74 0c ba 74 e1 73 c1 89 d8 e8 31 ff ff ff 64 a1 d8 6b 8b c1 39 43 08 74 0c ba 80 e1 73 c1 89 d8 e8 1a ff ff ff <c7> 43 0c ff ff ff ff c7 43 08 ff ff ff ff fe 03 5b 5d c3 55 89
[ 2.864000] EIP: [<c12609f7>] do_raw_spin_unlock+0x5e/0x71 SS:ESP 0068:f780fe10
[ 2.864000] CR2: 00000000c17ebdb8
[ 2.864000] ---[ end trace 0d94f53e9dfe82f9 ]---
[ 2.948071] swapper used greatest stack depth: 1804 bytes left
[ 2.952000] Kernel panic - not syncing: Attempted to kill init!
============================================
Looking up c17ebdb8 in System.map points to a location inside pgd_lock:
============================================
$grep c17ebd System.map
c17ebd68 d bios_check_work
c17ebda8 d highmem_pages
c17ebdac D pgd_lock
c17ebdc8 D pgd_list
c17ebdd0 D show_unhandled_signals
c17ebdd4 d cpa_lock
c17ebdf0 d memtype_lock
============================================
I've looked through the lock-debugging code and could not find anything
that looks like an attempt to execute data. This leads me to think that
either calling set_memory_nx() from kernel_init somehow confuses the
lock-debugging subsystem, or set_memory_nx() does not change page
attributes safely (for example, when a lock is stored inside the page
whose attributes are being changed).
Any suggestions on how I should proceed in troubleshooting this issue?
Thank you.
Attachment: config.tip.crash (text/x-apport, 70743 bytes)
Attachment: config.tip.nocrash (application/octet-stream, 70622 bytes)