lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1212340262.5802.8.camel@odie.local>
Date:	Sun, 01 Jun 2008 19:11:02 +0200
From:	Simon Holm Thøgersen <odie@...aau.dk>
To:	j.mell@...nline.de
Cc:	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	ak@...e.de
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

søn, 01 06 2008 kl. 11:01 +0200, skrev j.mell@...nline.de:
[...]
> 
> 3. If I revert the patch
>  
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167
> 
> in 2.6.20, Einstein does not crash anymore (program was run for more than 
> 30 hours while system was in normal use with programming, multi-media 
> etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
[...]

I don't think the bisected commit is responsible for anything, but
triggering a bug elsewhere with your workload. I've been chasing the
same problem I think, but with other symptoms.

I'm triggering the following by running an lguest guest, but I guess the
workload just need to have the right scheduler intensity to trigger the
bug.

BUG: sleeping function called from invalid context at mm/slab.c:3052
in_atomic():1, irqs_disabled():0
Pid: 4771, comm: lguest Not tainted
2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
 [<c01146ee>] __might_sleep+0xe4/0xeb
 [<c01605d9>] kmem_cache_alloc+0x22/0xb4
 [<c0108479>] init_fpu+0xb0/0x14d
 [<c0104768>] math_state_restore+0x26/0x5d
 [<c01045ab>] device_not_available+0x43/0x48
 [<c011007b>] ? handle_vm86_fault+0x213/0x6b8
 [<c01029ad>] ? __switch_to+0x23/0x113
 [<c02d6c9f>] schedule+0x221/0x2a4
 [<c02d716a>] ? schedule_timeout+0x16/0x89
 [<c016ed36>] ? __pollwait+0xaa/0xb0
 [<c0168358>] ? pipe_poll+0x29/0x89
 [<c016e79a>] ? do_select+0x478/0x4cd
 [<c016ec8c>] ? __pollwait+0x0/0xb0
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c012f49d>] ? getnstimeofday+0x37/0xb7
 [<c012d810>] ? ktime_get_ts+0x40/0x44
 [<c012d827>] ? ktime_get+0x13/0x2f
 [<c011406f>] ? hrtick_start_fair+0xd5/0x111
 [<c01216f6>] ? internal_add_timer+0x8e/0x92
 [<c01f27b7>] ? delay_tsc+0x4f/0x68
 [<c025b05b>] ? ide_dma_intr+0x0/0x79
 [<c01f274d>] ? __delay+0x9/0xb
 [<c01f2766>] ? __const_udelay+0x17/0x19
 [<c0256df7>] ? ide_execute_command+0x7b/0x95
 [<c025a7a2>] ? ide_dma_start+0x24/0x36
 [<c0259ebb>] ? do_rw_taskfile+0x1be/0x1cf
 [<c025c31a>] ? ide_do_rw_disk+0x19a/0x1dd
 [<c01f2766>] ? __const_udelay+0x17/0x19
 [<c025554e>] ? ide_do_request+0x838/0x875
 [<c0254941>] ? ide_end_request+0x7d/0x99
 [<c016e9d8>] ? core_sys_select+0x1e9/0x2c7
 [<c0146aa4>] ? find_lock_page+0xa1/0xbb
 [<c01489a2>] ? filemap_fault+0x21c/0x382
 [<c0146942>] ? unlock_page+0x24/0x27
 [<c0151bac>] ? __do_fault+0x314/0x34c
 [<c0153a56>] ? handle_mm_fault+0x291/0x65a
 [<c016edcb>] ? sys_select+0x8f/0x143
 [<c02d9dc8>] ? do_page_fault+0x33c/0x616
 [<c0103a67>] ? sysenter_past_esp+0x78/0xb1
 [<c02d0000>] ? packet_rcv+0x159/0x2c7
 =======================
BUG: sleeping function called from invalid context at mm/slab.c:3052
in_atomic():1, irqs_disabled():0
Pid: 4771, comm: lguest Not tainted
2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
 [<c01146ee>] __might_sleep+0xe4/0xeb
 [<c01605d9>] kmem_cache_alloc+0x22/0xb4
 [<c012f49d>] ? getnstimeofday+0x37/0xb7
 [<c012f49d>] ? getnstimeofday+0x37/0xb7
 [<c0108479>] init_fpu+0xb0/0x14d
 [<c0104768>] math_state_restore+0x26/0x5d
 [<c01045ab>] device_not_available+0x43/0x48
 [<c0110000>] ? handle_vm86_fault+0x198/0x6b8
 [<c01029ad>] ? __switch_to+0x23/0x113
 [<c02d6c9f>] schedule+0x221/0x2a4
 [<c012a95b>] ? prepare_to_wait+0x6c/0x84
 [<c0168bdd>] ? pipe_wait+0x53/0x72
 [<c012a76f>] ? autoremove_wake_function+0x0/0x30
 [<c01692b9>] ? pipe_read+0x29a/0x302
 [<c012d6ae>] ? hrtimer_start+0xcc/0xf8
 [<c0115efd>] ? hrtick_set+0xcc/0x140
 [<c01630b0>] ? do_sync_read+0xba/0xf8
 [<c012a76f>] ? autoremove_wake_function+0x0/0x30
 [<c01638b8>] ? default_llseek+0xa7/0xb5
 [<c0162ff6>] ? do_sync_read+0x0/0xf8
 [<c0163795>] ? vfs_read+0x8a/0x106
 [<c0163ada>] ? sys_read+0x3b/0x60
 [<c0103a67>] ? sysenter_past_esp+0x78/0xb1
 =======================

I'm getting the traces with CONFIG_DEBUG_PREEMPT=y and no
CONFIG_DEBUG_SPINLOCK, since otherwise I'd just get 

BUG: sleeping function called from invalid context at mm/slab.c:3052
BUG: spinlock recursion on CPU#0, lguest/5428

and nothing further. Using CONFIG_DEBUG_PREEMPT=y,
CONFIG_DEBUG_SPINLOCK=y and booting with "lapic nmi_watchdog=2" I was
also able to grab the following over netconsole though

BUG: sleeping function called from invalid context at mm/slab.c:3052
BUG: spinlock recursion on CPU#0, lguest/5428
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c01f1f76, registers:
Modules linked in: lg tun arc4 ecb crypto_blkcipher cryptomgr
crypto_algapi ieee80211_crypt_wep bridge llc snd_seq snd_seq_device
radeonfb uhci_hcd snd_intel8x0 ehci_hcd snd_ac97_codec ipw2200 usbcore
fb ac97_bus fb_ddc backlight snd_pcm ieee80211 firewire_ohci
firewire_core i2c_algo_bit snd_timer intel_agp cfbcopyarea snd agpgart
i2c_i801 ieee80211_crypt firmware_class cfbimgblt soundcore crc_itu_t
cfbfillrect rtc iTCO_wdt snd_page_alloc i2c_core

Pid: 5428, comm: lguest Not tainted
(2.6.26-rc4debug-locks-extended-00103-g1beee8d #2)
EIP: 0060:[<c01f1f76>] EFLAGS: 00000002 CPU: 0
EIP is at delay_tsc+0x2e/0x68
EAX: 100a8436 EBX: c03acd84 ECX: 00000000 EDX: 0000004c
ESI: 100a83fb EDI: 00000001 EBP: e9651738 ESP: e9651724
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process lguest (pid: 5428, ti=e9650000 task=efbc92c0 task.ti=e964c000)
Stack: 00000001 100a83fb c03acd84 02df69af 00000001 e9651740 c01f1f2d
e9651758 
       c01ff481 593ac9e8 c03acd84 00000092 000078cf e9651774 c02d4968
00000000 
       00000002 c01141c4 c03acd84 00000001 e965178c c01141c4 00000001
00007892 
Call Trace:
 [<c01f1f2d>] ? __delay+0x9/0xb
 [<c01ff481>] ? _raw_spin_lock+0x83/0xcb
 [<c02d4968>] ? _spin_lock_irqsave+0x46/0x4f
 [<c01141c4>] ? __wake_up+0x15/0x3b
 [<c01141c4>] ? __wake_up+0x15/0x3b
 [<c0119cf0>] ? wake_up_klogd+0x2e/0x31
 [<c0119ea0>] ? release_console_sem+0x1ad/0x1b5
 [<c011a3d1>] ? vprintk+0x383/0x3c2
 [<c0134276>] ? debug_check_no_locks_freed+0x111/0x13c
 [<c0134276>] ? debug_check_no_locks_freed+0x111/0x13c
 [<c011a425>] ? printk+0x15/0x17
 [<c01ff286>] ? spin_bug+0x3f/0x80
 [<c01ff432>] ? _raw_spin_lock+0x34/0xcb
 [<c02d474e>] ? _spin_lock+0x2a/0x32
 [<c011471b>] ? task_rq_lock+0x2a/0x31
 [<c011471b>] ? task_rq_lock+0x2a/0x31
 [<c0114dc8>] ? try_to_wake_up+0x15/0x81
 [<c0114e3f>] ? default_wake_function+0xb/0xd
 [<c012985f>] ? autoremove_wake_function+0xe/0x30
 [<c0113626>] ? __wake_up_common+0x35/0x5b
 [<c01141d7>] ? __wake_up+0x28/0x3b
 [<c0119cf0>] ? wake_up_klogd+0x2e/0x31
 [<c0119ea0>] ? release_console_sem+0x1ad/0x1b5
 [<c011a3d1>] ? vprintk+0x383/0x3c2
 [<c011a425>] ? printk+0x15/0x17
 [<c0115232>] ? __might_sleep+0x8c/0x108
 [<c0160c37>] ? kmem_cache_alloc+0x22/0xd6
 [<c0108441>] ? init_fpu+0xb0/0x14d
 [<c01047fd>] ? math_state_restore+0x2b/0x67
 [<c010463a>] ? device_not_available+0x4e/0x53
 [<c02d2b8a>] ? schedule+0x1fe/0x2a1
 [<c011007b>] ? handle_vm86_fault+0x31f/0x6b8
 [<c01029c1>] ? __switch_to+0x23/0x113
 [<c02d2bc6>] ? schedule+0x23a/0x2a1
 [<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
 [<c02d4c3f>] ? _spin_unlock_irqrestore+0x56/0x6c
 [<c02d2db8>] ? schedule_timeout+0x16/0x89
 [<c016ee08>] ? __pollwait+0xaa/0xb0
 [<c0168604>] ? pipe_poll+0x29/0x89
 [<c016e844>] ? do_select+0x4a6/0x4f8
 [<c016ed5e>] ? __pollwait+0x0/0xb0
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0113dfa>] ? hrtick_start_fair+0x117/0x11f
 [<c01607a8>] ? kfree+0xad/0xc0
 [<c0160833>] ? kmem_cache_free+0x78/0x8a
 [<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
 [<c0133fac>] ? mark_held_locks+0x4e/0x66
 [<c01349ae>] ? __lock_acquire+0x488/0xb4c
 [<c016eaad>] ? core_sys_select+0x217/0x2f2
 [<c02d4d42>] ? _read_unlock_irq+0x36/0x4b
 [<c0147fab>] ? unlock_page+0x24/0x27
 [<c0152c7f>] ? __do_fault+0x31e/0x356
 [<c0153fb3>] ? handle_mm_fault+0x2a0/0x637
 [<c016ee9d>] ? sys_select+0x8f/0x143
 [<c02d67d3>] ? do_page_fault+0x352/0x631
 [<c0103b59>] ? restore_nocheck+0x12/0x15
 [<c02d6481>] ? do_page_fault+0x0/0x631
 [<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
 [<c0103a4f>] ? sysenter_past_esp+0x78/0xd1
 =======================
Code: 57 56 53 83 ec 08 89 45 ec b8 01 00 00 00 e8 b8 4b 0e 00 0f 31 0f
1f 40 00 b9 00 00 00 00 89 c6 89 c8 09 f0 89 45 f0 f3 90 0f 31 <0f> 1f
40 00 b9 00 00 00 00 89 c6 89 c8 09 f0 2b 45 f0 3b 45 ec 


Let us take a closer look at __switch_to. I'm using test data from yet
another trace (that I actually captured before all the other traces
presented here).

BUG: unable to handle kernel NULL pointer dereference at 000001ff
IP: [<c0102964>] __switch_to+0x19/0xff

That is the only part of the trace I've got; there wasn't produced more.
Anyway, for that particular kernel I used, __switch_to disassembled to

0xc0102955 <__switch_to+10>: mov    0x4(%eax),%eax
0xc0102958 <__switch_to+13>: testb  $0x1,0xc(%eax)
0xc010295c <__switch_to+17>: je     0xc0102995 <__switch_to+74>
0xc010295e <__switch_to+19>: mov    0x26c(%edi),%eax
0xc0102964 <__switch_to+25>: fnsave (%eax)

with corresponding source code

struct task_struct * __switch_to(struct task_struct *prev_p, struct
task_struct *next_p)
{
[...]
        __unlazy_fpu(prev_p);
[...]
}

static inline void __unlazy_fpu(struct task_struct *tsk)
{
        if (task_thread_info(tsk)->status & TS_USEDFPU) {
                __save_init_fpu(tsk);
                stts();
        } else
                tsk->fpu_counter = 0;
}

where __save_init_fpu(tsk) accesses
	tsk->thread.xstate->fxsave

The first mov, testb and je of the disassembly are the if statement, and
the second mov and fnsave are part of __save_init_fpu that access
tsk->thread.xstate->fxsave. %eax (initially) and %edi hold prev_p from
__switch_to.

Hope that was readable and not too confusing. Now comes some partly
guesswork from my side that could be wrong. As far as I can tell
%eax (initially) and %edi / prev_p must hold 0xffffff93. Why the first
mov is valid then I'm not sure, but the if statement must be true by
chance.  It is not by pure chance though, since prev_p always has the
same value (i.e. it is consistently "dereference at 000001ff" I get
across multiple runs).

So I'd say some sort of corruption caused by very fast context switch
between the launcher and switcher (or-whatever-their-names-are)
processes in lguest.

That is as far as I have been able to debug this, suggestions are
welcome. I guess I should note that I haven't tried with preemption
disabled, but I don't think there's much point in it.


Simon Holm Thøgersen

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ