Message-Id: <200805171831.08490.j.mell@t-online.de>
Date: Sat, 17 May 2008 18:31:08 +0200
From: Jürgen Mell <j.mell@...nline.de>
To: linux-kernel@...r.kernel.org
Subject: CONFIG_PREEMPT causes corruption of application's FPU stack
I am running the Einstein@...e application (version 4.35,
http://einstein.phys.uwm.edu). This application does a lot of computation,
mostly with FPU and SSE instructions.
After I started experimenting with real-time optimized kernels, the
application began to crash with floating-point errors such as the
following:
APP DEBUG: Application caught signal 8.
FPU status word ffffa0e1, flags: ERR_SUMM STACK_FAULT PRECISION INVALID
Obtained 6 stack frames for this thread.
Use gdb command: 'info line *0xADDRESS' to print corresponding line
numbers.
einstein_S5R3_4.35_i686-pc-linux-gnu[0x8069e7e]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x818d436]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x805db8f]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x806b11c]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7e14fe0]
einstein_S5R3_4.35_i686-pc-linux-gnu(shmat+0x59)[0x804bda1]
Stack trace of LAL functions in worker thread:
GetSemiCohToplist at line 3177 of
file /home/bema/einsteinathome/HierarchicalSearch/EaH_build_release_einstein_S5R3_4.35/extra_sources/lalapps-CVS/src/pulsar/hough/src2/HierarchicalSearch.c
At lowest level status code = 0, description: NO LAL ERROR REGISTERED
called boinc_finish
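The status word in the report above can be cross-checked by hand. The following is a minimal sketch in Python, assuming that only the low 16 bits of the printed value are the x87 status word. The names INVALID, PRECISION, STACK_FAULT and ERR_SUMM are taken from the application's output; the other flag names and the function name `decode_x87_status` are labels chosen here for illustration.

```python
# Exception and summary bits of the x87 FPU status word (bits 0-7).
# Names for bits 0, 5, 6, 7 follow the application's log; the rest are
# illustrative labels, not taken from the application.
X87_FLAGS = [
    (0, "INVALID"),      # IE: invalid operation
    (1, "DENORMAL"),     # DE: denormalized operand
    (2, "ZERO_DIVIDE"),  # ZE: divide by zero
    (3, "OVERFLOW"),     # OE: numeric overflow
    (4, "UNDERFLOW"),    # UE: numeric underflow
    (5, "PRECISION"),    # PE: inexact result
    (6, "STACK_FAULT"),  # SF: register stack overflow or underflow
    (7, "ERR_SUMM"),     # ES: error summary
]


def decode_x87_status(raw):
    """Decode the low 16 bits of raw as an x87 status word."""
    sw = raw & 0xFFFF               # drop the upper bits of the printed value
    flags = [name for bit, name in X87_FLAGS if sw & (1 << bit)]
    c1 = (sw >> 9) & 1              # condition code C1
    top = (sw >> 11) & 7            # top-of-stack pointer, bits 11-13
    stack_fault = None
    if sw & (1 << 6):
        # With SF set, C1 = 1 indicates overflow and C1 = 0 underflow.
        stack_fault = "overflow" if c1 else "underflow"
    return {"flags": flags, "top": top, "stack_fault": stack_fault}


result = decode_x87_status(0xFFFFA0E1)
print(result)
```

For the value in the report this gives the same four flags the application printed, with the top-of-stack field at 4 and C1 clear. If C1 still reflects the faulting instruction, that points to a stack underflow rather than an overflow.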
I tracked this down to a single kernel configuration option. If
CONFIG_PREEMPT is set to 'y' the application will start crashing. If
CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the application
will run without errors.
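To confirm which preemption model a given kernel was built with, its configuration file can be inspected. The following is a rough sketch in Python that works on the text of such a file; the function name `preemption_model` and the sample text are made up for illustration. On many systems the configuration is available as /boot/config-<version> or, if the kernel was built with that option, as /proc/config.gz.

```python
import re

# The three mutually exclusive preemption choices discussed here.
MODELS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")


def preemption_model(config_text):
    """Return the name of the enabled preemption model, or None."""
    # Collect every CONFIG_PREEMPT* symbol that is set to 'y'. Matching the
    # whole symbol name keeps related options from being mistaken for
    # CONFIG_PREEMPT itself.
    enabled = set(
        re.findall(r"^(CONFIG_PREEMPT\w*)=y$", config_text, flags=re.MULTILINE)
    )
    for name in MODELS:
        if name in enabled:
            return name
    return None


# Hypothetical excerpt of a kernel .config, for illustration only.
sample = """\
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_NOTIFIERS=y
"""

print(preemption_model(sample))
```

A kernel for which this reports CONFIG_PREEMPT would be in the crashing group described above, and one reporting CONFIG_PREEMPT_VOLUNTARY in the working group.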
The problem is reproducible insofar as the error always occurs when
CONFIG_PREEMPT is set, but the time to the first occurrence varies greatly,
from a few minutes to more than 10 CPU hours.
I first saw this error on an openSUSE kernel 2.6.22.17-0.1-rt. I verified
the problem on the following kernel versions:
openSUSE 2.6.22.17-0.1-default
openSUSE 2.6.23.17-ccj64-rt
kernel.org 2.6.26-rc1
kernel.org 2.6.26-rc2-git5
My CPU is an Intel Core2Duo 6420, running two instances of the Einstein
application in 32-bit mode. From a discussion on the Einstein message boards
I know that other users of the application are also affected.
Please let me know if you need any additional information to track this
down.
Jürgen