Date:	Wed, 5 Oct 2011 16:36:59 -0700
From:	Andrew Lutomirski <luto@....edu>
To:	Adrian Bunk <bunk@...sta.de>,
	richard -rw- weinberger <richard.weinberger@...il.com>
Cc:	"H. Peter Anvin" <hpa@...ux.intel.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [3.1 patch] x86: default to vsyscall=native

On Wed, Oct 5, 2011 at 3:46 PM, Andrew Lutomirski <luto@....edu> wrote:
> On Wed, Oct 5, 2011 at 3:30 PM, Adrian Bunk <bunk@...sta.de> wrote:
>> On Thu, Oct 06, 2011 at 12:22:34AM +0200, richard -rw- weinberger wrote:
>>> On Thu, Oct 6, 2011 at 12:13 AM, Andrew Lutomirski <luto@....edu> wrote:
>>> > On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@...sta.de> wrote:
>>> >> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>>> >>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@...sta.de> wrote:
>>> >>> > After upgrading a kernel the existing userspace should just work
>>> >>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>>> >>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>>> >>> >
>>> >>> > dmesg said:
>>> >>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>>> >>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>>> >>> >
>>> >>> > Looking through the changelog I ended up at commit 3ae36655
>>> >>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>>> >>> >
>>> >>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>>> >>> > vsyscall=native.
>>> >>> >
>>> >>> > That sounds reasonable to me, and fixes the problem for me.
>>> >>>
>>> >>> At this point in the -rc cycle, this sounds fine.
>>> >>>
>>> >>> That being said, I'd like to fix it for real for 3.2.  This particular
>>> >>> failure is suspicious -- the "vsyscall fault" message means that
>>> >>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>>> >>> before) vgettimeofday should *also* have segfaulted.
>>> >>
>>> >> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
>>> >> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
>>> >> it also seems to run nicely.
>>> >>
>>> >> Looking deeper into "a UML instance didn't come up properly",
>>> >> the problem is that it comes up in a strange (readonly) state.
>>> >>
>>> >> There are "Using makefile-style concurrent boot in runlevel S."
>>> >> and "Using makefile-style concurrent boot in runlevel 2." in the
>>> >> logs with a Debian userspace, but no output from the init scripts
>>> >> in these broken bootups (normal messages are in non-broken bootups).
>>> >>
>>> >> Perhaps the two messages I see in dmesg on the host are from the
>>> >> processes running rcS and rc2 failing early?
>>> >>
>>> >> In a working startup with a Debian userspace, I'm getting during rcS
>>> >>  Setting the system clock.
>>> >>  Cannot access the Hardware Clock via any known method.
>>> >>  Use the --debug option to see the details of our search for an access method.
>>> >>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>>> >>
>>> >>> We do have a bit
>>> >>> of a bug in that the new code doesn't report si_addr properly, but
>>> >>> that sounds unlikely as a culprit.  Did you try with the offending
>>> >>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>>> >>
>>> >> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
>>> >> want me to revert?
>>> >>
>>> >>> What's the .config for your UML binary?  I'd like to see if I can
>>> >>> reproduce this.
>>> >>
>>> >> It's attached.
>>> >>
>>> >
>>> > I can't reproduce it.  What distro is running inside the UML instance?
>>>
>>> Same here.
>>> Adrian, is the UML kernel crashing before executing init?
>>
>> As I wrote:
>>  Looking deeper into "a UML instance didn't come up properly",
>>  the problem is that it comes up in a strange (readonly) state.
>>
>> The UML kernel is running happily without crashing, and as I wrote my
>> guess about my problems is:
>>  Perhaps the two messages I see in dmesg on the host are from the
>>  processes running rcS and rc2 failing early?
>>
>>> We definitely need more information...
>>
>> I gave the information that was requested, plus my observations.
>>
>> What more information exactly do you need from me?
>
> None :)  I just reproduced the problem with Debian Squeeze.  Lenny works fine.

This is strange.  The problem appears to be in startpar.  That same
exact Debian image works fine on KVM running 3.1-rc8 (with
vsyscall=emulate) and on 2.6.40 (i.e. Fedora 15's kernel).  If I set
print-fatal-signals=1 I don't see a fatal signal in startpar.
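
For anyone repeating this test, it helps to confirm which vsyscall
mode and debug options a given boot actually used.  The following is
a minimal sketch, not part of the original exchange: the helper names
parse_cmdline and vsyscall_mode are made up for illustration, and the
sample string stands in for the contents of /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Hypothetical helper for illustration. Flags without '=' map to None.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent,
        # so normalize to '_' before storing.
        params[name.replace("-", "_")] = value if sep else None
    return params


def vsyscall_mode(cmdline, default=None):
    """Return the explicit vsyscall= setting, or `default` if none is given.

    The built-in default depends on the kernel version (it is exactly
    what this thread is debating), so the caller has to supply it.
    """
    return parse_cmdline(cmdline).get("vsyscall", default)


# Sample command line (an assumption; on a live system read /proc/cmdline).
sample = "ro root=/dev/sda1 vsyscall=emulate print-fatal-signals=1"
print(vsyscall_mode(sample))                        # emulate
print(parse_cmdline(sample)["print_fatal_signals"])  # 1
```

If vsyscall_mode returns the default, the kernel's compiled-in
behavior applies, which differs between 3.1-rc8 and a kernel with the
proposed patch.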

Richard, is it possible that UML 2.6.30.1 makes a bogus vgettimeofday
call and recovers successfully on older kernels because the resulting
SIGSEGV has a valid sigcontext?  I can try hacking the "vsyscall fault"
path to generate a full sigcontext and siginfo.  This seems rather
unlikely, though.
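
As a cross-check on the dmesg lines quoted above, here is a rough
sketch of decoding a "vsyscall fault" message.  It assumes the legacy
x86-64 vsyscall page layout (base 0xffffffffff600000, with
gettimeofday, time and getcpu at offsets 0, 0x400 and 0x800) and the
usual calling convention (first argument in rdi, second in rsi).  The
function name decode_vsyscall_fault is invented for this example.

```python
import re

# Legacy x86-64 vsyscall page: fixed address, one entry per 1 KiB slot.
VSYSCALL_BASE = 0xFFFFFFFFFF600000
VSYSCALL_SLOTS = {0x000: "gettimeofday", 0x400: "time", 0x800: "getcpu"}

# Matches e.g. "linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:... cs:..."
FAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\] vsyscall fault "
    r"\(exploit attempt\?\) (?P<regs>.*)"
)


def decode_vsyscall_fault(line):
    """Parse one 'vsyscall fault' dmesg line; return None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    # Register fields look like "ip:ffffffffff600000"; values are hex.
    regs = {}
    for field in m.group("regs").split():
        name, _, value = field.partition(":")
        regs[name] = int(value, 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        # Which vsyscall entry was called, or None if ip is not a known slot.
        "vsyscall": VSYSCALL_SLOTS.get(regs["ip"] - VSYSCALL_BASE),
        "arg1": regs["di"],  # first argument (rdi)
        "arg2": regs["si"],  # second argument (rsi)
    }


line = ("linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) "
        "ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 "
        "si:0 di:606790")
info = decode_vsyscall_fault(line)
print(info["vsyscall"], hex(info["arg1"]), hex(info["arg2"]))
# gettimeofday 0x606790 0x0
```

Read this way, both reported faults are gettimeofday calls with
tv = 0x606790 and tz = NULL.  That matches the reading that
sys_gettimeofday returned EFAULT for the tv pointer, though it does
not explain why that pointer was unusable.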

--Andy