Message-id: <alpine.LNX.2.00.0911302027020.1345@bruno>
Date:	Mon, 30 Nov 2009 22:27:32 -0600 (CST)
From:	Joseph Parmelee <jparmele@...dbear.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Darren Hart <dvhltc@...ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Dinakar Guniguntala <dino@...ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: futex_cmpxchg_enabled not set in futex_init on pentium3

On Mon, 30 Nov 2009, Thomas Gleixner wrote:

> Can you please printk the return value of that cmpxchg() test and
> provide a full bootlog (dmesg) of your machine ?

Thanks for the responses, which verify that my understanding of the expected
behavior was correct.  That means that we have a real bug.

Attached is the complete output of dmesg from the most recent boot run.

This line is due to a printk added in futex_init:

[    0.147626] futex_init curval = F0006AA0

These lines are due to printk's added in
arch/x86/include/asm/futex.h:futex_atomic_cmpxchg_inatomic():

[    0.147384] cmpxchg: ax before=cf80e000, ax after=f0006aa0
[    0.147444] cmpxchg: bx before=0, bx after=0
[    0.147536] cmpxchg: cx before=0, cx after=0

The compiler generates cmpxchg %ecx,(%ebx), so I added extended asm to
dump the registers involved just before and after the cmpxchg into variables
for printk.

All of this is consistent with the fault not occurring and the cmpxchg
working "as expected" at address 0.  Examining /proc/kcore with gdb shows
that address c0000000 contains f0006aa0.  Direct access with gdb to
address 0 fails as expected.

To completely eliminate any possibility that the fault was getting lost in
the fixup code somehow, I removed all the fixup code from the cmpxchg
extended asm, and the results are exactly the same.  In fact this run is
with the fixup code removed.  The fault is not occurring.

> That'd be a serious bug as it would let every NULL pointer dereference
> in the kernel proceed.

Interestingly, a printk inserted in futex_init that attempts a null
dereference results in an oops as expected.

>
> Could you also please do a quick check in which kernel version this
> got introduced ?

This was known to be working in 2.6.28.6.  Unfortunately, I didn't find the
problem until I updated glibc and ran its test suite on 2.6.31.5.  However,
I have been noticing some nasty log messages about page allocation failures
in pppd, with plenty of memory available, starting sometime in the 2.6.31
series.  One of them is also attached FWIW.  But these didn't seem to be
causing any problems other than making me nervous.

I am located in the mountains of Costa Rica with only a very slow dialup, so
git bisect is not an option for me.  But I do have old copies of vmlinuz
lying around that go back to the 2.6.29 series.  Unfortunately, I don't have
the matching unstripped vmlinux which would allow debugging, but I might be
able to test other ways.  I will post again as soon as I have something.

In the meantime, if you can think of any tests that you want to run, I will
be most happy to help.

Best regards,

Joseph

View attachment "dmesg" of type "TEXT/PLAIN" (16288 bytes)

View attachment "log" of type "TEXT/PLAIN" (4806 bytes)