linux-ext4 - Re: kerneloops.org: 2.6.26-rc possible regression in ext3

Message-ID: <alpine.LFD.1.10.0806182302340.2907@woody.linux-foundation.org>
Date:	Wed, 18 Jun 2008 23:14:12 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Arjan van de Ven <arjan@...ux.intel.com>
cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-ext4@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: kerneloops.org: 2.6.26-rc possible regression in ext3

On Wed, 18 Jun 2008, Linus Torvalds wrote:
> 
> One thing I note is that all the oopses seem to be i686 - are there that 
> few x86-64 fc10 users (I'd have assumed that 64-bit is starting to be the 
> norm for people who live on the edge, but perhaps I'm just out of touch)? 
> 
> Or could this perhaps be an indication that it is specific to i686 some 
> way (eg a compiler issue?)

The oops code is odd:

  27:	8d 4c 18 fe          	lea    0xfffffffe(%eax,%ebx,1),%ecx
  2b:*	8b 19                	mov    (%ecx),%ebx     <-- trapping instruction
  2d:	83 e9 08             	sub    $0x8,%ecx
  30:	89 d8                	mov    %ebx,%eax
  32:	66 d1 e8             	shr    %ax
  35:	0f b7 c0             	movzwl %ax,%eax

and that "lea" is doing an address computation of "eax+ebx-2" (the scale 
factor in the encoding shown is 1). Which does *not* look like an address 
to a 32-bit entity, but to a 16-bit one. Yeah, it's not conclusive, but it 
is suggestive.
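
One way to double-check the addressing mode is to decode the four bytes of 
the "lea" by hand. The following is only a minimal sketch in Python: the 
helper name decode_lea_disp8 is made up for illustration, and it handles 
just the one encoding form seen here (opcode 8d, ModRM with an 8-bit 
displacement and a SIB byte), not x86 decoding in general.

```python
def decode_lea_disp8(code: bytes):
    """Decode a 32-bit 'lea disp8(base,index,scale),reg' from its raw bytes.

    Hypothetical helper: covers only mod=01 with a SIB byte and assumes the
    index field names a real register (index 100b, "no index", is not handled).
    """
    regs = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
    opcode, modrm, sib, disp = code
    assert opcode == 0x8D                      # LEA r32, m
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 0b01 and rm == 0b100         # disp8 follows, SIB present
    scale = 1 << (sib >> 6)                    # 2-bit scale field: 1, 2, 4 or 8
    index, base = (sib >> 3) & 7, sib & 7
    if disp >= 0x80:                           # sign-extend the 8-bit displacement
        disp -= 0x100
    return regs[reg], regs[base], regs[index], scale, disp


# Bytes copied from the oops listing: 8d 4c 18 fe
print(decode_lea_disp8(bytes.fromhex("8d4c18fe")))
```

For the bytes in the listing this yields destination ecx, base eax, index 
ebx, scale 1 and displacement -2, which matches the disassembler's 
"0xfffffffe(%eax,%ebx,1)".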

And the 16-bit "shr+movzwl" further strengthens the case that it is 
actually working on a 16-bit entity. The trapping instruction _should_ 
possibly have been a "movzwl (%ecx),%ebx" to begin with.
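
To make the "16-bit entity" point concrete, here is a small Python model of 
the three instructions after the load (mov %ebx,%eax; shr %ax; 
movzwl %ax,%eax). It is a sketch only: the function name is invented, and it 
models just the instructions visible in the listing, not whatever the code 
does with %ebx afterwards.

```python
def eax_after_shr_movzwl(ebx: int) -> int:
    """Model 'mov %ebx,%eax; shr %ax; movzwl %ax,%eax' on 32-bit registers."""
    eax = ebx & 0xFFFFFFFF            # mov %ebx,%eax   (full 32-bit copy)
    ax = (eax & 0xFFFF) >> 1          # shr %ax         (shifts the low 16 bits only)
    eax = (eax & 0xFFFF0000) | ax     # upper half of eax is untouched by shr %ax
    return eax & 0xFFFF               # movzwl %ax,%eax (zero-extend the low 16 bits)


# Two loaded values that differ only in their upper 16 bits give the same eax.
print(hex(eax_after_shr_movzwl(0x12340006)))
print(hex(eax_after_shr_movzwl(0xFFFF0006)))
```

In this model the upper two bytes fetched by the 32-bit load never influence 
the result in %eax, which is consistent with the data really being 16 bits 
wide.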

But it did a 32-bit load, and in this case it looks as if the 16-bit load 
would have been correct! The value of ECX in this example was

	ECX: dc384ffe

ie it was indeed a two-byte aligned thing at the end of the page, and if 
the load had been a 16-bit load (like the data seems to be), it would 
never have oopsed! The page fault seems to be due to DEBUG_PAGEALLOC and 
the next page being unmapped because it's not allocated.
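
The page-boundary arithmetic can be checked with a few lines of Python. 
This is a sketch that assumes 4096-byte pages (the normal page size on 
i686); crosses_page is an illustrative helper, not anything from the kernel.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i686


def crosses_page(addr: int, size: int) -> bool:
    """True if an access of `size` bytes at `addr` touches more than one page."""
    first_page = addr // PAGE_SIZE
    last_page = (addr + size - 1) // PAGE_SIZE
    return first_page != last_page


ecx = 0xDC384FFE                      # value from the oops
print(hex(ecx % PAGE_SIZE))           # offset within the page
print(crosses_page(ecx, 4))           # 32-bit load, as actually generated
print(crosses_page(ecx, 2))           # 16-bit load, as the data seems to call for
```

With ECX at offset 0xffe in its page, the 4-byte load reaches two bytes into 
the following page, while a 2-byte load ends exactly on the last byte of the 
current one.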

I only looked closer at one particular oops (25906, in case anybody 
cares), but at least judging from that particular one I would indeed 
suspect a compiler bug.

Of course, the main reason I say that is that none of the ext3 or VFS 
changes look even _remotely_ relevant to any of this. They really don't 
look like they could possibly matter for "do_split()" unless there is 
something really odd going on.

			Linus