Date:	Fri, 24 Feb 2012 08:52:32 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jiri Kosina <jkosina@...e.cz>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Hugh Dickins <hughd@...gle.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: Linux 3.3-rc4

On Fri, Feb 24, 2012 at 2:39 AM, Jiri Kosina <jkosina@...e.cz> wrote:
>
> I just got the BUG below (with g45196ce being the topmost commit).
>
> It happened when trying to start 'gwenview', but I am not able to
> reproduce it again. Adding a few people to CC just in case someone
> immediately sees what might be the problem.

Hmm. That is the code that increments the file counter, afaik:

   0:     48 81 63 30 ff df ff ff   andq   $0xffffffffffffdfff,0x30(%rbx)
   8:     48 c7 43 20 00 00 00 00   movq   $0x0,0x20(%rbx)
  10:     48 c7 43 18 00 00 00 00   movq   $0x0,0x18(%rbx)
  18:     48 85 d2                  test   %rdx,%rdx
  1b:     74 4f                     je     0x6c
  1d:     48 8b 42 18               mov    0x18(%rdx),%rax
  21:     4c 8b a2 30 01 00 00      mov    0x130(%rdx),%r12
  28:*    48 8b 40 30               mov    0x30(%rax),%rax     <-- trapping instruction
  2c:     f0 48 ff 42 68            lock incq 0x68(%rdx)
  31:     f6 43 31 08               testb  $0x8,0x31(%rbx)
  35:     74 07                     je     0x3e

and that preceding test is testing for a NULL "file", and then the

   mov    0x18(%rdx),%rax

is "dentry = file->f_path.dentry", while the trapping "mov
0x30(%rax),%rax" is the continuation of that: "dentry->d_inode" (and
the "lock incq" is the get_file() - it's incrementing the file
counter). That "mov 0x130(%rdx),%r12" in between is doing "mapping =
file->f_mapping".

So dentry seems to be NULL for you.
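
The decoding above can be sketched in userspace C. This is only an illustration: the mock_* structs are hypothetical and are padded purely to reproduce the offsets visible in the disassembly (0x18, 0x30, 0x68, 0x130) on a 64-bit build. They are not the real kernel definitions, and the NULL check on dentry exists only in the mock, to mark the spot where the kernel traps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock layouts, padded only to match the offsets in the oops (64-bit). */
struct mock_inode { int unused; };

struct mock_dentry {
	unsigned char pad[0x30];
	struct mock_inode *d_inode;		/* 0x30 */
};

struct mock_path {
	void *mnt;				/* 0x10 within mock_file */
	struct mock_dentry *dentry;		/* 0x18 within mock_file */
};

struct mock_file {
	unsigned char pad0[0x10];
	struct mock_path f_path;
	unsigned char pad1[0x68 - 0x20];
	int64_t f_count;			/* 0x68 */
	unsigned char pad2[0x130 - 0x70];
	void *f_mapping;			/* 0x130 */
};

static_assert(offsetof(struct mock_file, f_path.dentry) == 0x18, "dentry");
static_assert(offsetof(struct mock_file, f_count) == 0x68, "f_count");
static_assert(offsetof(struct mock_file, f_mapping) == 0x130, "f_mapping");
static_assert(offsetof(struct mock_dentry, d_inode) == 0x30, "d_inode");

enum mock_result { MOCK_NO_FILE, MOCK_OK, MOCK_WOULD_OOPS };

/* Same order of operations as the disassembled sequence. */
static enum mock_result mock_decode(struct mock_file *file, void **mapping,
				    struct mock_inode **inode)
{
	struct mock_dentry *dentry;

	if (!file)				/* test %rdx,%rdx ; je */
		return MOCK_NO_FILE;
	dentry = file->f_path.dentry;		/* mov 0x18(%rdx),%rax */
	*mapping = file->f_mapping;		/* mov 0x130(%rdx),%r12 */
	if (!dentry)				/* check exists only in this mock: */
		return MOCK_WOULD_OOPS;		/* the kernel traps on the next load */
	*inode = dentry->d_inode;		/* mov 0x30(%rax),%rax */
	file->f_count++;			/* lock incq 0x68(%rdx); not atomic here */
	return MOCK_OK;
}
```

Note that the mapping load completes before the dentry is dereferenced, which is why the register dump can hold a valid mapping in %r12 even though the dentry load faulted.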

> The machine has gone through several suspend-resume cycles before this
> happened, so it might well also be some memory corruption caused by a
> random driver.

I almost think it is, because "file->dentry" should never be NULL in a
mapping afaik. Especially as your "mapping" certainly isn't NULL (it's
in %r12, so you can see it in your register dump).

This isn't some unusual code sequence either, so I don't see it as
some random latent bug that is just very unlikely and hard to trigger
in that code itself.

I'll think about it, but my first reaction is "memory corruption". Do
you think you could try to run with a kernel that has SLAB debugging
and poisoning on? If it's a stale pointer dereference that has cleared
that dentry, that _might_ show it closer to the actual bug (rather
than a long time later when the NULL dereference happens).
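
To illustrate why poisoning can catch this kind of bug early, here is a toy userspace sketch, not the kernel's allocator. It assumes a single-object "slab"; all mock_* names are hypothetical. The only detail taken from the kernel is the 0x6b fill byte, which is the value of POISON_FREE. A freed object is filled with the poison pattern, and the next allocation checks that the pattern is still intact, so a write through a stale pointer is noticed at reallocation time rather than at some later NULL dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_POISON_FREE 0x6b	/* same byte value as the kernel's POISON_FREE */
#define MOCK_OBJ_SIZE 64

static_assert(MOCK_OBJ_SIZE >= sizeof(void *), "object must hold a pointer");

static unsigned char mock_slot[MOCK_OBJ_SIZE];	/* one-object toy "slab" */
static int mock_corruption_seen;

/* Return 1 if every byte of the free object still holds the poison. */
static int mock_poison_intact(void)
{
	size_t i;

	for (i = 0; i < MOCK_OBJ_SIZE; i++)
		if (mock_slot[i] != MOCK_POISON_FREE)
			return 0;
	return 1;
}

/* Start with the object free, hence poisoned. */
static void mock_slab_init(void)
{
	memset(mock_slot, MOCK_POISON_FREE, MOCK_OBJ_SIZE);
	mock_corruption_seen = 0;
}

/* Freeing fills the object with the poison pattern. */
static void mock_free(void *obj)
{
	memset(obj, MOCK_POISON_FREE, MOCK_OBJ_SIZE);
}

/* Allocation verifies the poison first: any change means that somebody
 * wrote to the object while it was free. */
static void *mock_alloc(void)
{
	if (!mock_poison_intact())
		mock_corruption_seen = 1;
	memset(mock_slot, 0, MOCK_OBJ_SIZE);
	return mock_slot;
}
```

In this sketch, clearing a pointer-sized field through a stale pointer after mock_free() makes the next mock_alloc() set mock_corruption_seen, which is the toy equivalent of the debug allocator complaining close to the offending write.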

                    Linus