linux-kernel - Re: BUG: unable to handle kernel NULL pointer dereference

Message-ID: <18076.42052.961808.528867@notabene.brown>
Date:	Tue, 17 Jul 2007 21:13:08 +1000
From:	Neil Brown <neilb@...e.de>
To:	David CHANIAL <david.ml@...o-web.fr>
Cc:	Linux Kernel Mailinglist <linux-kernel@...r.kernel.org>
Subject: Re: BUG: unable to handle kernel NULL pointer dereference - nfs v3

On Monday July 16, david.ml@...o-web.fr wrote:
> Hi,
> 
> I'm not sure is the good place to poste that, and if not - please excuse me.

This is the correct place to post this, thanks.

> 
> I was running nfs server v2 since a year on one server, there is few days, i 
> have update my kernel to 2.6.21.3 with support of nfsv3 server.
> 
> Somes times per days i have somes crash as below, needing i reboot the server 
> to nfs re-become up.
> 
> ************
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 00000004
  ^^^^^^^^

This says that it tried to access memory at address '4'.  There is no
memory there, so it caused the BUG.

>  printing eip:
> c01e7279
> *pde = 09ecc001
> Oops: 0000 [#1]
> SMP
> CPU:    0
> EIP:    0060:[<c01e7279>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.21.3-sdf88-core #9)
                              ^^^^^^^^^^^

What is "-sdf88-core" ?? Are there any extra patches that we should
know about?

> EIP is at encode_fsid+0x67/0x89

This is presumably where the illegal access happened.
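
As a rough illustration of how to read that line: the kernel prints the
location as symbol+offset/size, i.e. the faulting instruction is 0x67 bytes
into encode_fsid, which is 0x89 bytes long.  The sketch below only splits the
string and subtracts the offset from the reported EIP; the helper name
parse_eip_location is hypothetical, and the regular expression assumes a
plain symbol name with no module suffix.

```python
import re

def parse_eip_location(text):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", text.strip())
    if m is None:
        raise ValueError(f"unrecognised location: {text!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)

EIP = 0xc01e7279  # from the "printing eip:" line of the report

symbol, offset, size = parse_eip_location("encode_fsid+0x67/0x89")
start = EIP - offset  # address where the function begins
print(f"{symbol}: offset {offset} of {size} bytes, function starts at {start:#x}")
# prints: encode_fsid: offset 103 of 137 bytes, function starts at 0xc01e7212
```

The computed start address can be compared against System.map or a
disassembly of the same kernel build to find the exact instruction.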

> eax: e5bde8c0   ebx: f7593404   ecx: 00000000   edx: 00000006
> esi: dc569048   edi: f75934ec   ebp: f7593404   esp: f75f1f18

Memory accesses are (almost) always relative to the value in some
register.  Of these registers, the most likely is ecx, with edx a
vague possibility.
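
That reasoning can be sketched mechanically: for each register, work out
which displacement would have been needed to reach the faulting address, and
keep only the small ones.  This is just an illustration using the values from
the report; the function name candidate_bases and the 0x100 cut-off for a
"plausible" displacement are assumptions, not anything the kernel defines.

```python
# Register values copied from the oops report.
REGS = {
    "eax": 0xe5bde8c0, "ebx": 0xf7593404, "ecx": 0x00000000, "edx": 0x00000006,
    "esi": 0xdc569048, "edi": 0xf75934ec, "ebp": 0xf7593404, "esp": 0xf75f1f18,
}
FAULT_ADDR = 0x00000004  # "virtual address 00000004"

def candidate_bases(regs, fault_addr, max_offset=0x100):
    """Return (register, displacement) pairs with register + displacement
    equal to fault_addr (32-bit arithmetic) and a small displacement."""
    found = []
    for name, value in regs.items():
        disp = (fault_addr - value) & 0xffffffff
        if disp >= 0x80000000:      # interpret as signed 32-bit
            disp -= 1 << 32
        if abs(disp) <= max_offset:
            found.append((name, disp))
    return found

print(candidate_bases(REGS, FAULT_ADDR))
# prints: [('ecx', 4), ('edx', -2)]
```

A displacement of 4 from a NULL pointer in ecx is the typical pattern for
reading the second word of a structure, which is why ecx is the better guess.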

> Code: e2 08 09 d1 09 c1 eb 10 8b 83 88 00 00 00 8b 40 30 89 c3 89 c1 c1 fb 1f 
> 89 d8 0f c8 89 06 89 c8 eb 1e 

Unfortunately "ksymoops" does not seem to decode this into anything
quite useful enough.  Normally one of the numbers has <> around it.  Are you
sure you copied the numbers across exactly?
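
For reference, the byte wrapped in <> marks the instruction byte at EIP, so
a quick check for it looks something like the sketch below.  The helper name
find_eip_marker is hypothetical, and the second test string is a made-up
example of what a marked line looks like, not data from this report.

```python
import re

def find_eip_marker(code_line):
    """Return the index of the byte wrapped in <> (the one at EIP),
    or None if no byte is marked."""
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"<[0-9a-fA-F]{2}>", tok):
            return i
    return None

# The Code: line exactly as it was posted (two lines joined).
REPORTED = ("Code: e2 08 09 d1 09 c1 eb 10 8b 83 88 00 00 00 8b 40 30 "
            "89 c3 89 c1 c1 fb 1f 89 d8 0f c8 89 06 89 c8 eb 1e")

print(find_eip_marker(REPORTED))             # prints: None
print(find_eip_marker("Code: 8b 40 <30> 89"))  # made-up example; prints: 2
```

Without the marker there is no way to tell from the dump alone which of the
bytes is the faulting instruction.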

This code decodes as:
   0:   e2 08                     loop   a <_EIP+0xa>
   2:   09 d1                     or     %edx,%ecx
   4:   09 c1                     or     %eax,%ecx
   6:   eb 10                     jmp    18 <_EIP+0x18>
   8:   8b 83 88 00 00 00         mov    0x88(%ebx),%eax
   e:   8b 40 30                  mov    0x30(%eax),%eax
  11:   89 c3                     mov    %eax,%ebx
 ....

 From the 'jmp' onwards, that is what I would expect to see in
 encode_fsid.  The code before that doesn't make a lot of sense, so
 it is hard to pinpoint exactly where the error is.

 In any case, there is no place in encode_fsid where an offset of 4
 from any register is indexed (which would explain address 4 with
 ecx = 0), nor an offset of -2 (which would explain it with edx = 6).
 So either there is something wrong with the decoding and displaying
 of this information, or there is something very wrong with your
 hardware.

 I would suggest:
   1/ If possible, run memtest86 on the machine for a while, to make
      sure there isn't a problem with the memory.
   2/ If the problem happens again, post another report with all the
      "oops" information again.  Maybe the next time it will be slightly
      different and will make more sense in some way.

NeilBrown
