linux-kernel - Re: [PATCH 06/11] docs: Get rid of the "bug-hunting" guide

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 26 Oct 2016 22:23:14 -0200
From:   Mauro Carvalho Chehab <mchehab@...pensource.com>
To:     Jonathan Corbet <corbet@....net>
Cc:     linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jani Nikula <jani.nikula@...el.com>
Subject: Re: [PATCH 06/11] docs: Get rid of the "bug-hunting" guide

Em Wed, 26 Oct 2016 17:19:34 -0600
Jonathan Corbet <corbet@....net> escreveu:

> Larry McVoy's advice on how to manually bisect 1.3.x kernel bugs is of
> historical interest, but that's what the repository is for.  It is not
> useful to users now.

In the specific case of this file, I think that the information there
about how to disassemble a file and how to use gdb and objdump to get the
error line associated with an OOPS very useful.

So, I prefer to keep this one. See the patch I sent you today.

> Signed-off-by: Jonathan Corbet <corbet@....net>
> ---
>  Documentation/admin-guide/bug-hunting.rst | 249 ------------------------------
>  Documentation/admin-guide/index.rst       |   1 -
>  2 files changed, 250 deletions(-)
>  delete mode 100644 Documentation/admin-guide/bug-hunting.rst
> 
> diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
> deleted file mode 100644
> index d35dd9fd1af0..000000000000
> --- a/Documentation/admin-guide/bug-hunting.rst
> +++ /dev/null
> @@ -1,249 +0,0 @@
> -Bug hunting
> -+++++++++++
> -
> -Last updated: 20 December 2005
> -
> -Introduction
> -============
> -
> -Always try the latest kernel from kernel.org and build from source. If you are
> -not confident in doing that please report the bug to your distribution vendor
> -instead of to a kernel developer.
> -
> -Finding bugs is not always easy. Have a go though. If you can't find it don't
> -give up. Report as much as you have found to the relevant maintainer. See
> -MAINTAINERS for who that is for the subsystem you have worked on.
> -
> -Before you submit a bug report read
> -:ref:`Documentation/admin-guide/reporting-bugs.rst <reportingbugs>`.
> -
> -Devices not appearing
> -=====================
> -
> -Often this is caused by udev. Check that first before blaming it on the
> -kernel.
> -
> -Finding patch that caused a bug
> -===============================
> -
> -
> -
> -Finding using ``git-bisect``
> -----------------------------
> -
> -Using the provided tools with ``git`` makes finding bugs easy provided the bug
> -is reproducible.
> -
> -Steps to do it:
> -
> -- start using git for the kernel source
> -- read the man page for ``git-bisect``
> -- have fun
> -
> -Finding it the old way
> -----------------------
> -
> -[Sat Mar  2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@....com (Larry McVoy)]
> -
> -This is how to track down a bug if you know nothing about kernel hacking.
> -It's a brute force approach but it works pretty well.
> -
> -You need:
> -
> -        - A reproducible bug - it has to happen predictably (sorry)
> -        - All the kernel tar files from a revision that worked to the
> -          revision that doesn't
> -
> -You will then do:
> -
> -        - Rebuild a revision that you believe works, install, and verify that.
> -        - Do a binary search over the kernels to figure out which one
> -          introduced the bug.  I.e., suppose 1.3.28 didn't have the bug, but
> -          you know that 1.3.69 does.  Pick a kernel in the middle and build
> -          that, like 1.3.50.  Build & test; if it works, pick the mid point
> -          between .50 and .69, else the mid point between .28 and .50.
> -        - You'll narrow it down to the kernel that introduced the bug.  You
> -          can probably do better than this but it gets tricky.
> -
> -        - Narrow it down to a subdirectory
> -
> -          - Copy kernel that works into "test".  Let's say that 3.62 works,
> -            but 3.63 doesn't.  So you diff -r those two kernels and come
> -            up with a list of directories that changed.  For each of those
> -            directories:
> -
> -                Copy the non-working directory next to the working directory
> -                as "dir.63".
> -                One directory at time, try moving the working directory to
> -                "dir.62" and mv dir.63 dir"time, try::
> -
> -                        mv dir dir.62
> -                        mv dir.63 dir
> -                        find dir -name '*.[oa]' -print | xargs rm -f
> -
> -                And then rebuild and retest.  Assuming that all related
> -                changes were contained in the sub directory, this should
> -                isolate the change to a directory.
> -
> -                Problems: changes in header files may have occurred; I've
> -                found in my case that they were self explanatory - you may
> -                or may not want to give up when that happens.
> -
> -        - Narrow it down to a file
> -
> -          - You can apply the same technique to each file in the directory,
> -            hoping that the changes in that file are self contained.
> -
> -        - Narrow it down to a routine
> -
> -          - You can take the old file and the new file and manually create
> -            a merged file that has::
> -
> -                #ifdef VER62
> -                routine()
> -                {
> -                        ...
> -                }
> -                #else
> -                routine()
> -                {
> -                        ...
> -                }
> -                #endif
> -
> -            And then walk through that file, one routine at a time and
> -            prefix it with::
> -
> -                #define VER62
> -                /* both routines here */
> -                #undef VER62
> -
> -            Then recompile, retest, move the ifdefs until you find the one
> -            that makes the difference.
> -
> -Finally, you take all the info that you have, kernel revisions, bug
> -description, the extent to which you have narrowed it down, and pass
> -that off to whomever you believe is the maintainer of that section.
> -A post to linux.dev.kernel isn't such a bad idea if you've done some
> -work to narrow it down.
> -
> -If you get it down to a routine, you'll probably get a fix in 24 hours.
> -
> -My apologies to Linus and the other kernel hackers for describing this
> -brute force approach, it's hardly what a kernel hacker would do.  However,
> -it does work and it lets non-hackers help fix bugs.  And it is cool
> -because Linux snapshots will let you do this - something that you can't
> -do with vendor supplied releases.
> -
> -Fixing the bug
> -==============
> -
> -Nobody is going to tell you how to fix bugs. Seriously. You need to work it
> -out. But below are some hints on how to use the tools.
> -
> -To debug a kernel, use objdump and look for the hex offset from the crash
> -output to find the valid line of code/assembler. Without debug symbols, you
> -will see the assembler code for the routine shown, but if your kernel has
> -debug symbols the C code will also be available. (Debug symbols can be enabled
> -in the kernel hacking menu of the menu configuration.) For example::
> -
> -    objdump -r -S -l --disassemble net/dccp/ipv4.o
> -
> -.. note::
> -
> -   You need to be at the top level of the kernel tree for this to pick up
> -   your C files.
> -
> -If you don't have access to the code you can also debug on some crash dumps
> -e.g. crash dump output as shown by Dave Miller::
> -
> -     EIP is at ip_queue_xmit+0x14/0x4c0
> -      ...
> -     Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
> -     00 00 55 57  56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
> -     <8b> 83 3c 01 00 00 89 44  24 14 8b 45 28 85 c0 89 44 24 18 0f 85
> -
> -     Put the bytes into a "foo.s" file like this:
> -
> -            .text
> -            .globl foo
> -     foo:
> -            .byte  .... /* bytes from Code: part of OOPS dump */
> -
> -     Compile it with "gcc -c -o foo.o foo.s" then look at the output of
> -     "objdump --disassemble foo.o".
> -
> -     Output:
> -
> -     ip_queue_xmit:
> -         push       %ebp
> -         push       %edi
> -         push       %esi
> -         push       %ebx
> -         sub        $0xbc, %esp
> -         mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
> -         mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
> -         mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt
> -
> -In addition, you can use GDB to figure out the exact file and line
> -number of the OOPS from the ``vmlinux`` file. If you have
> -``CONFIG_DEBUG_INFO`` enabled, you can simply copy the EIP value from the
> -OOPS::
> -
> - EIP:    0060:[<c021e50e>]    Not tainted VLI
> -
> -And use GDB to translate that to human-readable form::
> -
> -  gdb vmlinux
> -  (gdb) l *0xc021e50e
> -
> -If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function
> -offset from the OOPS::
> -
> - EIP is at vt_ioctl+0xda8/0x1482
> -
> -And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::
> -
> -  make vmlinux
> -  gdb vmlinux
> -  (gdb) p vt_ioctl
> -  (gdb) l *(0x<address of vt_ioctl> + 0xda8)
> -
> -or, as one command::
> -
> -  (gdb) l *(vt_ioctl + 0xda8)
> -
> -If you have a call trace, such as::
> -
> -     Call Trace:
> -      [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
> -      [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
> -      [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
> -      ...
> -
> -this shows the problem in the :jbd: module. You can load that module in gdb
> -and list the relevant code::
> -
> -  gdb fs/jbd/jbd.ko
> -  (gdb) p log_wait_commit
> -  (gdb) l *(0x<address> + 0xa3)
> -
> -or::
> -
> -  (gdb) l *(log_wait_commit + 0xa3)
> -
> -
> -Another very useful option of the Kernel Hacking section in menuconfig is
> -Debug memory allocations. This will help you see whether data has been
> -initialised and not set before use etc. To see the values that get assigned
> -with this look at ``mm/slab.c`` and search for ``POISON_INUSE``. When using
> -this an Oops will often show the poisoned data instead of zero which is the
> -default.
> -
> -Once you have worked out a fix please submit it upstream. After all open
> -source is about sharing what you do and don't you want to be recognised for
> -your genius?
> -
> -Please do read
> -ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
> -to help your code get accepted.
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 2872c0c70ea4..2d0a302e8773 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -25,7 +25,6 @@ problems and bugs in particular.
>     
>     reporting-bugs
>     security-bugs
> -   bug-hunting
>     oops-tracing
>     ramoops
>     dynamic-debug-howto



Thanks,
Mauro