linux-kernel - Re: [patch 2/2] Documentation/process: Add tip tree handbook

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181108071928.GB20032@gmail.com>
Date:   Thu, 8 Nov 2018 08:19:28 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        John Stultz <john.stultz@...aro.org>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Andy Lutomirski <luto@...nel.org>,
        Marc Zyngier <marc.zyngier@....com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Will Deacon <will.deacon@....com>,
        Mark Brown <broonie@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>
Subject: Re: [patch 2/2] Documentation/process: Add tip tree handbook


* Thomas Gleixner <tglx@...utronix.de> wrote:

> +Backtraces in changelogs
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Backtraces can be useful to document the call chain which led to a
> +problem. Though not all back traces are really valuable because the call
> +chain is unique and obvious, e.g. in early boot code. Just copying the full
> +dmesg output is adding a lot of distracting information like timestamps,
> +module lists, register and stack dumps etc.
> +
> +Reducing the backtrace to the real relevant data helps to concentrate on
> +the issue and not being distracted by destilling the relevant information
> +out of the dump. Here is an example of a well trimmed backtrace::
> +
> +  unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000
> +  000000000064) at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20)
> +  Call Trace:
> +  mba_wrmsr+0x41/0x80
> +  update_domains+0x125/0x130
> +  rdtgroup_mkdir+0x270/0x500

Yeah, so I frequently simplify such backtraces even more, i.e.:

> +  unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000 000000000064) at rIP: 0xffffffffae059994 (native_write_msr())
> +
> +  Call Trace:
> +    mba_wrmsr()
> +    update_domains()
> +    rdtgroup_mkdir()

Note how the actual MSR contents and the attempted operation's parameters 
are important, the actual hexadecimal offsets of the function call 
backtrace are not. They are useful when trying to do fuzzy version 
matching and in the occasional case when there's a question about which 
exact call chain it is - but those are 0.01% cases really.

See for example this recent commit:

 commit e4a02ed2aaf447fa849e3254bfdb3b9b01e1e520 (origin/locking-urgent-for-linus, locking-urgent-for-linus)
 Author: Guenter Roeck <linux@...ck-us.net>
 Date:   Tue Oct 2 14:48:49 2018 -0700

    locking/ww_mutex: Fix runtime warning in the WW mutex selftest
    
    If CONFIG_WW_MUTEX_SELFTEST=y is enabled, booting an image
    in an arm64 virtual machine results in the following
    traceback if 8 CPUs are enabled:
    
      DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
      WARNING: CPU: 2 PID: 537 at kernel/locking/mutex.c:1033 __mutex_unlock_slowpath+0x1a8/0x2e0
      ...
      Call trace:
       __mutex_unlock_slowpath()
       ww_mutex_unlock()
       test_cycle_work()
       process_one_work()
       worker_thread()
       kthread()
       ret_from_fork()
    
    If requesting b_mutex fails with -EDEADLK, the error variable
    is reassigned to the return value from calling ww_mutex_lock
    on a_mutex again. If this call fails, a_mutex is not locked.
    It is, however, unconditionally unlocked subsequently, causing
    the reported warning. Fix the problem by using two error variables.
    
    With this change, the selftest still fails as follows:
    
      cyclic deadlock not resolved, ret[7/8] = -35
    
    However, the traceback is gone.

The C-style writing of the backtrace is more readable than listing the 
offsets.

Thanks,

	Ingo