Date:	Fri, 06 Jun 2008 06:15:13 -0700
From:	Mike Travis <travis@....com>
To:	Jeremy Fitzhardinge <jeremy@...p.org>
CC:	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Lameter <clameter@....com>,
	David Miller <davem@...emloft.net>,
	Eric Dumazet <dada1@...mosbay.com>,
	linux-kernel@...r.kernel.org,
	the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area

Jeremy Fitzhardinge wrote:
> Mike Travis wrote:
>> Ingo Molnar wrote:
>>  
>>> * Mike Travis <travis@....com> wrote:
>>>
>>>    
>>>>   * Declare the pda as a per cpu variable.
>>>>
>>>>   * Make the x86_64 per cpu area start at zero.
>>>>
>>>>   * Since the pda is now the first element of the per_cpu area,
>>>>     cpu_pda() is no longer needed and per_cpu() can be used instead.
>>>>     This also makes the _cpu_pda[] table obsolete.
>>>>
>>>>   * Since %gs is pointing to the pda, it will then also point to the
>>>>     per cpu variables and can be accessed thusly:
>>>>
>>>>     %gs:[&per_cpu_xxxx - __per_cpu_start]
>>>>
>>>> Based on linux-2.6.tip
>>>>       
>>> -tip testing found an instantaneous reboot crash on 64-bit x86, with
>>> this config:
>>>
>>>   http://redhat.com/~mingo/misc/config-Thu_Jun__5_11_43_51_CEST_2008.bad
>>>
>>> there is no boot log as the instantaneous reboot happens before
>>> anything is printed to the (early-) serial console. I have bisected
>>> it down to:
>>>
>>> | 7670dc09e89a2b151a1cf49eccebc07c41c2ce9f is first bad commit
>>> | commit 7670dc09e89a2b151a1cf49eccebc07c41c2ce9f
>>> | Author: Mike Travis <travis@....com>
>>> | Date:   Tue Jun 3 17:30:21 2008 -0700
>>> |
>>> |     x86_64: Fold pda into per cpu area
>>>
>>> the big problem is not just this crash, but that the patch is _way_
>>> too big:
>>>
>>>  arch/x86/Kconfig                 |    3 +
>>>  arch/x86/kernel/head64.c         |   34 ++++++--------
>>>  arch/x86/kernel/irq_64.c         |   36 ++++++++-------
>>>  arch/x86/kernel/setup.c          |   90 ++++++++++++---------------------------
>>>  arch/x86/kernel/setup64.c        |    5 --
>>>  arch/x86/kernel/smpboot.c        |   51 ----------------------
>>>  arch/x86/kernel/traps_64.c       |   11 +++-
>>>  arch/x86/kernel/vmlinux_64.lds.S |    1
>>>  include/asm-x86/percpu.h         |   48 ++++++--------------
>>>  9 files changed, 89 insertions(+), 190 deletions(-)
>>>
>>> considering the danger involved, this is just way too large, and
>>> there's no reasonable debugging i can do in the bisection to narrow
>>> it down any further.
>>>
>>> Please resubmit with the bug fixed and with a proper splitup, the
>>> more patches you manage to create, the better. For a dangerous code
>>> area like this, with a track record of frequent breakages in the
>>> past, i would not mind a "one line of code changed per patch" splitup
>>> either. (Feel free to send a git tree link for us to try as well.)
>>>
>>>     Ingo
>>>     
>>
>> Thanks for the feedback Ingo.  I'll test the above config and look at
>> splitting up the patch.  The difficulty is making each patch
>> independently compilable and testable.
> 
> FWIW, I'm getting past the "crashes very, very early" stage with this
> series applied when booting under Xen.  Then it crashes pretty early,
> but that's not your fault...
> 
>    J

Hi Jeremy,

Yes, we have a simulator for Nehalem that also breezes past the boot-up
problem (it actually makes it to the kernel login prompt).  Weirdly, the
problem doesn't exist in an earlier code base, so my changes are tickling
something else newly introduced.  I'm attempting to see if I can use
GRUB 2 with the GDB stubs to track it down (which is time-consuming in
itself to set up).

It is definitely related to basing percpu variable offsets from %gs and
(I think) interrupts.
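
For anyone following along, here is a rough user-space sketch of what
"basing percpu variable offsets from %gs" means.  It is not the kernel
code: gs_base[], struct pda_sim, struct percpu_area_sim and the sim_*
helpers are all hypothetical names made up for the illustration, and a
plain pointer array stands in for the per-CPU %gs base.  The idea is
that each CPU's area starts at offset zero with the pda first, so a
variable is reached as "base of this CPU's area + fixed offset", which
mirrors %gs:[&per_cpu_xxxx - __per_cpu_start].

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4  /* hypothetical CPU count for this simulation */

/* Hypothetical stand-in for the pda; only two fields are modelled. */
struct pda_sim {
    int  cpunumber;
    long irqcount;
};

/* Hypothetical layout of one zero-based per-cpu area: the pda comes first. */
struct percpu_area_sim {
    struct pda_sim pda;      /* lands at offset 0 of the area */
    long           counter;  /* an ordinary per-cpu variable after it */
};

/* Fixed offsets, playing the role of (&per_cpu_xxxx - __per_cpu_start). */
#define PDA_OFFSET     offsetof(struct percpu_area_sim, pda)
#define COUNTER_OFFSET offsetof(struct percpu_area_sim, counter)

/* Stands in for the %gs base that each CPU would have loaded. */
static unsigned char *gs_base[NR_CPUS];

/* Allocate one zeroed area per simulated CPU and fill in its pda. */
static int sim_setup_percpu_areas(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        gs_base[cpu] = calloc(1, sizeof(struct percpu_area_sim));
        if (!gs_base[cpu])
            return -1;
        /* The pda sits at offset 0, so the base pointer is also the pda. */
        ((struct pda_sim *)gs_base[cpu])->cpunumber = cpu;
    }
    return 0;
}

/* Base of the given CPU's area plus a fixed offset: the %gs:[offset] idea. */
static void *sim_percpu_ptr(int cpu, size_t offset)
{
    return gs_base[cpu] + offset;
}
```

With this layout, sim_percpu_ptr(cpu, PDA_OFFSET) and gs_base[cpu] are
the same address, which is the property that lets one base register
serve both the pda and the other per-cpu variables.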

Thanks,
Mike
