Date:   Fri, 23 Dec 2016 07:42:40 +1100
From:   Dave Chinner <david@...morbit.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Peter Anvin <hpa@...or.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [4.10, panic, regression] iscsi: null pointer deref at
 iscsi_tcp_segment_done+0x20d/0x2e0

On Thu, Dec 22, 2016 at 09:24:12AM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2016 at 10:28 PM, Dave Chinner <david@...morbit.com> wrote:
> >
> > This sort of thing is normally indicative of a memory reclaim or
> > lock contention problem. Profile showed unusual spinlock contention,
> > but then I realised there was only one kswapd thread running.
> > Yup, sure enough, it's caused by a major change in memory reclaim
> > behaviour:
> >
> > [    0.000000] Zone ranges:
> > [    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> > [    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> > [    0.000000]   Normal   [mem 0x0000000100000000-0x000000083fffffff]
> > [    0.000000] Movable zone start for each node
> > [    0.000000] Early memory node ranges
> > [    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
> > [    0.000000]   node   0: [mem 0x0000000000100000-0x00000000bffdefff]
> > [    0.000000]   node   0: [mem 0x0000000100000000-0x00000003bfffffff]
> > [    0.000000]   node   0: [mem 0x00000005c0000000-0x00000005ffffffff]
> > [    0.000000]   node   0: [mem 0x0000000800000000-0x000000083fffffff]
> > [    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000083fffffff]
> >
> > the numa=fake=4 CLI option is broken.
> 
> Ok, I think that is independent of anything else. Removing block
> people and adding the x86 people.
> 
> I'm not seeing anything at all that would change the fake numa stuff,
> but maybe the cpu hotplug changes?
> 
> Thomas/Ingo/Peter - Dave is going away for several months, so you
> won't get feedback from him, but can you look at this? Or maybe point
> me towards the right people - I'm seeing no possible relevant changes
> at all for x86 numa since 4.9, so it must be some indirect breakage.
> 
> Dave is using fake-numa to do performance testing in a VM, and it's a
> big deal for the node optimizations for writeback etc. Do you have any
> ideas?
> 
> Dave, if you're still around, can you send out the kernel config file
> you used...

Looking at this fresh this morning (i.e. not pissed off by having
everything I tried to do fail in different ways all afternoon) I
found this:

$ grep NUMA .config
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
# CONFIG_NUMA is not set
$

The .config I was using for 4.9 got 'make oldconfig' upgraded, and
looking at it there's a bunch of stuff that has been turned off that
I know was set:

# CONFIG_EXPERT is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
# CONFIG_COMPACTION is not set

and stuff I never use, and so never set, had been turned on: kernel
crash dump, a bunch of stuff for AMD CPUs, suspend/resume and power
management debug, every partition type and filesystem under the sun,
heaps of network devices, etc.

So it looks like the problem has occurred during oldconfig, meaning
I have no idea exactly WTF I was testing. Rebuilding now with a
saner config, see what happens.
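
For anyone wanting to catch this sort of silent config drift before
booting a test kernel, something like the following could compare the
old and new .config files. It is only a rough sketch: the function
names parse_kconfig and diff_kconfig are made up for illustration, and
it assumes the usual "CONFIG_FOO=value" / "# CONFIG_FOO is not set"
line formats. The kernel tree also ships scripts/diffconfig, which is
probably the better tool if it is to hand.

```python
import re

def parse_kconfig(text):
    """Map option name -> value; '# CONFIG_FOO is not set' maps to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"# (CONFIG_\w+) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
            continue
        m = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if m:
            opts[m.group(1)] = m.group(2)
    return opts

def diff_kconfig(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    An option missing from one file is reported as None on that side
    (an assumption of this sketch, not how oldconfig itself reasons).
    """
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes

if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python kdiff.py config-4.9 .config
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
            for name, (before, after) in diff_kconfig(f_old.read(), f_new.read()).items():
                print(f"{name}: {before} -> {after}")
```

Running it against the known-good config and the freshly upgraded one
would list options such as CONFIG_NUMA that went from set to not set.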

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
