Message-ID: <20090917181831.GA714@csn.ul.ie>
Date:	Thu, 17 Sep 2009 19:18:32 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Pekka Enberg <penberg@...helsinki.fi>,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	cl@...ux-foundation.org, heiko.carstens@...ibm.com, mingo@...e.hu,
	sachinp@...ibm.com
Subject: Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390

On Thu, Sep 17, 2009 at 01:41:16PM +0200, Nick Piggin wrote:
> On Thu, Sep 17, 2009 at 12:18:28PM +0100, Mel Gorman wrote:
> > On Thu, Sep 17, 2009 at 02:13:39PM +0300, Pekka Enberg wrote:
> > > On Thu, 2009-09-17 at 11:08 +0100, Mel Gorman wrote:
> > > > > > The danger is if SLQB is being silently disabled, it'll never be noticed
> > > > > > or debugged :/
> > > > > 
> > > > > Maybe, but that's not an excuse to push something that's known to break. 
> > > 
> > > On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> > > > Wow, this is from back in May! Lame.
> > > 
> > > Heh, my (lame) excuse is lack of relevant hardware.... ;-)
> > > 
> > 
> > I'm not blaming you. It's just ... unfortunate :/
> 
> Ahh... it's pretty lame of me. Sachin has been a willing tester :(
> I have spent quite a few hours looking at it but I never found
> many good leads. Much appreciated if you can make more progress on
> it.

Nothing much so far. I've reproduced the problem based on 2.6.31 and slqb-core
from Pekka's tree, but not much beyond that. I don't know SLQB at all, so the
investigation is fuzzy. It appears to initialise SLQB OK but crashes later when
setting up SCSI. I'm not 100% sure what the triggering event is, but it might be
that userspace starts up and other CPUs get involved, possibly corrupting lists.

This machine has two CPUs (0, 1) and two nodes with actual memory (2, 3).
After applying a patch to kmem_cache_create(), I see the following in the console:

MEL::Creating cache pgd_cache CPU 0 Node 0
MEL::Creating cache pmd_cache CPU 0 Node 0
MEL::Creating cache pid_namespace CPU 0 Node 0
MEL::Creating cache shmem_inode_cache CPU 0 Node 0
MEL::Creating cache scsi_data_buffer CPU 1 Node 0

It crashes at this point, while creating scsi_data_buffer but before its
struct kmem_cache has been allocated from kmem_cache_cache. Note that it's
kmem_cache_cache we are failing to allocate from, not scsi_data_buffer.

I have no theories yet but will stick with it. Any suggestions on where
to investigate are welcome. Will pick this up again tomorrow.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
