Open Source and information security mailing list archives
Message-ID: <20100820071051.GA5209@swordfish.minsk.epam.com>
Date:	Fri, 20 Aug 2010 10:10:51 +0300
From:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
	linux-kernel@...r.kernel.org
Subject: Re: BUG: unable to handle kernel paging request at ffffffffffffffff

On (08/19/10 17:32), Andrew Morton wrote:
> > Hello,
> > 
> > On (08/18/10 12:06), Andrew Morton wrote:
> > > > Hello,
> > > > 
> > > > yet another trace:
> > > > 
> > > > [ 5845.374558] CPU 1 is now offline
> > > > [ 5845.376169] INFO: trying to register non-static key.
> > > > [ 5845.376251] the code is fine but needs lockdep annotation.
> > > > [ 5845.376327] turning off the locking correctness validator.
> > > > [ 5845.376405] Pid: 6754, comm: bash Not tainted 2.6.36-rc0-git12-07921-g60bf26a-dirty #122
> > > > [ 5845.376521] Call Trace:
> > > > [ 5845.376570]  [<ffffffff81063e89>] __lock_acquire+0x2d1/0x17fd
> > > > [ 5845.376657]  [<ffffffff81132b2a>] ? sysfs_deactivate+0x3e/0xec
> > > > [ 5845.376747]  [<ffffffff81062ddd>] ? mark_held_locks+0x50/0x72
> > > > [ 5845.376834]  [<ffffffff81065893>] lock_acquire+0x97/0xb6
> > > > [ 5845.376917]  [<ffffffff8137145b>] ? percpu_counter_hotcpu_callback+0x3e/0x93
> > > > [ 5845.377021]  [<ffffffff81374321>] ? mutex_lock_nested+0x2f3/0x31b
> > > > [ 5845.377113]  [<ffffffff81371446>] ? percpu_counter_hotcpu_callback+0x29/0x93
> > > > [ 5845.377218]  [<ffffffff8137568d>] _raw_spin_lock_irqsave+0x4e/0x60
> > > > [ 5845.377312]  [<ffffffff8137145b>] ? percpu_counter_hotcpu_callback+0x3e/0x93
> > > > [ 5845.377409]  [<ffffffff8137145b>] percpu_counter_hotcpu_callback+0x3e/0x93
> > > > [ 5845.377475]  [<ffffffff81057344>] notifier_call_chain+0x32/0x5e
> > > > [ 5845.377529]  [<ffffffff8105738f>] __raw_notifier_call_chain+0x9/0xb
> > > > [ 5845.377587]  [<ffffffff8103c6e3>] __cpu_notify+0x1b/0x2d
> > > > [ 5845.377638]  [<ffffffff8103c703>] cpu_notify+0xe/0x10
> > > > [ 5845.377684]  [<ffffffff8103c70e>] cpu_notify_nofail+0x9/0x11
> > > > [ 5845.377738]  [<ffffffff81362d82>] _cpu_down+0x151/0x206
> > > > [ 5845.377786]  [<ffffffff81362ea8>] cpu_down+0x28/0x35
> > > > [ 5845.377833]  [<ffffffff8136430d>] store_online+0x27/0x6e
> > > > [ 5845.377884]  [<ffffffff812923ab>] sysdev_store+0x1b/0x1d
> > > > [ 5845.377933]  [<ffffffff811321b2>] sysfs_write_file+0x103/0x13f
> > > > [ 5845.377990]  [<ffffffff810daf92>] vfs_write+0xb0/0x14f
> > > > [ 5845.378038]  [<ffffffff810db22e>] sys_write+0x45/0x6c
> > > > [ 5845.378088]  [<ffffffff81002002>] system_call_fastpath+0x16/0x1b
> > > > [ 5845.378166] BUG: unable to handle kernel paging request at ffffffffffffffff
> > > > [ 5845.378236] IP: [<ffffffff81371487>] percpu_counter_hotcpu_callback+0x6a/0x93
> > > 
> > > It appears that one of the counters on the global list has been
> > > trashed: lockdep doesn't recognise its spinlock and its internal
> > > pointers are all-ones.
> > > 
> > > We need to identify that counter and then go take a look at whichever
> > > subsystem owns it.
> > > 
> > > A crude approach is:
> > > 
> > > --- a/lib/percpu_counter.c~a
> > > +++ a/lib/percpu_counter.c
> > > @@ -69,6 +69,8 @@ EXPORT_SYMBOL(__percpu_counter_sum);
> > >  int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
> > >  			  struct lock_class_key *key)
> > >  {
> > > +	printk("__percpu_counter_init(%p)\n", fbc);
> > > +	dump_stack();
> > >  	spin_lock_init(&fbc->lock);
> > >  	lockdep_set_class(&fbc->lock, key);
> > >  	fbc->count = amount;
> > > @@ -126,6 +128,7 @@ static int __cpuinit percpu_counter_hotc
> > >  		s32 *pcount;
> > >  		unsigned long flags;
> > >  
> > > +		printk("percpu_counter_hotcpu_callback(%p)\n", fbc);
> > >  		spin_lock_irqsave(&fbc->lock, flags);
> > >  		pcount = per_cpu_ptr(fbc->counters, cpu);
> > >  		fbc->count += *pcount;
> > > _
> > > 
> > > Could you please apply that patch and then make it crash?  We can use
> > > the address from the percpu_counter_hotcpu_callback() printk to look up
> > > the stack trace from __percpu_counter_init() which will lead us to the
> > > code which owns that counter.
> > > 
> > 
> > Sure, I'll try.
> 
> I suspect this was fixed by
> 
> commit 602586a83b719df0fbd94196a1359ed35aeb2df3
> Author:     Hugh Dickins <hughd@...gle.com>
> AuthorDate: Tue Aug 17 15:23:56 2010 -0700
> Commit:     Linus Torvalds <torvalds@...ux-foundation.org>
> CommitDate: Tue Aug 17 18:33:11 2010 -0700
> 
>     shmem: put_super must percpu_counter_destroy
> 

I haven't had much luck reproducing the crash so far.


	Sergey

