linux-kernel - Re: 2.6.18-rt1

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <20061013212450.GC7477@in.ibm.com>
Date:	Sat, 14 Oct 2006 02:54:50 +0530
From:	Dipankar Sarma <dipankar@...ibm.com>
To:	Karsten Wiese <annabellesgarden@...oo.de>
Cc:	Lee Revell <rlrevell@...-job.com>, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	John Stultz <johnstul@...ibm.com>,
	"Paul E. McKenney" <paulmck@...ibm.com>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: 2.6.18-rt1

On Fri, Oct 13, 2006 at 11:18:01PM +0200, Karsten Wiese wrote:
> On Saturday, 30 September 2006 20:18, Dipankar Sarma wrote:
> > On Sat, Sep 30, 2006 at 02:06:04PM -0400, Lee Revell wrote:
> > > On Wed, 2006-09-20 at 16:19 +0200, Ingo Molnar wrote:
> > > > I'm pleased to announce the 2.6.18-rt1 tree, which can be downloaded 
> > > > from the usual place:
> > > > 
> > > >    http://redhat.com/~mingo/realtime-preempt/
> > > 
> > > I got this Oops with -rt3, looks RCU related.  Apologies in advance if
> > > it's already known.
> > > 
> > > Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
> > >  [<ffffffff802aafa7>] __rcu_read_unlock+0x2e/0x82
> > > PGD 46a3067 PUD 4e27067 PMD 0 
> > > Oops: 0002 [1] PREEMPT SMP 
> > > CPU 1 
> > 
> > I see a very similar crash while running rcutorture on 2.6.18-mm1 and
> > my rcu patchset that has rcupreempt stuff from -rt. I don't see this
> > 
> 
> Bug just happened here on a tainted UP x86_64 running rt4.
> IIRC this is the second time in 2 weeks or so.
> Machine seems to be fine still after the oops...
> 
> <Oops>
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
>  [<ffffffff802a1b21>] __rcu_read_unlock+0x2e/0x80
> PGD 3b616067 PUD 1718b067 PMD 0
> Oops: 0002 [1] PREEMPT
> CPU 0
> Modules linked in: autofs4 sunrpc video button ac lp parport_pc parport nvram snd_via82xx gameport snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq nvidia snd_pcm_oss snd_mixer_oss snd_pcm ehci_hcd uhci_hcd snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi pcspkr snd_seq_device snd i2c_viapro i2c_core r8169 soundcore ext3 jbd
> Pid: 7102, comm: sh Tainted: P      2.6.18-rt4 #4
> RIP: 0010:[<ffffffff802a1b21>]  [<ffffffff802a1b21>] __rcu_read_unlock+0x2e/0x80


Sorry, I should have published my investigations long ago. I tracked
this down (at least the crash on my machine) to NMI interference
with rcu_read_lock()/rcu_read_unlock(). We use those APIs
from NMI context as well
(default_do_nmi()->notify_die()->atomic_notifier_call_chain()).

Can you try booting with nmi_watchdog=0 on the kernel command line?

Paul has an NMI-safe patch for rcupreempt which I am adopting
and testing at the moment. If this works well, I will publish
a new patchset.

Thanks
Dipankar