Message-ID: <20110925130622.GA9205@somewhere.redhat.com>
Date:	Sun, 25 Sep 2011 15:06:25 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"Kirill A. Shutemov" <kirill@...temov.name>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	linux-kernel@...r.kernel.org, Dipankar Sarma <dipankar@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Lai Jiangshan <laijs@...fujitsu.com>
Subject: Re: linux-next-20110923: warning kernel/rcutree.c:1833

On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > [   29.974288] ------------[ cut here ]------------
> > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > [   29.974521] Call Trace:
> > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > 
> > Do the following help?
> > 
> > 	https://lkml.org/lkml/2011/9/17/47
> > 	https://lkml.org/lkml/2011/9/17/45
> > 	https://lkml.org/lkml/2011/9/17/43
> 
> Yes. Thanks.

I don't believe those patches really fix the issue. The warning is just
not easy to trigger: you probably haven't hit it again simply by chance
since applying them.

This happens when the idle notifier call chain is called in idle
and is interrupted in the middle. So we have called rcu_read_lock()
but haven't yet released it with rcu_read_unlock(), and at the end
of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu(),
which is illegal while in an RCU read-side critical section.
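
To make the sequence concrete, here is a minimal user-space sketch of it.
This is only a toy model and not kernel code: every toy_* name is made up
for illustration, and the nesting counter merely stands in for whatever
check fires at kernel/rcutree.c:1833.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy user-space model; none of these are the kernel's real functions. */
int toy_rcu_nesting;   /* depth of read-side critical sections */
int toy_warnings;      /* how many times the toy check has fired */

void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
}

void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
}

/* Stands in for rcu_needs_cpu(): complains when called inside a reader. */
void toy_rcu_needs_cpu(void)
{
	if (toy_rcu_nesting > 0) {
		toy_warnings++;
		fprintf(stderr, "WARNING: called in read-side critical section\n");
	}
}

/* Stands in for irq_exit() -> tick_nohz_stop_sched_tick() -> rcu_needs_cpu(). */
void toy_irq_exit(void)
{
	toy_rcu_needs_cpu();
}

/*
 * Stands in for the idle notifier call chain. irq_in_middle simulates the
 * timer interrupt landing between the lock and the unlock.
 * Returns the number of warnings raised by this call.
 */
int toy_idle_notifier_chain(bool irq_in_middle)
{
	int before = toy_warnings;

	toy_rcu_read_lock();
	if (irq_in_middle)
		toy_irq_exit();
	toy_rcu_read_unlock();
	if (!irq_in_middle)
		toy_irq_exit();	/* interrupt arrives after the reader is done */

	return toy_warnings - before;
}
```

In this model toy_idle_notifier_chain(true) raises one warning and
toy_idle_notifier_chain(false) raises none, which mirrors why the warning
depends on interrupt timing and is hard to reproduce.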

No idea how to solve that. Any use of RCU after the tick gets stopped
is affected here. If it is really required that rcu_needs_cpu() can't
be called in an RCU read-side critical section, then it's not going
to be easy to fix.

But I don't really understand that requirement. rcu_needs_cpu() simply
checks whether we have callbacks to handle. So I don't understand how
the read side is affected. It's rather the write side.
The rule I can imagine instead is: don't call __call_rcu() once the tick is
stopped.
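
A rough sketch of what that alternative rule could look like, again as a
user-space toy model and not kernel code. All toy_* names are hypothetical,
and the assumption that the tick may only stop when no callbacks are pending
is my reading of rcu_needs_cpu(), not something verified against the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; none of these are the kernel's real functions. */
bool toy_tick_stopped;
int toy_pending_callbacks;
int toy_rule_violations;

/* Stands in for rcu_needs_cpu(): only asks whether callbacks are pending. */
bool toy_needs_cpu(void)
{
	return toy_pending_callbacks > 0;
}

/* Stop the tick only if RCU does not need this CPU. */
void toy_stop_tick(void)
{
	if (!toy_needs_cpu())
		toy_tick_stopped = true;
}

void toy_restart_tick(void)
{
	toy_tick_stopped = false;
}

/* Stands in for __call_rcu(): queuing a callback with the tick stopped
 * breaks the imagined rule, so count it as a violation. */
void toy_call_rcu(void)
{
	if (toy_tick_stopped)
		toy_rule_violations++;
	toy_pending_callbacks++;
}

/* Pretend a grace period ended and all queued callbacks were invoked. */
void toy_process_callbacks(void)
{
	assert(toy_pending_callbacks >= 0);
	toy_pending_callbacks = 0;
}
```

Here only the write side (toy_call_rcu()) is checked against the tick
state; readers never enter the picture, which is the point of the question.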

But I'm certainly missing something.

Paul?