Message-Id: <1218267265.29098.48.camel@lappy.programming.kicks-ass.net>
Date:	Sat, 09 Aug 2008 09:34:25 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	David Witbrodt <dawitbro@...global.net>
Cc:	linux-kernel@...r.kernel.org, Yinghai Lu <yhlu.kernel@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	netdev <netdev@...r.kernel.org>
Subject: Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem

On Fri, 2008-08-08 at 18:23 -0700, David Witbrodt wrote:
> I have tracked the regression down to an RCU problem.
> 
> I added some printk()'s to the function inet_register_protosw() in
> net/ipv4/af_inet.c, as seen in this diff:
> 
> ===== BEGIN DIFF ==========
>       * non-permanent entry.  This means that when we remove this entry, the
>       * system automatically returns to the old behavior.
>       */
> +    printk ("     Adding new protocol\n");
>      list_add_rcu(&p->list, last_perm);
> +
>  out:
> +    printk ("     Unlocking spinlock\n");
>      spin_unlock_bh(&inetsw_lock);
>  
> +    printk ("     Calling synchronize_net()\n");
>      synchronize_net();
>  
>      return;
> ===== END DIFF ==========
> 
> A kernel built with these changes freezes with "Calling synchronize_net()"
> as the last printed line.
> 
> I located the function synchronize_net() in net/core/dev.c, and it was easy
> to add some printk()'s there:
> 
> ===== BEGIN DIFF ==========
>  
> void synchronize_net(void)
>  {
> +    printk ("   synchronize_net(): calling might_sleep()\n");
>      might_sleep();
> +
> +    printk ("   synchronize_net(): calling synchronize_rcu()\n");
>      synchronize_rcu();
>  }
> ===== END DIFF ==========
> 
> The kernel built with these changes froze with "synchronize_net(): 
> calling synchronize_rcu()" as the last line on the screen.
> 
> After reading some documentation in Documentation/RCU/, it looks like 
> something is misusing RCU -- and, according to the Documentation, those kinds 
> of mistakes are easy to make.  Maybe necessary calls to
>  
>     rcu_read_lock()
>     rcu_read_unlock()
> 
> are missing, and something about my hardware is triggering a freeze that 
> doesn't occur on most hardware.
> 
> 
> For some reason, turning off the HPET by booting with "hpet=disabled" keeps
> the freeze from happening.  Just reading a couple of those docs about RCU
> made me dizzy, so I hope someone familiar with RCU issues will take a look
> at the code in the files I've listed.  Surely you guys can take it from here
> now?!
> 
> If not, just give me some experimental code changes to make to get my 2.6.26
> and 2.6.27 kernels working again without disabling HPET!!!


The typical way to deadlock like this is to do something like:

 rcu_read_lock();

   synchronize_rcu();

 rcu_read_unlock();
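
To make the reasoning concrete, here is a toy user-space model of that pattern. It is only a sketch and not kernel code: every toy_* name is made up for illustration, it models a single CPU, and it reports whether the wait could ever finish instead of actually blocking. The assumption it encodes is that a grace period can only end once every reader that was already running has left its read-side critical section, so a waiter that is itself a reader can never be satisfied.

```c
/* Toy user-space model, NOT kernel code: every toy_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Nesting depth of read-side critical sections on the one modeled CPU. */
static int toy_read_nesting;

static void toy_rcu_read_lock(void)
{
	toy_read_nesting++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_read_nesting > 0);
	toy_read_nesting--;
}

/*
 * A grace period ends only after every reader that was already running has
 * left its critical section.  The real synchronize_rcu() blocks until then;
 * this model just reports whether the wait could ever finish.
 */
static bool toy_synchronize_rcu_can_finish(void)
{
	return toy_read_nesting == 0;
}

/* The problematic ordering: the waiter is itself one of the readers. */
static bool toy_sync_inside_reader(void)
{
	bool can_finish;

	toy_rcu_read_lock();
	can_finish = toy_synchronize_rcu_can_finish();	/* reader still active */
	toy_rcu_read_unlock();
	return can_finish;
}

/* The safe ordering: leave the read side first, then wait. */
static bool toy_sync_after_reader(void)
{
	toy_rcu_read_lock();
	toy_rcu_read_unlock();
	return toy_synchronize_rcu_can_finish();	/* no reader left */
}
```

In this model toy_sync_inside_reader() reports that the wait cannot finish, while toy_sync_after_reader() reports that it can. How the real kernel behaves in detail depends on the RCU flavour configured, which this sketch does not try to capture.
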

While I cannot immediately see any such usage in the function you
quoted, it could be in one of the callers. Let me browse some code...

Can't seem to find anything like that.

What's weird, though, is that HPET makes any difference on these network
code paths.

Could we end up calling RCU too soon? I doubt we bring up ipv4 before
RCU.


