Date:	Sat, 14 May 2011 07:26:21 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Yinghai Lu <yinghai@...nel.org>
Cc:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL rcu/next] rcu commits for 2.6.40

On Fri, May 13, 2011 at 02:08:21PM -0700, Yinghai Lu wrote:
> On Thu, May 12, 2011 at 2:36 PM, Yinghai Lu <yinghai@...nel.org> wrote:
> > On 05/12/2011 02:20 AM, Paul E. McKenney wrote:
> >> On Thu, May 12, 2011 at 12:42:50AM -0700, Yinghai Lu wrote:
> >>> On 05/12/2011 12:27 AM, Yinghai Lu wrote:
> >>>> On 05/11/2011 11:03 PM, Ingo Molnar wrote:
> >>>>>
> >>>>> * Yinghai Lu <yinghai@...nel.org> wrote:
> >>>>>
> >>>>>> e59fb3120becfb36b22ddb8bd27d065d3cdca499 is the first bad commit
> >>>>>> commit e59fb3120becfb36b22ddb8bd27d065d3cdca499
> >>>>>> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> >>>>>> Date:   Tue Sep 7 10:38:22 2010 -0700
> >>>>>>
> >>>>>>     rcu: Decrease memory-barrier usage based on semi-formal proof
> >>>>>
> >>>>> Find below an (untested!) attempt at reverting it for debugging purposes: could
> >>>>> you please try it, does your system now boot up fine?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>>    Ingo
> >>>>>
> >>>>
> >>>> yes, manually reverting that commit fixes the problem.
> >>>
> >>> on a system with 8 sockets of westmere-ex
> >>>
> >>> it seems other commits after that commit contribute some delay too.
> >>>
> >>> [   32.240739] cpu_dev_init done
> >>> [   73.587288] memory_dev_init done
> >>
> >> I am testing a revert of e59fb3120becfb36b22ddb8bd27d065d3cdca499 and
> >> will chase down the delay.
> >>
> >
> > it seems the following commit still needs to be reverted in addition to e59fb3120becfb36b22ddb8bd27d065d3cdca499.
> >
> > [root@...14-2404-239-158 linux-2.6]# git bisect good
> > a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit
> > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> > Author: Paul E. McKenney <paul.mckenney@...aro.org>
> > Date:   Wed Jan 12 14:10:23 2011 -0800
> >
> >    rcu: move TREE_RCU from softirq to kthread
> >
> >    If RCU priority boosting is to be meaningful, callback invocation must
> >    be boosted in addition to preempted RCU readers.  Otherwise, in presence
> >    of CPU real-time threads, the grace period ends, but the callbacks don't
> >    get invoked.  If the callbacks don't get invoked, the associated memory
> >    doesn't get freed, so the system is still subject to OOM.
> >
> >    But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
> >    moves the callback invocations to a kthread, which can be boosted easily.
> >
> >    Also add comments and properly synchronized all accesses to
> >    rcu_cpu_kthread_task, as suggested by Lai Jiangshan.
> >
> >    Signed-off-by: Paul E. McKenney <paul.mckenney@...aro.org>
> >    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> >    Reviewed-by: Josh Triplett <josh@...htriplett.org>
> >
> > :040000 040000 e40306ac6405952c1d387325a98588442209abe8 efe9ea2f408c62daaccf49e6d1339dff3a74f049 M      Documentation
> > :040000 040000 8f9e7a8fa3a728d4ae58e2efb8ada7cf08aed00e 9b44deba45ba905c5d9b3cc314812f0ba3f7e639 M      include
> > :040000 040000 4b10b719a2d56ed4bc796a9f43775732bb5ff144 4db269277ccf607e1a6a7d7f4c2a7cf8d592d46a M      kernel
> > :040000 040000 881f102e6831381beed016ed240d690f6a2ccd5e 57d2fc6f84e47394c116bc617a9a0ef9b8b6dbd4 M      tools
> 
> so reverting only e59fb3120becfb36b22ddb8bd27d065d3cdca499 is not enough.
> 
> [  315.248277] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> [  315.285642] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [  427.405283] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0} (detected by 50, t=15002 jiffies)
> [  427.408267] sending NMI to all CPUs:
> [  427.419298] NMI backtrace for cpu 1
> [  427.420616] CPU 1
> 
> Paul, can you make one clean revert for
> | a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> | rcu: move TREE_RCU from softirq to kthread

I will be continuing to look into a few things over the weekend, but
if I cannot find the cause, then changing back to softirq might be the
thing to do.  It won't be so much a revert in the "git revert" sense
due to later dependencies, but it could be shifted back from kthread
to softirq.  This would certainly decrease dependence on the scheduler,
at least in the common case where ksoftirqd does not run.

							Thanx, Paul