Message-ID: <20131112095516.GB32441@localhost>
Date:	Tue, 12 Nov 2013 17:55:16 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Michael wang <wangyun@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [sched/get_online_cpus] INFO: task swapper/0:1 blocked for more
 than 120 seconds.

On Mon, Nov 11, 2013 at 05:20:22PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 11, 2013 at 03:47:11PM +0800, Michael wang wrote:
> > Hi, Fengguang
> > 
> > On 11/10/2013 06:16 PM, Fengguang Wu wrote:
> > > Greetings,
> > > 
> > > I got the below dmesg and the first bad commit is
> > 
> > I guess this will disappear when '!CONFIG_RCU_BOOST'...
> > 
> > AFAIK, if the rsp is in boost mode, we rely on the smpboot thread
> > 'rcu_cpu_thread_spec' to finish the callbacks. That thread gets
> > parked before synchronize_rcu() runs inside _cpu_down(); if that is
> > true, the sync will never finish...
> > 
> > Maybe a brainless fix like this?
> > 
> > 
> > 
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 63aa50d..aa24338 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -306,7 +306,6 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
> >                                 __func__, cpu);
> >                 goto out_release;
> >         }
> > -       smpboot_park_threads(cpu);
> >  
> >         /*
> >          * By now we've cleared cpu_active_mask, wait for all preempt-disabled
> > @@ -321,6 +320,8 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
> >  #endif
> >         synchronize_rcu();
> >  
> > +       smpboot_park_threads(cpu);
> > +
> >         /*
> >          * So now all preempt/rcu users must observe !cpu_active().
> >          */
> 
> Good thinking. Wu, did this cure things?

Yes, it fixed the problem.

Tested-by: Fengguang Wu <fengguang.wu@...el.com>
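
For readers following the thread, the ordering issue Michael describes can be
sketched as a small userspace model. This is only an illustration of his
hypothesis, not kernel code: the names cpu_model, model_synchronize_rcu() and
model_cpu_down() are made up for this example, and the assumption that the
grace period depends on the parkable per-CPU thread only when RCU boost is
enabled comes from his "AFAIK" guess above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state that matters in _cpu_down(). */
struct cpu_model {
	bool rcu_boost;           /* models CONFIG_RCU_BOOST being enabled */
	bool boost_thread_parked; /* models the per-CPU RCU smpboot thread */
};

/*
 * Models synchronize_rcu(): returns true if the wait could finish.
 * Assumption: with boost enabled, the callbacks are finished by the
 * smpboot thread, so a parked thread means the wait never completes.
 */
static bool model_synchronize_rcu(const struct cpu_model *m)
{
	if (!m->rcu_boost)
		return true;
	return !m->boost_thread_parked;
}

/*
 * Models the two orderings discussed in the thread.
 * park_before_sync == true  : ordering before the patch
 * park_before_sync == false : ordering after the patch
 * Returns true if the modelled _cpu_down() gets past the sync.
 */
static bool model_cpu_down(bool park_before_sync, bool rcu_boost)
{
	struct cpu_model m = { .rcu_boost = rcu_boost,
			       .boost_thread_parked = false };

	if (park_before_sync)
		m.boost_thread_parked = true; /* smpboot_park_threads(cpu) */

	if (!model_synchronize_rcu(&m))
		return false;                 /* would block forever */

	m.boost_thread_parked = true;         /* parked (again) after sync */
	return true;
}

/* Small usage example covering the three cases from the thread. */
void model_self_check(void)
{
	assert(!model_cpu_down(true, true));  /* old order + boost: hangs */
	assert(model_cpu_down(false, true));  /* patched order: completes */
	assert(model_cpu_down(true, false));  /* no boost: completes */
}
```

Under these assumptions the model matches what was reported: only the old
ordering combined with RCU boost gets stuck, which is consistent with
Michael's guess that the hang would disappear with !CONFIG_RCU_BOOST.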


/kernel/i386-randconfig-j3-11101308/484f4e66a6a1102edf02407479f6f7632aade0f3

+--------------------------------------------------+--------------+--------------+
|                                                  | e5137b50a064 | 484f4e66a6a1 |
+--------------------------------------------------+--------------+--------------+
| boot_successes                                   | 42           | 100          |
| boot_failures                                    | 58           |              |
| INFO:task_blocked_for_more_than_seconds          | 58           |              |
| Kernel_panic-not_syncing:hung_task:blocked_tasks | 58           |              |
+--------------------------------------------------+--------------+--------------+

/kernel/x86_64-randconfig-x4-1108/484f4e66a6a1102edf02407479f6f7632aade0f3

+------------------------------------------------------------------------------------+-----------+--------------+--------------+
|                                                                                    | v3.12-rc7 | e5137b50a064 | 484f4e66a6a1 |
+------------------------------------------------------------------------------------+-----------+--------------+--------------+
| boot_successes                                                                     | 59        | 34           | 100          |
| has_kernel_error_warning                                                           | 4         |              |              |
| BUG:kernel_early_hang_without_any_printk_output                                    | 4         |              |              |
| boot_failures                                                                      | 0         | 66           |              |
| INFO:task_blocked_for_more_than_seconds                                            | 0         | 66           |              |
| INFO:NMI_handler(arch_trigger_all_cpu_backtrace_handler)took_too_long_to_run:msecs | 0         | 55           |              |
| Kernel_panic-not_syncing:hung_task:blocked_tasks                                   | 0         | 66           |              |
+------------------------------------------------------------------------------------+-----------+--------------+--------------+

