Date:	Fri, 11 Sep 2009 12:08:52 -0700
From:	Suresh Siddha <suresh.b.siddha@...el.com>
To:	Xiao Guangrong <xiaoguangrong@...fujitsu.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"mm-commits@...r.kernel.org" <mm-commits@...r.kernel.org>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>,
	"mingo@...e.hu" <mingo@...e.hu>,
	"nickpiggin@...oo.com.au" <nickpiggin@...oo.com.au>,
	"rusty@...tcorp.com.au" <rusty@...tcorp.com.au>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: +
 generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch
 added to -mm tree

On Fri, 2009-09-11 at 00:45 -0700, Xiao Guangrong wrote:
> I think that get_online_cpus() and stop_machine() can't avoid this
> race,
> 
> 1: For the first example in my patch's changelog:
> 
>       CPU A:                           CPU B:
> 
>  smp_call_function_many(wait=0)        ......
>  cpu_down()                            ......
>    hotplug_cfd() ->                    ......
>      free_cpumask_var(cfd->cpumask)    (receives the function-call IPI)
>                                        /* reads cfd->cpumask */
>                                        generic_smp_call_function_interrupt() ->
>                                          cpumask_test_and_clear_cpu(cpu, data->cpumask)
> 
>                                        CRASH!!!
> 
> CPU A calls smp_call_function_many(wait=0), asking CPU B to run a
> specific function. After smp_call_function_many() returns, CPU A is
> taken offline immediately. Unfortunately, if CPU B receives this
> IPI after CPU A is down, it will crash as shown above.

How can cpu B receive the IPI interrupt after cpu A is down?

As part of cpu A going down, we first do the stop machine, i.e.,
schedule the stop machine worker threads on each cpu. So, by the time
all the worker threads on all the cpus get scheduled and synchronized,
the IPI on B should get delivered.

> 
> 2: For the second example in my patch's changelog:
> 
> If CPU B is dying, like below:
> 
> _cpu_down()
> {
> 	......
> 	
> 	/* Suppose the following sequence: before __stop_machine() is
> 	 * called, CPU B is online (in cpu_online_mask). At this point
> 	 * CPU A calls smp_call_function_many(wait=0), asking CPU B to
> 	 * run a specific function. After CPU A finishes, CPU B enters
> 	 * __stop_machine() and disables its interrupts
> 	 * (suppose CPU B has not received the IPI yet)
> 	 */
> 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> 	......
> }
> 
> Now CPU B is down, but it never handled CPU A's request, so the
> CSD_FLAG_LOCK flag of CPU A's cfd_data is never cleared. If CPU A
> calls smp_call_function_many() next time, it will block in
> csd_lock() -> csd_lock_wait(data) forever.

Here also, by the time the stop machine threads are scheduled on cpu B,
cpu B should service that IPI.

smp_call_function with wait=0 will still ensure that the IPIs are
registered at the destination cpu. But the smp_call_function call will
return before the interrupt handler has run and completed.

So, by the time the stop machine threads are scheduled and all of them
are online and synchronized (only after this point do we disable
interrupts by moving to the STOPMACHINE_DISABLE_IRQ state), we should
have serviced the pending smp call function IPIs.

I am still not convinced about the need for this patch. Am I missing
something? Please elaborate on the need with a more detailed sequence of steps.

thanks,
suresh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
