Message-ID: <20080502203346.GC3956@ucw.cz>
Date: Fri, 2 May 2008 22:33:46 +0200
From: Pavel Machek <pavel@....cz>
To: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Cc: Rusty Russell <rusty@...tcorp.com.au>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] patches for stop_machine
Hi!
> Hi Rusty and all,
>
> This is a proposal for a minor improvement to kernel/stop_machine.c
>
> [PATCH 1/3] stop_machine: short exit path for if we cannot create enough threads
> [PATCH 2/3] stop_machine: add timeout for child thread deployment
> [PATCH 3/3] stop_machine: add stopmachine_timeout sysctl entry
>
> The main topic is "how about adding a timeout for stop_machine?"
> I think it will act as a safety net.
>
> For example (a silly situation), the system can hang in the following way:
>
> # ./silly.sh
> run an evil loop task on AP
> pid 6138's current affinity mask: ff
> pid 6138's new affinity mask: fe
> to pretend lock up, chrt -f -p 99 6138
> loop[6138] is on CPU #4
> to do stopmachine, try to off #7
> echo 0 > /sys/devices/system/cpu/cpu7/online
> (never return)
>
> After applying this patch set, the hang can be prevented.
>
> # ./silly.sh
> :
> echo 0 > /sys/devices/system/cpu/cpu7/online
> stopmachine: Failed to stop machine in time(5s). Are there any CPUs on file?
> ./silly.sh: line 22: echo: write error: Device or resource busy
> offline is failed
I'd expect at least a WARN_ON here. -EBUSY is not a good enough indication
that one of your CPUs is now dead.
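To make that concrete, here is a minimal userspace sketch of a timeout path
that warns loudly before returning -EBUSY. It is not the patch's code and not
kernel code. WARN_ON_SKETCH, wait_for_stop_threads and the two count_* stubs
are hypothetical names made up for illustration, and the polling loop is an
assumed stand-in for the real timeout; in the kernel the warning would be
WARN_ON() itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Userspace stand-in for the kernel's WARN_ON(): print a loud message with
 * the location and evaluate to 1 if the condition holds, else 0.
 * (Hypothetical macro, for illustration only.)
 */
#define WARN_ON_SKETCH(cond) \
	((cond) ? (fprintf(stderr, "WARNING: at %s:%d %s()\n", \
			   __FILE__, __LINE__, __func__), 1) : 0)

/*
 * Hypothetical helper: poll count_ready() up to timeout_ticks times.
 * Returns 0 once at least `wanted` threads report ready; otherwise warns
 * and returns -EBUSY, so the failure is visible in the log and not only
 * as an errno seen by the caller.
 */
static int wait_for_stop_threads(int (*count_ready)(void), int wanted,
				 int timeout_ticks)
{
	int tick;

	assert(wanted > 0);

	for (tick = 0; tick < timeout_ticks; tick++) {
		if (count_ready() >= wanted)
			return 0;
	}

	WARN_ON_SKETCH(1);	/* some CPU never answered: say so loudly */
	return -EBUSY;
}

/* Stub: all 8 CPUs checked in. */
static int count_all_ready(void)
{
	return 8;
}

/* Stub: one CPU is stuck behind an RT loop and never checks in. */
static int count_one_stuck(void)
{
	return 7;
}
```

With these stubs, wait_for_stop_threads(count_all_ready, 8, 5) returns 0
silently, while wait_for_stop_threads(count_one_stuck, 8, 5) prints the
warning to stderr and returns -EBUSY.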
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html