[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4660534E.6050903@cfl.rr.com>
Date: Fri, 01 Jun 2007 13:11:42 -0400
From: Mark Hounschell <dmarkh@....rr.com>
To: Oleg Nesterov <oleg@...sign.ru>
CC: Mark Hounschell <markh@...pro.net>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: floppy.c soft lockup
Oleg Nesterov wrote:
> On 06/01, Mark Hounschell wrote:
>> Oleg Nesterov wrote:
>>> Yes, but see above. flush_scheduled_work() needs a cooperation from events/2
>>> which is bound to CPU 2.
>>>
>> Again I don't understand why flush_scheduled_work() running on behalf of a process
>> affinitized to processor-1 requires cooperation from events/2 (affinitized to processor-2)
>> when there is an events/1 already affinitized to processor 1?
>
> flush_workqueue() blocks until any scheduled work on any CPU has run to
> completion. If we have some work_struct pending on CPU 2, it can be completed
> only when events/2 executes it.
>
>>> If you changed irq/X/smp_affinity, the patch I sent should help, because
>>> floppy_work can't be scheduled on CPU 2, but still I don't think it is right
>>> to run 100% cpu-bound RT-process.
>> The patch you sent helps with no other intervention from me. But then so does
>> the patch mentioned in the original post. I am able to bang on the floppies pretty
>> hard doing all kinds of things with no trouble using either.
>
> This patch replaces flush_scheduled_work() with cancel_work_sync(). The latter
> can still hang if the floppy interrupt happens on CPU 2 and does schedule_bh(),
> events/2 starts running floppy_work->func() and preempted by RT-thread. This is
> very unlikely, but possible.
>
>>>From your original post:
> >
> > The application only runs on SMP machines and uses process and irq affinities
>
> if the floppy interrupt can't happen on CPU 2, the above scenario is not possible.
>
All the irq affinities but one are set to processor-1. The only irq not
is from an rtom (Real-Time Option Module). It's irq is handled by
processor-2.
>> As far as a 100% cpu-bound RT-process goes, well I say I don't intentionally relinquish
>> the processor but it's not really 100% cpu-bound. Running xosview I see some spare time.
>
> Well, I don't know what is xosview, sorry :) so I don't understand what does
> "spare time" precisely mean. If this thread does some i/o or something which
> can sleep, then...
>
I don't understand the _real_ meaning of spare time either but xosview
is just a little graphical window showing information obtained from the
proc fs i think.
> OK. In that case we may have another reason for deadlock, say a pending
> floppy_work needs open_lock or test_and_set_bit(0, &fdc_busy).
>
> Could you apply the trivial patch below, and change the i/o thread to do
>
> prctl(1234); // hangs ???
> printf(something);
> ioctl(Q->DevSpec1, FDSETPRM, &medprm); // this hangs
>
> to see if prctl() hangs or not? This way we can narrow the problem.
> (of course, you can just kill the above ioctl() if this is possible).
>
> Thanks!
>
> Oleg.
>
> --- OLD/kernel/sys.c~ 2007-04-03 13:05:02.000000000 +0400
> +++ OLD/kernel/sys.c 2007-06-01 18:56:22.000000000 +0400
> @@ -2147,6 +2147,11 @@ asmlinkage long sys_prctl(int option, un
> {
> long error;
>
> + if (option == 1234) {
> + flush_scheduled_work();
> + return 0;
> + }
> +
> error = security_task_prctl(option, arg2, arg3, arg4, arg5);
> if (error)
> return error;
>
> -
Ok the prctl never returned. I just replaced the ioctl with it and added
a printf before and after. I only get the one before. The thread is hung
at this point just as if I'd done the ioctl?
Regards
Mark
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists