[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20140312153741.07251019@notabene.brown>
Date: Wed, 12 Mar 2014 15:37:41 +1100
From: NeilBrown <neilb@...e.de>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, "majianpeng" <majianpeng@...il.com>
Subject: Re: [PATCH] poll/wait/md: allow module to safely support 'poll' on
/proc files
On Tue, 11 Mar 2014 21:19:24 -0700 Andrew Morton <akpm@...ux-foundation.org>
wrote:
> On Wed, 12 Mar 2014 14:10:25 +1100 NeilBrown <neilb@...e.de> wrote:
>
> > On Tue, 11 Mar 2014 20:03:31 -0700 Andrew Morton <akpm@...ux-foundation.org>
> > wrote:
> >
> > > On Wed, 12 Mar 2014 13:36:38 +1100 NeilBrown <neilb@...e.de> wrote:
> > >
> > > >
> > > > The md driver currently supports 'poll' on /proc/mdstat.
> > > > This is unsafe as if the md-mod module is removed while a 'poll'
> > > > or 'select' is outstanding on /proc/mdstat, an oops occurs
> > > > when the syscall completes.
> > > > poll_freewait() will call remove_wait_queue() on a wait_queue_head_t
> > > > which was local to the module which no-longer exists.
> > > >
> > > > This problem is particular to /proc. Most filesystems do not
> > > > allow the module to be unloaded while any files are open on it.
> > > > /proc only blocks module unloading while a file_operations
> > > > call is currently active into the module, not while the file is open.
> > > > kernfs has this property too but kernfs allocates a wait_queue_head_t
> > > > in its internal data structures so the module doesn't need to provide
> > > > one.
> > > > (A previous patch to add a similar allocation to procfs was not
> > > > accepted).
> > >
> > > By who, me? I was hoping we could somehow keep the implementation
> > > contained within md. I don't think I actually looked at it to any
> > > significant extent!
> > >
> > > Was hoping that viro would pipe up.
> >
> > Was not accepted by anybody. Everybody is guilty :-)
> > You were at least nice enough to comment (thanks).
> >
> > I think I like this version better so it might not be a problem that the
> > previous version was not accepted. Depends on what the scheduler guys think
> > though....
>
> OK..
>
> > > > ...
> > > >
> > > > +/**
> > > > + * wait_queue_purge - remove all waiter from a wait_queue
> > > > + * @q: The queue to be purged
> > > > + *
> > > > + * Unlink all pending waiters from the queue.
> > > > + * This can be used prior to freeing a queue providing all waiters are
> > > > + * prepared for queue purging.
> > > > + * Waiters must call remove_wait_queue_puregeable() rather than
> > > > + * remove_wait_queue().
> > > > + *
> > > > + */
> > > > +void wait_queue_purge(wait_queue_head_t *q)
> > > > +{
> > > > + spin_lock(&q->lock);
> > > > + while (!list_empty(&q->task_list))
> > > > + list_del_init(q->task_list.next);
> > > > + spin_unlock(&q->lock);
> > > > + synchronize_rcu();
> > > > +}
> > > > +EXPORT_SYMBOL(wait_queue_purge);
> > >
> > > I don't get this. If a task is waiting on that wait_queue_head_t, how
> > > does it get woken?
> >
> > This is only expected to be used by tasks which have some other means of
> > being woken up. Possibly a timeout passed to 'select' or 'poll', possibly a
> > signal.
>
> Oh. So the caller is supposed to take the tasks off the queue via
> wait_queue_purge(), then to wake them up (which implies the caller has
> a second way of looking the tasks up).
No. The caller of wait_queue_purge() doesn't care beyond purging the queue.
The process that are waiting will be in 'select' or 'poll' or similar and
there is no guarantee that anything will ever happen to wake them up.
They can just keep waiting for all I care.
I most cases they will have set their own timeout (or should have done).
I could wake_up() the wait_queue before purging it. Then the callers would
wake up, attempt IO, probably get EINVAL or something because the /proc file
has been de-registered and hopefully move on with life.
But that is really independent of the implementation of wait_queue_purge().
It is entirely possible for a process to be waiting on multiple events (And
in the select/poll case it is very common) and if one event wants to say
"this event is never ever going to happen so I'm detaching the queue", then
there is no particular reason to do anything but detach the queue.
>
> And the tasks themselves ... do they need to know what happened? If
> they run remove_wait_queue() then will still take
> wait_queue_head_t.lock, so the calling task need to somehow keep the
> wait_queue_head_t alive until everyone has woken and cleared off.
The tasks themselves mustn't run remove_wait_queue(). They must run
remove_wait_queue_purgeable(). This check if the remove has already happened
and avoids taking the spinlock (which no longer exists) in that case.
You cannot just call wait_queue_purge() on any old queue - only one that is
known to be potentially purged and waiters know to use
remove_wait_queue_purge().
Currently that is only wait_queues that are passed to poll_wait() inside a
->poll() method.
>
> Complex! Could we please get that all fleshed out in the changelog and
> kerneldoc?
I thought the important parts (the waiter having to call
remove_wait_queue_purgeable()) already was in the kerneldoc.
Thanks,
NeilBrown
>
> > >
> > > > +/**
> > > > + * remove_wait_queue_puregeable - remove_wait_queue if wait_queue_purge might be used.
> > > > + * @q: the queue, which may already be purged, to remove from
> > > > + * @wait: the waiter to remove
> > > > + *
> > > > + * Remove a waiter from a queue if it hasn't already been purged.
> > > > + * If the queue has already been purged then task_list will be empty.
> > > > + * If it isn't then it is still safe to lock the queue and remove
> > > > + * the task.
> > > > + */
> > > > +void remove_wait_queue_purgeable(wait_queue_head_t *q, wait_queue_t *wait)
> > > > +{
> > > > + unsigned long flags;
> > > > +
> > > > + rcu_read_lock();
> > > > + if (!list_empty(&wait->task_list)) {
> > > > + spin_lock_irqsave(&q->lock, flags);
> > >
> > > Mixture of spin_lock_irqsave() here and spin_lock() in
> > > wait_queue_purge() looks odd.
> >
> > True - I was copying remove_wait_queue(). Maybe I should have just called
> > remove_wait_queue() directly (we don't actually need that _init on the
> > list_del).
>
> lockdep should have complained about the spin_lock().
Download attachment "signature.asc" of type "application/pgp-signature" (829 bytes)
Powered by blists - more mailing lists