[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120501121833.6c613d33@notabene.brown>
Date: Tue, 1 May 2012 12:18:33 +1000
From: NeilBrown <neilb@...e.de>
To: Arve Hjønnevåg <arve@...roid.com>
Cc: "Rafael J. Wysocki" <rjw@...k.pl>,
Linux PM list <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Magnus Damm <magnus.damm@...il.com>, markgross@...gnar.org,
Matthew Garrett <mjg@...hat.com>,
Greg KH <gregkh@...uxfoundation.org>,
John Stultz <john.stultz@...aro.org>,
Brian Swetland <swetland@...gle.com>,
Alan Stern <stern@...land.harvard.edu>,
Dmitry Torokhov <dmitry.torokhov@...il.com>,
"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 5/8] epoll: Add a flag, EPOLLWAKEUP, to prevent
suspend while epoll events are ready
On Mon, 30 Apr 2012 17:52:08 -0700 Arve Hjønnevåg <arve@...roid.com> wrote:
> On Sun, Apr 29, 2012 at 6:58 PM, NeilBrown <neilb@...e.de> wrote:
> > On Thu, 26 Apr 2012 20:49:51 -0700 Arve Hjønnevåg <arve@...roid.com> wrote:
> ...
> >> I keep the wakeup-source active whenever the epitem is on a list
> >> (ep->rdllist or the local txlist). The temporary txlist is modified
> >> without holding the lock that protects ep->rdllist. It is easier to
> >> use a separate wakeup source to prevent suspend while this list is
> >> manipulated than trying to maintain the wakeup-source state in a
> >> different way than the existing eventpoll state. I think this only
> >> causes real problems if the same epoll file is used for frequent
> >> non-wakeup events (e.g. a gyro) and wakeup events. You should be able
> >> to work around this by using two epoll files.
> >
> > Thanks for the explanation. I can now see more clearly how your patch works.
> > I can also see why you might need the ep->ws wakeup_source. However I don't
> > like it.
> >
> > If it acted purely as a lock and prevented suspend while it was active then
> > it would be fine. However it doesn't. It also aborts any current suspend
> > attempt - so it is externally visible.
> > The way your code it written, *any* call to epoll_wait will abort the current
> > suspend cycle, even if it is called by a completely non-privileged user.
>
> With the patch I posted Friday, a non-privileged user will not be able
> to pass EPOLLWAKEUP and have the wakeup-source created.
Ahhh yes. I hadn't noticed that you only create ep->ws the first time that
an epi->ws is created. So that aspect is fine, thanks.
>
> > That may not obviously be harmful, but it makes the precise semantics of the
> > system call something quite non-obvious, and it is much better to have a very
> > clean semantic.
> > As you say, it can probably be worked-around but code is much safer when you
> > don't need to work-around things.
> >
> > I see two alternatives:
> > 1/ set the 'wakeup' flag on the whole epoll-fd, not on the individual events
> > that it is asked to monitor. i.e. add a new flag to epoll_create1()
> > instead of to epoll_ctl events.
> > Then you just need a single wakeup_source for the fd which is active
> > whenever any event is ready.
> >
> > This interface might be generally nicer, I'm not sure.
> >
> > 2/ Find a way to get rid of ep->ws.
> > Thinking about it more, I again think it isn't needed.
> > The reason is that suspend is already exclusive with any process running in
> > kernel context.
> > One of the first things suspend does is to freeze all process and (for
> > regular non-kernel-thread processes) this happens by sending a virtual
> > signal which is acted up when the process returns from a system call or
> > returns from a context switch. So while any given system call is running
> > (e.g. epoll_wait) suspend is blocked. When epoll_wait sets
> > TASK_INTERRUPTIBLE the 'freeze' signal will interrupt it of course, but
> > this is the only point where suspend can interfere with epoll_wait, and you
> > aren't holding ep->ws then anyway.
> > Hopefully Rafael will correct me if I got that outline wrong. But even if
> > I did, I think we need to get rid of ep->ws.
> >
>
> If ep_scan_ready_list is only called from freezable threads, then
> ep->ws is not strictly needed, but without it another suspend attempt
> will be triggered if there are not other wakeup-sources active. I'm
> also not sure if it could get called from a non-freezable thread since
> other subsystems can call it through the poll hook.
I can see that triggering another suspend cycle that we know will fail is not
ideal - every thread needs to move from 'idle' to 'frozen' and back again.
I wonder if a lighter-weight mechanism might achieve that better. It feels
like wakeup_source is being used for two very different purposes here - one
to disable suspend while some event is pending, one to lock-out suspend
during a critical piece of code. I feel it would be neater and more
transparent if they were two different mechanisms.
If there were a second mechanism, I would like to see it used in
pm_autosleep_set_state in place of autosleep_ws as well - it seems like a
similar situation.
Maybe adding a second locking mechanism can be added later - I don't think
the current code is wrong, it just seems inelegant.
>
> A third option is to only activate ep->ws when needed. This may may work:
I think this is an improvement if we stay with using a wakeup_source to
protect the critical code section. If we come up with a different mechanism,
them it is not necessary.
>
> > Also, I think it is important to clearly document how to use this safely.
> > You suggested that if any EPOLLWAKEUP event is ready, then suspend will
> > remain disabled not only until the event is handled, but also until the next
> > call to epoll_wait. That sounds like very useful semantics, but it isn't at
> > all explicit in the patch. I think it should be made very clear in
> > eventpoll.h how the flag can be used. (and then eventually get this into a
> > man page of course).
> >
>
> OK
>
> >>
> >> >> One last item that doesn't really belong here - but it is in context.
> >> >>
> >> >> This mechanism is elegant because it provides a single implementation that
> >> >> provides wakeup_source for almost any sort of device. I would like to do the
> >> >> same thing for interrupts.
> >> >> Most (maybe all) of the wakeup device on my phone have an interrupt where the
> >> >> body is run in a thread. When the thread has done it's work the event is
> >> >> visible to userspace so the EPOLLWAKEUP mechanism is all that is needed to
> >> >> complete the path to user-space (or for my user-space solution, nothing else
> >> >> is needed once it is visible to user-space).
> >> >> So we just need to ensure a clear path from the "top half" interrupt handler
> >> >> to the threaded handler.
> >> >> So I imagine attaching a wakeup source to every interrupt for which 'wakeup'
> >> >> is enabled, activating it when the top-half starts and relaxing it when the
> >> >> bottom-half completes. With this in place, almost all drivers would get
> >> >> wakeup_source handling for free.
> >> >> Does this seem reasonable to you.
> >> >
> >> > Yes, it does.
> >> >
> >>
> >> How useful is that? Suspend already synchronizes with interrupt
> >> handlers and will not proceed until they have returned. Are threaded
> >> interrupts handlers not always run at that stage? For drivers that use
> >> work-queues instead of a threaded interrupt handler, I think the
> >> suspend-blocking work-queue patch I wrote a while back is convenient.
> >>
> >
> > Maybe it isn't useful at all - I'm still working this stuff out.
> >
> > Yes, threaded interrupts are run "straight away", but what exactly does that
> > mean? And in particular, is there any interlocking to ensure they run
> > before suspend gets stop the CPU? Maybe the scheduling priority of the
> > different threads is enough to make sure this works, as irq_threads are
> > SCHED_FIFO and the suspending thread almost certainly isn't. But is that
> > still a guarantee on an SMP machine? irq_threads aren't freezable so suspend
> > won't block on them for that reason..
> >
> > I really just want to be sure that some interlock is in place to ensure that
> > the threaded interrupt handler runs before suspend absolutely commits to
> > suspending. If that is already the case, when what I suggest isn't needed as
> > you suggest. Do you know of such an interlock?
> >
>
> Normal interrupts are disabled during suspend. This synchronizes with
> the interrupt handler, and pending wakeup interrupts abort suspend. I
> have not looked at this code since threaded interrupt handlers were
> added, so there could be bugs there.
>
If disabling an interrupt ensured that the interrupt thread was idle, and
thus synchronised with both interrupt handlers, that would be good...
And it does. suspend_device_irqs() calls synchronize_irq() on each suspended
irq which waits for any irq thread to be idle. So it's all good.
i.e. in an interrupt is marked for WAKEUP, and its handler (whether threaded
or not) calls wake_up on a wait_queue_head that is used by 'poll' to detect
activity, then using EPOLLWAKEUP is sufficient to collect those events
without racing with suspend. So we don't need to teach multiple subsystems
about wakeup_sources - eventpoll and the interrupt handling can do it all.
Excellent.
Thanks,
NeilBrown
Download attachment "signature.asc" of type "application/pgp-signature" (829 bytes)
Powered by blists - more mailing lists