[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120430115819.633c2cb6@notabene.brown>
Date: Mon, 30 Apr 2012 11:58:19 +1000
From: NeilBrown <neilb@...e.de>
To: Arve Hjønnevåg <arve@...roid.com>
Cc: "Rafael J. Wysocki" <rjw@...k.pl>,
Linux PM list <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Magnus Damm <magnus.damm@...il.com>, markgross@...gnar.org,
Matthew Garrett <mjg@...hat.com>,
Greg KH <gregkh@...uxfoundation.org>,
John Stultz <john.stultz@...aro.org>,
Brian Swetland <swetland@...gle.com>,
Alan Stern <stern@...land.harvard.edu>,
Dmitry Torokhov <dmitry.torokhov@...il.com>,
"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 5/8] epoll: Add a flag, EPOLLWAKEUP, to prevent
suspend while epoll events are ready
On Thu, 26 Apr 2012 20:49:51 -0700 Arve Hjønnevåg <arve@...roid.com> wrote:
> 2012/4/26 Rafael J. Wysocki <rjw@...k.pl>:
> > On Thursday, April 26, 2012, NeilBrown wrote:
> >> On Sun, 22 Apr 2012 23:22:43 +0200 "Rafael J. Wysocki" <rjw@...k.pl> wrote:
> >>
> >> > From: Arve Hjønnevåg <arve@...roid.com>
> >> >
> >> > When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
> >> > wakeup_source will be active to prevent suspend. This can be used to
> >> > handle wakeup events from a driver that support poll, e.g. input, if
> >> > that driver wakes up the waitqueue passed to epoll before allowing
> >> > suspend.
> >> >
> >> > The current implementation uses an extra wakeup_source when
> >> > ep_scan_ready_list runs. This can cause problems if a single thread
> >> > is polling on wakeup events and frequent non-wakeup events (events
> >> > usually arrive during thread freezing) using the same epoll file.
> >>
> >> This is quite neat.
> >>
> >> If I understand it correctly, you register file descriptors with epoll_ctl()
> >> on an fd created with epoll_create(), and set the new EPOLLWAKEUP flag.
> >> Then when a regular 'poll' or 'select' on the epoll fd reports that it is
> >> readable you:
>
> I think it makes more sense to use epoll_wait than mixing this with
> select or poll.
>
> >> - get a wakelock
> This may not be needed, since epoll does not reevaluate its state
> until you call into it again (at least using epoll_wait).
>
> >> - use epoll_wait to collect the events
> >> - process the events
> >> - release your wakelock
> >> - go back to poll() or select() on the epoll fd.
> >> Correct? As long as there are ready events with EPOLLWAKEUP set a
> >> wakeup_source is held active and the system won't go to sleep.
> >>
> >> My concern with this is about permissions. It appears that any process could
> >> wait of some fd (maybe a pipe they created themselves) with EPOLLWAKEUP, and
> >> then simply never epoll_wait() for the event. Then they would be keeping
> >> the system awake. I don't think that is acceptable.
> >
> > I wonder what Arve has to say to that, but let me say that on systems without
> > autosleep every process can go into an infinite busy loop which is going to
> > drain battery relatively quickly just as well and I don't see why that's so
> > much different.
> >
>
> I still think is useful to limit access to this feature. On a phone, a
> process stuck in an infinite loop will increase battery drain, but if
> this process does not have permission to prevent suspend, then this is
> only catastrophic if another process that have that permission is
> preventing suspend. I think we should add a capability for this.
> Assuming you agree, do want me to create a separate patch for that
> adds a capability, or roll it into this one.
>
> >> So there needs to be some way to limit who can effectively block suspend by
> >> using EPOLLWAKEUP.
> >> (This is one of the reasons I like an all-user-space solution. Policy issues
> >> like this can easily be decided in user-space but are clumsy to put into the
> >> kernel).
> >>
> >> Also, I'm having trouble understanding the ep->ws wakeup_source.
> >> The epi->ws makes lots of sense and I think I understand it all.
> >> However I don't see why you need a wakeup_source for the 'struct eventpoll'.
> >>
> >> Every time that 'poll' decides to call the ->poll fop for the eventpoll, this
> >> wakeup_source will be activated and deactivated which will abort any current
> >> suspend cycle even if there are no events to report.
> >>
> >> I suspect it can just go away.
> >
> > I'll leave this one entirely to Arve, if you don't mind. :-)
> >
>
> I keep the wakeup-source active whenever the epitem is on a list
> (ep->rdllist or the local txlist). The temporary txlist is modified
> without holding the lock that protects ep->rdllist. It is easier to
> use a separate wakeup source to prevent suspend while this list is
> manipulated than trying to maintain the wakeup-source state in a
> different way than the existing eventpoll state. I think this only
> causes real problems if the same epoll file is used for frequent
> non-wakeup events (e.g. a gyro) and wakeup events. You should be able
> to work around this by using two epoll files.
Thanks for the explanation. I can now see more clearly how your patch works.
I can also see why you might need the ep->ws wakeup_source. However I don't
like it.
If it acted purely as a lock and prevented suspend while it was active then
it would be fine. However it doesn't. It also aborts any current suspend
attempt - so it is externally visible.
The way your code it written, *any* call to epoll_wait will abort the current
suspend cycle, even if it is called by a completely non-privileged user.
That may not obviously be harmful, but it makes the precise semantics of the
system call something quite non-obvious, and it is much better to have a very
clean semantic.
As you say, it can probably be worked-around but code is much safer when you
don't need to work-around things.
I see two alternatives:
1/ set the 'wakeup' flag on the whole epoll-fd, not on the individual events
that it is asked to monitor. i.e. add a new flag to epoll_create1()
instead of to epoll_ctl events.
Then you just need a single wakeup_source for the fd which is active
whenever any event is ready.
This interface might be generally nicer, I'm not sure.
2/ Find a way to get rid of ep->ws.
Thinking about it more, I again think it isn't needed.
The reason is that suspend is already exclusive with any process running in
kernel context.
One of the first things suspend does is to freeze all process and (for
regular non-kernel-thread processes) this happens by sending a virtual
signal which is acted up when the process returns from a system call or
returns from a context switch. So while any given system call is running
(e.g. epoll_wait) suspend is blocked. When epoll_wait sets
TASK_INTERRUPTIBLE the 'freeze' signal will interrupt it of course, but
this is the only point where suspend can interfere with epoll_wait, and you
aren't holding ep->ws then anyway.
Hopefully Rafael will correct me if I got that outline wrong. But even if
I did, I think we need to get rid of ep->ws.
Also, I think it is important to clearly document how to use this safely.
You suggested that if any EPOLLWAKEUP event is ready, then suspend will
remain disabled not only until the event is handled, but also until the next
call to epoll_wait. That sounds like very useful semantics, but it isn't at
all explicit in the patch. I think it should be made very clear in
eventpoll.h how the flag can be used. (and then eventually get this into a
man page of course).
>
> >> One last item that doesn't really belong here - but it is in context.
> >>
> >> This mechanism is elegant because it provides a single implementation that
> >> provides wakeup_source for almost any sort of device. I would like to do the
> >> same thing for interrupts.
> >> Most (maybe all) of the wakeup device on my phone have an interrupt where the
> >> body is run in a thread. When the thread has done it's work the event is
> >> visible to userspace so the EPOLLWAKEUP mechanism is all that is needed to
> >> complete the path to user-space (or for my user-space solution, nothing else
> >> is needed once it is visible to user-space).
> >> So we just need to ensure a clear path from the "top half" interrupt handler
> >> to the threaded handler.
> >> So I imagine attaching a wakeup source to every interrupt for which 'wakeup'
> >> is enabled, activating it when the top-half starts and relaxing it when the
> >> bottom-half completes. With this in place, almost all drivers would get
> >> wakeup_source handling for free.
> >> Does this seem reasonable to you.
> >
> > Yes, it does.
> >
>
> How useful is that? Suspend already synchronizes with interrupt
> handlers and will not proceed until they have returned. Are threaded
> interrupts handlers not always run at that stage? For drivers that use
> work-queues instead of a threaded interrupt handler, I think the
> suspend-blocking work-queue patch I wrote a while back is convenient.
>
Maybe it isn't useful at all - I'm still working this stuff out.
Yes, threaded interrupts are run "straight away", but what exactly does that
mean? And in particular, is there any interlocking to ensure they run
before suspend gets stop the CPU? Maybe the scheduling priority of the
different threads is enough to make sure this works, as irq_threads are
SCHED_FIFO and the suspending thread almost certainly isn't. But is that
still a guarantee on an SMP machine? irq_threads aren't freezable so suspend
won't block on them for that reason..
I really just want to be sure that some interlock is in place to ensure that
the threaded interrupt handler runs before suspend absolutely commits to
suspending. If that is already the case, when what I suggest isn't needed as
you suggest. Do you know of such an interlock?
Thanks,
NeilBrown
Download attachment "signature.asc" of type "application/pgp-signature" (829 bytes)
Powered by blists - more mailing lists