[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240203-freuden-frucht-b598f8cca27d@brauner>
Date: Sat, 3 Feb 2024 17:46:56 +0100
From: Christian Brauner <brauner@...nel.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Andy Lutomirski <luto@...capital.net>,
"Eric W. Biederman" <ebiederm@...ssion.com>, Tycho Andersen <tycho@...ho.pizza>, linux-api@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] pidfd_poll: report POLLHUP when pid_task() == NULL
On Sat, Feb 03, 2024 at 01:04:26PM +0100, Oleg Nesterov wrote:
> Christian, I apologize for my terse and unclear emails yesterday,
> I was a bit busy.
Oh don't worry, I didn't take it badly in any way. That was all totally
ok me getting the keys api wrong is not your problem. ;)
>
> On 02/02, Oleg Nesterov wrote:
> >
> > On 02/02, Christian Brauner wrote:
> > >
> > > > I think we need a simpler patch. I was going to send it as 4/4, but I'd
> > > > like to think more, _perhaps_ we can also discriminate the PIDFD_THREAD
> > > > and non-PIDFD_THREAD waiters. I'll try to make the patch(es) tomorrow or
> > >
> > > Right, I didn't go that far.
>
> OK, so lets forget about the PIDFD_THREAD waiters for the moment.
> Then everything is trivial, please see below.
>
> > > > 1. we can't use wake_up_poll(), it passes nr_exclusive => 1
> > >
> > > Bah. So we need the same stuff we did for io_uring and use
> > > __wake_up() directly. Or we add wake_up_all_poll() and convert the other
> > > three callsites:
> >
> > ...
> >
> > > +#define wake_up_all_poll(x, m) \
> > > + __wake_up(x, TASK_NORMAL, 0, poll_to_key(m))
> >
> > Agreed, but I think this + s/wake_up/wake_up_all_poll/ conversions
> > need a separate patch.
>
> And if it was not clear I like this change! In fact I thought about
> the new helper too, but I didn't realize that it already have the
> users.
>
> > > -void do_notify_pidfd(struct task_struct *task)
> > > +void pidfd_wake_up_poll(struct task_struct *task, bool dead)
> > > {
> > > - struct pid *pid;
> > > -
> > > WARN_ON(task->exit_state == 0);
> > > - pid = task_pid(task);
> > > - wake_up_all(&pid->wait_pidfd);
> > > + WARN_ON(mask == 0);
> > > + wake_up_all_poll(&task_pid(task)->wait_pidfd,
> > > + EPOLLIN | EPOLLRDNORM | dead ? EPOLLHUP : 0);
> >
> > No...
> >
> > This is still overcomplicated and is not right.
> ^^^^^^^^^^^^^^^^
> Sorry, sorry, I misread your change, "dead" is always false so it has
> no effect and thus the change is correct.
>
> But why do we need this arg? All we need is the trivial one-liner:
>
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2025,7 +2025,8 @@ void do_notify_pidfd(struct task_struct *task)
>
> WARN_ON(task->exit_state == 0);
> pid = task_pid(task);
> - wake_up_all(&pid->wait_pidfd);
> + __wake_up(&pid->wait_pidfd, TASK_NORMAL, 0,
> + poll_to_key(EPOLLIN | EPOLLRDNORM));
Ok, care to just send me a full patch for this?
Just paste it here and I'll pick it.
> }
>
> /*
>
> and I was going to send the patch above as 4/4, but then decided
> to delay it, see below.
>
> We can rename do_notify_pidfd() if you wish, and of course the
> new wake_up_all_poll() helper you proposed makes sense, but this
> is another story.
If at all then absolutely a separate patch.
>
> As for __change_pid(). In this case wake_up_all() is fine, we can
> change it to use wake_up_all_poll() too for consistency, but this
> is not strictly necessary and in fact "key = 0" makes a bit more
> sense imo.
Ok, I see.
>
> And just in case... previously I said that (perhaps) it can use
> __wake_up_pollfree() but no, this would be obviously wrong.
>
> ------------------------------------------------------------------
> Now let's recall about the PIDFD_THREAD waiters. exit_notify() does
>
> /*
> * sub-thread or delay_group_leader(), wake up the
> * PIDFD_THREAD waiters.
> */
> if (!thread_group_empty(tsk))
> do_notify_pidfd(tsk);
>
> and it would be nice to not wakeup the non-PIDFD_THREAD waiters.
>
> I was thinking about something like the changes below but
>
> - I am NOT sure it will work! I need to read the code
> in fs/select.c
>
> - in fact I am not sure this makes a lot of sense, and
> the hack in pidfd_poll() doesn't look very nice even
> _if_ it can work.
First, good idea to make this work. :)
Second, I agree that this looks ugly. ;/
So here's a very likely a stupid idea. To make that clean we essentially
need kernel private information that can't be set in userspace (Btw,
look at EPOLL_URING_WAKE which is similar in that it cannot be set from
userspace. It's not the same thing ofc but it is a private bit.). Which
is the gist of your proposal in a way.
So we would have to grab a new private bit in the epoll flag space. A
subsystem (e.g., pidfd, seccomp) could use that private bit to avoid
spurious wakeups such as this. I'm sure that this would be useful to
others?
It's not something we need to do right now but my point is that this
would be more palatable if this coulde be made generally useful.
The only thing that's ugly is that we'd raise that bit in ->poll() but
maybe that's acceptable through a helper? If we'd wanted to set it from
epoll itself we'd need a new fop most likely. :/
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2081,6 +2081,13 @@ static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
> struct task_struct *task;
> __poll_t poll_flags = 0;
>
> + if (thread && pts && pts->_qproc) {
> + // We are not registered yet. Update the key to mark
> + // us a a PIDFD_THREAD waiter, __pollwait() will copy
> + // this ->_key to poll_table_entry->key.
> + if (pts->_key & EPOLLIN) // exclude the POLLHUP-only waiters
> + pts->_key |= EPOLLMSG; // random flag
> + }
> poll_wait(file, &pid->wait_pidfd, pts);
> /*
> * Depending on PIDFD_THREAD, inform pollers when the thread
>
> Now, do_notify_pidfd() can do
>
> if (!thread_group_empty(tsk))
> mask = EPOLLMSG; // matches the hack in pidfd_poll
> else
> mask = EPOLLIN | EPOLLRDNORM;
>
> __wake_up(..., poll_to_key(mask));
>
> Yes, in this case it makes more sense to pass !thread_group_empty()
> as a "bool thread" argument.
>
> ---------------------------------------------------------------------
>
> What do you think?
>
> I am starting to think that I shouldn't have delayed the 1st trivial
> change. Feel free to push it, with or without rename, with or without
> the new wake_up_all_poll() helper, I am fine either way. But please
> don't add the new "int dead" argument, afaics it makes no sense.
I would never go above your head, don't worry. :)
Powered by blists - more mailing lists