[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH5fLgjB-ia+UhE1P8gOxHTdjSJJ1=xKSS0c75AvGA91uo_fEw@mail.gmail.com>
Date: Mon, 30 Sep 2024 15:30:01 +0200
From: Alice Ryhl <aliceryhl@...gle.com>
To: Carlos Llamas <cmllamas@...gle.com>
Cc: Yu-Ting Tseng <yutingtseng@...gle.com>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arve Hjønnevåg <arve@...roid.com>,
Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
Joel Fernandes <joel@...lfernandes.org>, Christian Brauner <brauner@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org, kernel-team@...roid.com,
stable@...r.kernel.org
Subject: Re: [PATCH v2 6/8] binder: allow freeze notification for dead nodes
On Fri, Sep 27, 2024 at 6:32 PM Carlos Llamas <cmllamas@...gle.com> wrote:
>
> On Fri, Sep 27, 2024 at 06:15:40PM +0200, Alice Ryhl wrote:
> > On Fri, Sep 27, 2024 at 6:13 PM Yu-Ting Tseng <yutingtseng@...gle.com> wrote:
> > >
> > > On Fri, Sep 27, 2024 at 12:19 AM Alice Ryhl <aliceryhl@...gle.com> wrote:
> > > >
> > > > On Fri, Sep 27, 2024 at 1:37 AM Carlos Llamas <cmllamas@...gle.com> wrote:
> > > > >
> > > > > Alice points out that binder_request_freeze_notification() should not
> > > > > return EINVAL when the relevant node is dead [1]. The node can die at
> > > > > any point even if the user input is valid. Instead, allow the request
> > > > > to be allocated but skip the initial notification for dead nodes. This
> > > > > avoids propagating unnecessary errors back to userspace.
> > > > >
> > > > > Fixes: d579b04a52a1 ("binder: frozen notification")
> > > > > Cc: stable@...r.kernel.org
> > > > > Suggested-by: Alice Ryhl <aliceryhl@...gle.com>
> > > > > Link: https://lore.kernel.org/all/CAH5fLghapZJ4PbbkC8V5A6Zay-_sgTzwVpwqk6RWWUNKKyJC_Q@mail.gmail.com/ [1]
> > > > > Signed-off-by: Carlos Llamas <cmllamas@...gle.com>
> > > > > ---
> > > > > drivers/android/binder.c | 28 +++++++++++++---------------
> > > > > 1 file changed, 13 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > > > > index 73dc6cbc1681..415fc9759249 100644
> > > > > --- a/drivers/android/binder.c
> > > > > +++ b/drivers/android/binder.c
> > > > > @@ -3856,7 +3856,6 @@ binder_request_freeze_notification(struct binder_proc *proc,
> > > > > {
> > > > > struct binder_ref_freeze *freeze;
> > > > > struct binder_ref *ref;
> > > > > - bool is_frozen;
> > > > >
> > > > > freeze = kzalloc(sizeof(*freeze), GFP_KERNEL);
> > > > > if (!freeze)
> > > > > @@ -3872,32 +3871,31 @@ binder_request_freeze_notification(struct binder_proc *proc,
> > > > > }
> > > > >
> > > > > binder_node_lock(ref->node);
> > > > > -
> > > > > - if (ref->freeze || !ref->node->proc) {
> > > > > - binder_user_error("%d:%d invalid BC_REQUEST_FREEZE_NOTIFICATION %s\n",
> > > > > - proc->pid, thread->pid,
> > > > > - ref->freeze ? "already set" : "dead node");
> > > > > + if (ref->freeze) {
> > > > > + binder_user_error("%d:%d BC_REQUEST_FREEZE_NOTIFICATION already set\n",
> > > > > + proc->pid, thread->pid);
> > > > > binder_node_unlock(ref->node);
> > > > > binder_proc_unlock(proc);
> > > > > kfree(freeze);
> > > > > return -EINVAL;
> > > > > }
> > > > > - binder_inner_proc_lock(ref->node->proc);
> > > > > - is_frozen = ref->node->proc->is_frozen;
> > > > > - binder_inner_proc_unlock(ref->node->proc);
> > > > >
> > > > > binder_stats_created(BINDER_STAT_FREEZE);
> > > > > INIT_LIST_HEAD(&freeze->work.entry);
> > > > > freeze->cookie = handle_cookie->cookie;
> > > > > freeze->work.type = BINDER_WORK_FROZEN_BINDER;
> > > > > - freeze->is_frozen = is_frozen;
> > > > > -
> > > > > ref->freeze = freeze;
> > > > >
> > > > > - binder_inner_proc_lock(proc);
> > > > > - binder_enqueue_work_ilocked(&ref->freeze->work, &proc->todo);
> > > > > - binder_wakeup_proc_ilocked(proc);
> > > > > - binder_inner_proc_unlock(proc);
> > > > > + if (ref->node->proc) {
> > > > > + binder_inner_proc_lock(ref->node->proc);
> > > > > + freeze->is_frozen = ref->node->proc->is_frozen;
> > > > > + binder_inner_proc_unlock(ref->node->proc);
> > > > > +
> > > > > + binder_inner_proc_lock(proc);
> > > > > + binder_enqueue_work_ilocked(&freeze->work, &proc->todo);
> > > > > + binder_wakeup_proc_ilocked(proc);
> > > > > + binder_inner_proc_unlock(proc);
> > > >
> > > > This is not a problem with your change ... but, why exactly are we
> > > > scheduling the BINDER_WORK_FROZEN_BINDER right after creating it? For
> > > > death notications, we only schedule it immediately if the process is
> > > > dead. So shouldn't we only schedule it if the process is not frozen?
>
> For death notifications, we only care about a remote binder's death.
> Unlike freeze, in which we have a state that can toggle at any point.
> This is important for suspending and resuming transactions to a node.
>
> Sending the freeze notification immediately allows for (1) userspace
> knowing the current state of the remote node and (2) avoiding a race
> with BINDER_FREEZE ioctl in which we could miss a freeze/thaw.
>
> > > > And if the answer is that frozen notifications are always sent
> > > > immediately to notify about the current state, then we should also
> > > > send one for a dead process ... maybe. I guess a dead process is not
> > > > frozen?
> > > Yes this is to immediately notify about the current state (frozen or
> > > unfrozen). A dead process is in neither state so it feels more correct
> > > not to send either?
> >
> > Okay.
> >
> > On the other hand, I can easily imagine userspace code being written
> > with the assumption that it'll always get a notification immediately.
> > That would probably result in deadlocks in the edge case where the
> > process happens to be dead.
>
> There are different ways to proceed with this dead node scenario:
>
> 1. return ESRCH
> 2. silently fail and don't allocate a ref->freeze
> 3. allocate a ref->freeze but don't notify the current state
> 4. allocate and send a "fake" state notification.
>
> I like 1 just because it is technically the correct thing to do from the
> driver's perspective. However, it does complicate things in userspace as
> we've discussed. Option 2, could work but it would also fail with EINVAL
> if a "clear notification" is sent later anyway. Option 3 changes the
> behavior of guaranteeing a notification upon success. Option 4 can cause
> trouble on how a "not-frozen" notification is handled in userspace e.g
> start sending transactions.
>
> As you can see there is no clear winner here, we have to compromise
> something and option #3 is the best we can do IMO.
I am happy with both #3 and #4. I think #1 and #2 are problematic
because they will lead to userspace getting errors on correct use of
Binder.
Alice
Powered by blists - more mailing lists