linux-kernel - Re: [PATCH] futex: don't leak robust

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH4c4jLjSBxbd3bqkdgcCSWqXURratANgnbq9negrSU283xHpg@mail.gmail.com>
Date: Wed, 11 Jun 2025 19:33:39 +0530
From: Pranav Tyagi <pranav.tyagi03@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	Darren Hart <dvhart@...radead.org>, Davidlohr Bueso <dave@...olabs.net>, 
	André Almeida <andrealmeid@...lia.com>, 
	linux-kernel@...r.kernel.org, jann@...jh.net, keescook@...omium.org, 
	skhan@...uxfoundation.org, linux-kernel-mentees@...ts.linux.dev
Subject: Re: [PATCH] futex: don't leak robust_list pointer on exec race

On Mon, Jun 9, 2025 at 3:15 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Sat, Jun 07 2025 at 12:14, Pranav Tyagi wrote:
> > Previously, the get_robust_list() and compat_get_robust_list() syscalls
> > used rcu_read_lock() to access the target task's robust list pointer.
>
> This is not previously. It's how the code is right now and you want to
> describe the status as is.
>
> > However, rcu_read_lock() is not sufficient when the operation may sleep
> > (e.g., locking or user-spaces access), nor does it prevent the task from
> > exiting while the syscall is in progress.
>
> 1) There is no sleeping operation in this code path.
>
> 2) Of course it does not prevent the task from exiting, it only prevents
>    the task struct from being freed. But what's the actual problem with
>    that?
>
>    task->robust_list contains the tasks robust list pointer even after
>    the task exited.
>
> > This patch replaces rcu_read_lock() with
>
> git grep 'This patch' Documentation/process/
>
> > get_task_struct()/put_task_struct() to ensure the task is pinned during
> > syscall execution prevent use-after-free.
>
> Which use after free? You fail to explain the actual problem in the
> first place and then you describe a solution for the non explained
> problem.
>
> > Additionally, the robust_list pointer can be modified during exec()
> > causing a race condition if accessed concurrently.
>
> Sure, but what is the effect of that race condition?
>
> begin_new_exec() -> exec_mmap() -> exec_mm_release() ->
> futex_exec_release()
>
> The latter sets task::robust_list to NULL. So this either sees the real
> pointer or NULL. The actual problem is?
>
> A similar concurrency race exists between sys_get_robust_list() and
> sys_set_robust_list(), no?
>
> > ... To fix this we
>
> We do nothing. See
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
>
> > introduce down_read_killable(&exec_update_lock), a read lock on a
> > rw_semaphore protecting signal_struct members that change during exec.
>
> > This read-write semaphore allows multiple concurrent readers of
> > robust_list, while exec() takes the write lock to modify it, ensuring
> > synchronization.
>
> Synchronization of what?
>
> > This prevents an attacker from determining the robust_list or
> > compat_robust_list userspace pointer of a process created by executing
> > a setuid binary. Such an attack could be performed by racing
> > get_robust_list() with setuid execution. The impact of this issue is that
> > the attacker could theoretically bypass ASLR when attacking setuid
> > binaries.
>
> It can theoretically bypass ASLR. That's impressive given the fact that
> there are a gazillion ways to do so.
>
> Now let me ask the obvious question. How is the new code so much more
> protective?
>
> T1                                      T2
>
>                                         exit();
> sys_get_robust_list(T2)
>    lock();
>                                         lock();
>    head = T2->robust_list;
>    unlock();
>    copy_to_user(head);                  // T2 mops up
>    ....
>    return_to_user();
>
> T1 has the pointer, which is not longer valid. So what is this magic
> serialization actually preventing?
>
> Surely not the fact that the robust list pointer of T2 can be invalid
> immediately after T1 retrieved it.
>
> > Overall, this patch fixes a use-after-free and race condition by
> > properly pinning the task and synchronizing access to robust_list,
> > improving syscall safety and security.
>
> By handwaving....
>
> The real issue what Jann's original series is trying to address is that
> the ptrace_may_access() check is racy against a concurrent exec(). But
> that's nowhere mentionened in the above word salad.
>
> That said, I'm not opposed against the change per se, but it needs to
> come with a proper explanation of the actual problem and a description
> of the real world relevance of the issue in the existing code.
>

Hi,

Thank you for the detailed feedback. I appreciate the time and
insight you’ve shared, and I’ve learned a lot from it. I hope you’ll kindly
excuse the shortcomings in my initial changelog. I’m still new to kernel
work and upstreaming. I understand that my changelog
didn’t clearly convey the actual problem.

Does the revised version below address the concerns more effectively
or does it still need a bit more seasoning?

"Currently, sys_get_robust_list() and compat_get_robust_list() perform a
ptrace_may_access() check to verify if the calling task is allowed to
query another task’s robust_list pointer. However, this check is racy
against a concurrent exec() in the target process.

During exec(), a task's credentials and memory mappings can change, and
the task may transition from a non-privileged binary to a privileged one
(e.g., a setuid binary). If get_robust_list() performs ptrace_may_access()
before this transition, it may wrongly allow access to sensitive
information after the target becomes privileged.

To address this, a read lock is taken on signal->exec_update_lock prior
to invoking ptrace_may_access() and accessing the robust_list. This
ensures that the target task's exec state remains stable during the
check, allowing for consistent and synchronized validation of
credentials."

> Let me look at the code.
>
> > diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
> > index 4b6da9116aa6..27e44a304271 100644
> > --- a/kernel/futex/syscalls.c
> > +++ b/kernel/futex/syscalls.c
> > @@ -53,31 +53,43 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
> >       unsigned long ret;
> >       struct task_struct *p;
> >
> > -     rcu_read_lock();
> > -
> > -     ret = -ESRCH;
> > -     if (!pid)
> > +     if (!pid) {
> >               p = current;
> > -     else {
> > +             get_task_struct(p);
> > +     } else {
> > +             rcu_read_lock();
> >               p = find_task_by_vpid(pid);
> > +             /* pin the task to permit dropping the RCU read lock before
>
> This is not networking code. So please use proper comment style:
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#comment-style
>
> > +              * acquiring the semaphore
> > +              */
> > +             if (p)
> > +                     get_task_struct(p);
> > +             rcu_read_unlock();
> >               if (!p)
> > -                     goto err_unlock;
> > +                     return -ESRCH;
> >       }
> >
> > +     ret = down_read_killable(&p->signal->exec_update_lock);
>
> Lacks a comment explaining what this is actually protecting.
>
> >       if (put_user(sizeof(*head), len_ptr))
> >               return -EFAULT;
> >       return put_user(ptr_to_compat(head), head_ptr);
> >
> >  err_unlock:
> > -     rcu_read_unlock();
> > -
> > +     up_read(&p->signal->exec_update_lock);
> > +err_put:
> > +     put_task_struct(p);
> >       return ret;
>
> You really did not have a better idea than copying all of that logic
> into the compat code?

As I’m still learning, I wasn’t quite sure how to avoid
duplication there. Would factoring out the common logic into a helper function
be the right direction? I’d be grateful for your suggestion.

>
> Thanks,
>
>         tglx