Message-ID: <87cybdp7to.ffs@tglx>
Date: Mon, 09 Jun 2025 11:45:07 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Pranav Tyagi <pranav.tyagi03@...il.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Darren Hart <dvhart@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>, André Almeida
<andrealmeid@...lia.com>, linux-kernel@...r.kernel.org
Cc: jann@...jh.net, keescook@...omium.org, skhan@...uxfoundation.org,
linux-kernel-mentees@...ts.linux.dev, Pranav Tyagi
<pranav.tyagi03@...il.com>
Subject: Re: [PATCH] futex: don't leak robust_list pointer on exec race
On Sat, Jun 07 2025 at 12:14, Pranav Tyagi wrote:
> Previously, the get_robust_list() and compat_get_robust_list() syscalls
> used rcu_read_lock() to access the target task's robust list pointer.
This is not "previously". It's how the code is right now, and you want
to describe that status as is.
> However, rcu_read_lock() is not sufficient when the operation may sleep
> (e.g., locking or user-space access), nor does it prevent the task from
> exiting while the syscall is in progress.
1) There is no sleeping operation in this code path.
2) Of course it does not prevent the task from exiting; it only prevents
the task struct from being freed. But what's the actual problem with
that?
task->robust_list contains the task's robust list pointer even after
the task has exited.
> This patch replaces rcu_read_lock() with
git grep 'This patch' Documentation/process/
> get_task_struct()/put_task_struct() to ensure the task is pinned during
> syscall execution, preventing use-after-free.
Which use after free? You fail to explain the actual problem in the
first place and then you describe a solution for the non explained
problem.
> Additionally, the robust_list pointer can be modified during exec()
> causing a race condition if accessed concurrently.
Sure, but what is the effect of that race condition?
begin_new_exec() -> exec_mmap() -> exec_mm_release() ->
futex_exec_release()
The latter sets task::robust_list to NULL. So this either sees the real
pointer or NULL. The actual problem is?
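For reference, the cleanup which runs via that chain boils down to
something like this (paraphrased from memory of kernel/futex/core.c,
compat and PI state handling trimmed):

    /* Roughly what futex_exec_release() ends up doing for the robust list */
    static void futex_cleanup(struct task_struct *tsk)
    {
            if (unlikely(tsk->robust_list)) {
                    exit_robust_list(tsk);
                    tsk->robust_list = NULL;
            }
            /* compat robust list and PI state cleanup elided */
    }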
A similar concurrency race exists between sys_get_robust_list() and
sys_set_robust_list(), no?
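If I remember the code correctly, sys_set_robust_list() is nothing more
than a plain store, so a concurrent reader observes either the old or the
new pointer there as well (paraphrased from kernel/futex/syscalls.c):

    SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
                    size_t, len)
    {
            /* The kernel knows only one size for now */
            if (unlikely(len != sizeof(*head)))
                    return -EINVAL;
            current->robust_list = head;
            return 0;
    }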
> ... To fix this we
We do nothing. See
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
> introduce down_read_killable(&exec_update_lock), a read lock on a
> rw_semaphore protecting signal_struct members that change during exec.
> This read-write semaphore allows multiple concurrent readers of
> robust_list, while exec() takes the write lock to modify it, ensuring
> synchronization.
Synchronization of what?
> This prevents an attacker from determining the robust_list or
> compat_robust_list userspace pointer of a process created by executing
> a setuid binary. Such an attack could be performed by racing
> get_robust_list() with setuid execution. The impact of this issue is that
> the attacker could theoretically bypass ASLR when attacking setuid
> binaries.
It can theoretically bypass ASLR. That's impressive given the fact that
there are a gazillion ways to do so.
Now let me ask the obvious question. How is the new code so much more
protective?
T1                                      T2
                                        exit();
sys_get_robust_list(T2)
  lock();
                                          lock();
  head = T2->robust_list;
  unlock();
  copy_to_user(head);                   // T2 mops up
  ....
  return_to_user();
T1 has the pointer, which is no longer valid. So what is this magic
serialization actually preventing?
Surely not the fact that the robust list pointer of T2 can be invalid
immediately after T1 retrieved it.
> Overall, this patch fixes a use-after-free and race condition by
> properly pinning the task and synchronizing access to robust_list,
> improving syscall safety and security.
By handwaving....
The real issue that Jann's original series is trying to address is that
the ptrace_may_access() check is racy against a concurrent exec(). But
that's nowhere mentioned in the above word salad.
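For context, the window in the current code sits between the permission
check and the pointer read. Simplified sketch of the existing flow, error
handling trimmed:

    /* Simplified flow of the current sys_get_robust_list() */
    rcu_read_lock();
    p = pid ? find_task_by_vpid(pid) : current;
    /* !p / -ESRCH handling trimmed */
    if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
            goto err_unlock;
    /*
     * If the target execs a setuid binary right here, the check above
     * was done against the pre-exec credentials, while the pointer
     * read below may already belong to the new, more privileged image.
     */
    head = p->robust_list;
    rcu_read_unlock();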
That said, I'm not opposed to the change per se, but it needs to come
with a proper explanation of the actual problem and a description of the
real-world relevance of the issue in the existing code.
Let me look at the code.
> diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
> index 4b6da9116aa6..27e44a304271 100644
> --- a/kernel/futex/syscalls.c
> +++ b/kernel/futex/syscalls.c
> @@ -53,31 +53,43 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
> unsigned long ret;
> struct task_struct *p;
>
> - rcu_read_lock();
> -
> - ret = -ESRCH;
> - if (!pid)
> + if (!pid) {
> p = current;
> - else {
> + get_task_struct(p);
> + } else {
> + rcu_read_lock();
> p = find_task_by_vpid(pid);
> + /* pin the task to permit dropping the RCU read lock before
This is not networking code. So please use proper comment style:
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#comment-style
> + * acquiring the semaphore
> + */
> + if (p)
> + get_task_struct(p);
> + rcu_read_unlock();
> if (!p)
> - goto err_unlock;
> + return -ESRCH;
> }
>
> + ret = down_read_killable(&p->signal->exec_update_lock);
Lacks a comment explaining what this is actually protecting.
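Something along these lines would do; the wording is only a suggestion:

    /*
     * Hold exec_update_lock so that the credentials checked by
     * ptrace_may_access() and the robust_list pointer read under the
     * lock belong to the same execution image of the target task.
     */
    ret = down_read_killable(&p->signal->exec_update_lock);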
> if (put_user(sizeof(*head), len_ptr))
> return -EFAULT;
> return put_user(ptr_to_compat(head), head_ptr);
>
> err_unlock:
> - rcu_read_unlock();
> -
> + up_read(&p->signal->exec_update_lock);
> +err_put:
> + put_task_struct(p);
> return ret;
You really did not have a better idea than copying all of that logic
into the compat code?
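A trivial helper which both variants call, and which hands back the
pinned task (with the permission check and locking done in one place),
would avoid the duplication. Rough sketch only, the name is made up:

    /* Sketch: shared lookup/pinning for both robust list getters */
    static int futex_get_robust_task(int pid, struct task_struct **tsk)
    {
            struct task_struct *p = current;

            if (pid) {
                    rcu_read_lock();
                    p = find_task_by_vpid(pid);
                    if (p)
                            get_task_struct(p);
                    rcu_read_unlock();
                    if (!p)
                            return -ESRCH;
            } else {
                    get_task_struct(p);
            }

            /* ptrace_may_access() check and exec_update_lock would go here */
            *tsk = p;
            return 0;
    }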
Thanks,
tglx