Message-ID: <CAGudoHGwEYg7mpkD+deUhNT4TmYUmSgKr_xEVoNVUaQXsUhzGw@mail.gmail.com>
Date: Sun, 14 Sep 2025 19:48:10 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Christian Brauner <brauner@...nel.org>,
Jiri Slaby <jirislaby@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] fix the racy usage of task_lock(tsk->group_leader) in
sys_prlimit64() paths
On Sun, Sep 14, 2025 at 1:11 PM Oleg Nesterov <oleg@...hat.com> wrote:
>
> The usage of task_lock(tsk->group_leader) in sys_prlimit64()->do_prlimit()
> path is very broken.
>
> sys_prlimit64() does get_task_struct(tsk) but this only protects task_struct
> itself. If tsk != current and tsk is not a leader, this process can exit/exec
> and task_lock(tsk->group_leader) may use the already freed task_struct.
>
> Another problem is that sys_prlimit64() can race with mt-exec which changes
> ->group_leader. In this case do_prlimit() may take the wrong lock, or (worse)
> ->group_leader may change between task_lock() and task_unlock().
>
> Change sys_prlimit64() to take tasklist_lock when necessary. This is not
> nice, but I don't see a better fix for -stable.
>
> Cc: stable@...r.kernel.org
> Fixes: c022a0acad53 ("rlimits: implement prlimit64 syscall")

I think this is more accurate:
Fixes: 18c91bb2d872 ("prlimit: do not grab the tasklist_lock")

Unfortunately this syscall is used by glibc to get/set limits; the
good news is that almost all real-world calls (AFAICS) have the
calling task as the target. So, performance-wise, this should not be
a regression, and I agree the fix is more than adequate for stable.

As for something more long-term, what would you think about
synchronizing changes with a lock inside ->signal? Preferably, reads
(the most common use case) would go through a sequence counter.
Bonus points for avoiding any task ref/lock manipulation when
task == current (again the most common case in real-world usage).

signal_struct already has holes, so the fields can be rearranged so
that the struct does not grow beyond its current size.

I had a patch somewhere to that effect which I could not be bothered
to finish; if this sounds like a plan to you, I may get around to it.
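
Roughly what I have in mind, as a rough sketch only (the
->rlimit_lock field name is hypothetical and none of this is even
compile-tested):

/* in struct signal_struct: seqlock_t rlimit_lock; protects ->rlim[] */

static void get_rlimit(struct task_struct *tsk, unsigned int resource,
		       struct rlimit *out)
{
	struct signal_struct *sig = tsk->signal;
	unsigned int seq;

	/* lockless read, retried only if a writer raced with us */
	do {
		seq = read_seqbegin(&sig->rlimit_lock);
		*out = sig->rlim[resource];
	} while (read_seqretry(&sig->rlimit_lock, seq));
}

static void set_rlimit(struct task_struct *tsk, unsigned int resource,
		       const struct rlimit *new)
{
	struct signal_struct *sig = tsk->signal;

	write_seqlock(&sig->rlimit_lock);
	sig->rlim[resource] = *new;
	write_sequnlock(&sig->rlimit_lock);
}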
> Signed-off-by: Oleg Nesterov <oleg@...hat.com>
> ---
> kernel/sys.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 1e28b40053ce..36d66ff41611 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1734,6 +1734,7 @@ SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
> struct rlimit old, new;
> struct task_struct *tsk;
> unsigned int checkflags = 0;
> + bool need_tasklist;
> int ret;
>
> if (old_rlim)
> @@ -1760,8 +1761,25 @@ SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
> get_task_struct(tsk);
> rcu_read_unlock();
>
> - ret = do_prlimit(tsk, resource, new_rlim ? &new : NULL,
> - old_rlim ? &old : NULL);
> + need_tasklist = !same_thread_group(tsk, current);
> + if (need_tasklist) {
> + /*
> + * Ensure we can't race with group exit or de_thread(),
> + * so tsk->group_leader can't be freed or changed until
> + * read_unlock(tasklist_lock) below.
> + */
> + read_lock(&tasklist_lock);
> + if (!pid_alive(tsk))
> + ret = -ESRCH;
> + }
> +
> + if (!ret) {
> + ret = do_prlimit(tsk, resource, new_rlim ? &new : NULL,
> + old_rlim ? &old : NULL);
> + }
> +
> + if (need_tasklist)
> + read_unlock(&tasklist_lock);
>
> if (!ret && old_rlim) {
> rlim_to_rlim64(&old, &old64);
> --
> 2.25.1.362.g51ebf55
>
>