Message-ID: <CAGudoHGwEYg7mpkD+deUhNT4TmYUmSgKr_xEVoNVUaQXsUhzGw@mail.gmail.com>
Date: Sun, 14 Sep 2025 19:48:10 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Christian Brauner <brauner@...nel.org>, 
	Jiri Slaby <jirislaby@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] fix the racy usage of task_lock(tsk->group_leader) in
 sys_prlimit64() paths

On Sun, Sep 14, 2025 at 1:11 PM Oleg Nesterov <oleg@...hat.com> wrote:
>
> The usage of task_lock(tsk->group_leader) in sys_prlimit64()->do_prlimit()
> path is very broken.
>
> sys_prlimit64() does get_task_struct(tsk) but this only protects task_struct
> itself. If tsk != current and tsk is not a leader, this process can exit/exec
> and task_lock(tsk->group_leader) may use the already freed task_struct.
>
> Another problem is that sys_prlimit64() can race with mt-exec which changes
> ->group_leader. In this case do_prlimit() may take the wrong lock, or (worse)
> ->group_leader may change between task_lock() and task_unlock().
>
> Change sys_prlimit64() to take tasklist_lock when necessary. This is not
> nice, but I don't see a better fix for -stable.
>
> Cc: stable@...r.kernel.org
> Fixes: c022a0acad53 ("rlimits: implement prlimit64 syscall")

I think this is more accurate:
Fixes: 18c91bb2d872 ("prlimit: do not grab the tasklist_lock")

Unfortunately this syscall is used by glibc to get/set limits; the good
news is that almost all real-world calls (AFAICS) target the calling
task. So, performance-wise, this should not be a regression, and I
agree it is more than adequate for stable.

As for something more long-term, what would you think about
synchronizing changes with a lock inside ->signal? Preferably, reads
(the most common use case) would go through a sequence counter. Bonus
points for avoiding any task ref/lock manipulation when task ==
current (again the most common case in real-world usage).

signal_struct already has holes, so its fields can be rearranged so
that the struct does not grow beyond its current size.

I had a patch somewhere to that effect which I never got around to
finishing; if this sounds like a plan to you, I may pick it up again.


> Signed-off-by: Oleg Nesterov <oleg@...hat.com>
> ---
>  kernel/sys.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 1e28b40053ce..36d66ff41611 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1734,6 +1734,7 @@ SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
>         struct rlimit old, new;
>         struct task_struct *tsk;
>         unsigned int checkflags = 0;
> +       bool need_tasklist;
>         int ret;
>
>         if (old_rlim)
> @@ -1760,8 +1761,25 @@ SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
>         get_task_struct(tsk);
>         rcu_read_unlock();
>
> -       ret = do_prlimit(tsk, resource, new_rlim ? &new : NULL,
> -                       old_rlim ? &old : NULL);
> +       need_tasklist = !same_thread_group(tsk, current);
> +       if (need_tasklist) {
> +               /*
> +                * Ensure we can't race with group exit or de_thread(),
> +                * so tsk->group_leader can't be freed or changed until
> +                * read_unlock(tasklist_lock) below.
> +                */
> +               read_lock(&tasklist_lock);
> +               if (!pid_alive(tsk))
> +                       ret = -ESRCH;
> +       }
> +
> +       if (!ret) {
> +               ret = do_prlimit(tsk, resource, new_rlim ? &new : NULL,
> +                               old_rlim ? &old : NULL);
> +       }
> +
> +       if (need_tasklist)
> +               read_unlock(&tasklist_lock);
>
>         if (!ret && old_rlim) {
>                 rlim_to_rlim64(&old, &old64);
> --
> 2.25.1.362.g51ebf55
>
>
