[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whfoVN6wiP5VHekckvqivRhpB+b1FnwyWEjz1SB2FN6HQ@mail.gmail.com>
Date: Fri, 2 Jul 2021 15:13:18 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Alexey Gladkov <legion@...nel.org>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Containers <containers@...ts.linux.dev>
Subject: Re: [PATCH] ucounts: Fix UCOUNT_RLIMIT_SIGPENDING counter leak
On Fri, Jul 2, 2021 at 10:55 AM Alexey Gladkov <legion@...nel.org> wrote:
>
> @@ -424,10 +424,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> * changes from/to zero.
> */
> rcu_read_lock();
> - ucounts = task_ucounts(t);
> + ucounts = ucounts_new = task_ucounts(t);
> sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> if (sigpending == 1)
> - ucounts = get_ucounts(ucounts);
> + ucounts_new = get_ucounts(ucounts);
> rcu_read_unlock();
I think this is still problematic.
If get_ucounts() fails, we can't just drop the RCU lock and (later)
use "ucounts" that we hold no reference to.
Or am I missing something? I'm not entirely sure about the lifetime of
that RCU protection, and I do note that "task_ucounts()" uses
"task_cred_xxx()", which already does
rcu_read_lock()/rcu_read_unlock() in the actual access.
So I'm thinking the code could/should be written something like this instead:
diff --git a/kernel/signal.c b/kernel/signal.c
index f6371dfa1f89..40781b197227 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -422,22 +422,33 @@ __sigqueue_alloc(int sig, struct task_struct
*t, gfp_t gfp_flags,
* NOTE! A pending signal will hold on to the user refcount,
* and we get/put the refcount only when the sigpending count
* changes from/to zero.
+ *
+ * And if the ucount rlimit overflowed, we do not get to use it at all.
*/
rcu_read_lock();
ucounts = task_ucounts(t);
sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
- if (sigpending == 1)
- ucounts = get_ucounts(ucounts);
+ switch (sigpending) {
+ case 1:
+ if (likely(get_ucounts(ucounts)))
+ break;
+
+ dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
+ fallthrough;
+ case LONG_MAX:
+ rcu_read_unlock();
+ return NULL;
+ }
rcu_read_unlock();
- if (override_rlimit || (sigpending < LONG_MAX && sigpending <=
task_rlimit(t, RLIMIT_SIGPENDING))) {
+ if (override_rlimit || sigpending <= task_rlimit(t,
RLIMIT_SIGPENDING)) {
q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
} else {
print_dropped_signal(sig);
}
if (unlikely(q == NULL)) {
- if (ucounts && dec_rlimit_ucounts(ucounts,
UCOUNT_RLIMIT_SIGPENDING, 1))
+ if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
put_ucounts(ucounts);
} else {
INIT_LIST_HEAD(&q->list);
(and no, I'm not sure it's a good idea to make that use a "switch()" -
maybe the LONG_MAX case should be a "if (unlikely())" thing after the
rcu_read_ulock() instead?
Hmm?
The alternate thing is to say "No, Linus, you're a nincompoop and
wrong, that RCU protection is a non-issue because we hold a reference
to the task, and task_ucounts() will not change, so the RCU read lock
doesn't do anything".
Although then I would think the rcu_read_lock/rcu_read_unlock here is
entirely pointless.
Linus
Powered by blists - more mailing lists