[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZnQ4Q72H7oNOnJ4Y@jlelli-thinkpadt14gen4.remote.csb>
Date: Thu, 20 Jun 2024 16:10:11 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Wander Lairson Costa <wander@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
"open list:SCHEDULER" <linux-kernel@...r.kernel.org>,
rrendec@...hat.com
Subject: Re: [PATCH v2] sched/deadline: fix task_struct reference leak
On 20/06/24 09:56, Wander Lairson Costa wrote:
> During the execution of the following stress test with linux-rt:
>
> stress-ng --cyclic 30 --timeout 30 --minimize --quiet
>
> kmemleak frequently reported a memory leak concerning the task_struct:
>
> unreferenced object 0xffff8881305b8000 (size 16136):
> comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
> object hex dump (first 32 bytes):
> 02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 .@..............
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> debug hex dump (first 16 bytes):
> 53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 S...............
> backtrace:
> [<00000000046b6790>] dup_task_struct+0x30/0x540
> [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
> [<00000000ced59777>] kernel_clone+0xb0/0x770
> [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
> [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
> [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
> The issue occurs in start_dl_timer(), which increments the task_struct
> reference count and sets a timer. The timer callback, dl_task_timer,
> is supposed to decrement the reference count upon expiration. However,
> if enqueue_task_dl() is called before the timer expires and cancels it,
> the reference count is not decremented, leading to the leak.
>
> This patch fixes the reference leak by ensuring the task_struct
> reference count is properly decremented when the timer is canceled.
>
> ---
>
> Changelog:
>
> v2:
> * Add the fixes tag
> * Add a comment about canceling the timer while the callback was running
Uh, looks like these bits above should come after the SoB section so
that they are ignored when applying the patch? But, maybe, Peter, you
can fix that if you decide to take this version?
>
> Signed-off-by: Wander Lairson Costa <wander@...hat.com>
> Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing")
Apart from that
Acked-by: Juri Lelli <juri.lelli@...hat.com>
Thanks!
Juri
> ---
> kernel/sched/deadline.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index c75d1307d86d..9bedd148f007 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1804,8 +1804,13 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> * The replenish timer needs to be canceled. No
> * problem if it fires concurrently: boosted threads
> * are ignored in dl_task_timer().
> + *
> + * If the timer callback was running (hrtimer_try_to_cancel == -1),
> + * it will eventually call put_task_struct().
> */
> - hrtimer_try_to_cancel(&p->dl.dl_timer);
> + if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1 &&
> + !dl_server(&p->dl))
> + put_task_struct(p);
> p->dl.dl_throttled = 0;
> }
> } else if (!dl_prio(p->normal_prio)) {
> --
> 2.45.2
>
Powered by blists - more mailing lists