linux-kernel - Re: [bug report] resctrl high memory comsumption


Message-ID: <CALvZod7qkn9OZUa0GLAyBv0QBt4A0=APdEqWp1RxMbok8mn03w@mail.gmail.com>
Date:   Wed, 8 Jan 2020 13:20:19 -0800
From:   Shakeel Butt <shakeelb@...gle.com>
To:     Fenghua Yu <fenghua.yu@...el.com>
Cc:     Reinette Chatre <reinette.chatre@...el.com>,
        Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, x86@...nel.org
Subject: Re: [bug report] resctrl high memory comsumption

On Wed, Jan 8, 2020 at 12:12 PM Fenghua Yu <fenghua.yu@...el.com> wrote:
>
> On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote:
> > Hi,
> >
> > Recently we had a bug in our system software that wrote the same pids to
> > the tasks file of a resctrl group multiple times. The resctrl code
> > allocates a "struct task_move_callback" for each such write and calls
> > task_work_add() for that task to handle it on return to user-space,
> > without checking whether such a request already exists for that particular
> > task. The issue arises for long-sleeping tasks, which end up with thousands
> > of such requests queued to be handled. In production, we noticed
> > thousands of tasks with thousands of such requests each, taking GiBs
> > of memory for "struct task_move_callback". I am not familiar enough with
> > the code to judge whether task_work_cancel() is the right approach, or
> > just checking closid/rmid before doing task_work_add().
> >
>
> Thank you for reporting the issue, Shakeel!
>
> Could you please check if the following patch fixes the issue?
> From 3c23c39b6a44fdfbbbe0083d074dcc114d7d7f1c Mon Sep 17 00:00:00 2001
> From: Fenghua Yu <fenghua.yu@...el.com>
> Date: Wed, 8 Jan 2020 19:53:33 +0000
> Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements
>
> Currently a task can be moved to the same rdtgroup multiple times.
> Each redundant move adds another task work, which wastes memory
> and degrades performance.
>
> To fix the issue, only move the task to a rdtgroup when the task
> is not already in the rdtgroup. Don't try to move the task to the
> rdtgroup again when the task is already in the rdtgroup.
>
> Reported-by: Shakeel Butt <shakeelb@...gle.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@...el.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 2e3b06d6bbc6..75300c4a5969 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -546,6 +546,17 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
>         struct task_move_callback *callback;
>         int ret;
>
> +       /* If the task is already in rdtgrp, don't move the task. */
> +       if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid &&
> +           tsk->rmid == rdtgrp->mon.rmid) ||
> +           (rdtgrp->type == RDTMON_GROUP &&
> +            rdtgrp->mon.parent->closid == tsk->closid &&
> +            tsk->rmid == rdtgrp->mon.rmid)) {
> +               rdt_last_cmd_puts("Task is already in the rdgroup\n");
> +
> +               return -EINVAL;

Why not just return success if the task is already in that group (i.e.,
just follow the cgroup behavior)?

> +       }
> +
>         callback = kzalloc(sizeof(*callback), GFP_KERNEL);
>         if (!callback)
>                 return -ENOMEM;
> --
> 2.19.1
>
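
For illustration, the suggestion above (treat a repeated move as a successful
no-op instead of returning -EINVAL) could be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: "struct task",
"struct rdtgroup", task_in_group(), move_task() and the pending_callbacks
counter are simplified, hypothetical stand-ins, not the kernel's definitions,
and the counter merely models one kzalloc() + task_work_add() per real move.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum rdt_group_type { RDTCTRL_GROUP, RDTMON_GROUP };

struct task {
	unsigned int closid;
	unsigned int rmid;
	int pending_callbacks;	/* models queued struct task_move_callback */
};

struct rdtgroup {
	enum rdt_group_type type;
	unsigned int closid;
	unsigned int mon_rmid;
	const struct rdtgroup *parent;	/* only used for RDTMON_GROUP */
};

/* Same condition as the check in the RFC patch, split out for readability. */
static int task_in_group(const struct task *t, const struct rdtgroup *g)
{
	if (g->type == RDTCTRL_GROUP)
		return t->closid == g->closid && t->rmid == g->mon_rmid;

	return g->parent && t->closid == g->parent->closid &&
	       t->rmid == g->mon_rmid;
}

/* Idempotent move: a repeated write succeeds without queuing more work. */
static int move_task(struct task *t, const struct rdtgroup *g)
{
	if (task_in_group(t, g))
		return 0;	/* already there: success, nothing allocated */

	if (g->type == RDTMON_GROUP &&
	    (!g->parent || g->parent->closid != t->closid))
		return -EINVAL;	/* model: monitor group of another control group */

	t->pending_callbacks++;	/* models kzalloc() + task_work_add() */

	if (g->type == RDTCTRL_GROUP)
		t->closid = g->closid;
	t->rmid = g->mon_rmid;

	return 0;
}
```

In this model, writing the same task to the same group a thousand times
leaves pending_callbacks at 1, and user space never sees an error for a
write that changes nothing.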