lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250217105007.45ba8cb4@gandalf.local.home>
Date: Mon, 17 Feb 2025 10:50:07 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Harshit Agarwal <harshit@...anix.com>, Peter Zijlstra
 <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann
 <dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, Mel Gorman
 <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 linux-kernel@...r.kernel.org, Jon Kohler <jon@...anix.com>, Gauri
 Patwardhan <gauri.patwardhan@...anix.com>, Rahul Chunduru
 <rahul.chunduru@...anix.com>, Will Ton <william.ton@...anix.com>
Subject: Re: [PATCH v2] sched/rt: Fix race in push_rt_task


FYI,

You should always send a new patch version as a separate thread. That's
because they can get lost in the thread and makes it harder for maintainers
to know what the next version of the patch is. I've picked the wrong patch
version before because there was another version sent that I missed.

On Thu, 13 Feb 2025 17:54:34 +0000
Harshit Agarwal <harshit@...anix.com> wrote:

> Solution
> ========
> The solution here is fairly simple. After obtaining the lock (at 4a),
> the check is enhanced to make sure that the task is still at the head of
> the pushable tasks list. If not, then it is anyway not suitable for
> being pushed out. The fix also removes any conditions that are no longer
> needed.
> 
> Testing
> =======
> The fix is tested on a cluster of 3 nodes, where the panics due to this
> are hit every couple of days. A fix similar to this was deployed on such
> cluster and was stable for more than 30 days.

May also want to add:

  Since 'is_migration_disabled()' a faster check than the others, it was moved
  to be the first check for consistency.

> 
> Co-developed-by: Jon Kohler <jon@...anix.com>
> Signed-off-by: Jon Kohler <jon@...anix.com>
> Co-developed-by: Gauri Patwardhan <gauri.patwardhan@...anix.com>
> Signed-off-by: Gauri Patwardhan <gauri.patwardhan@...anix.com>
> Co-developed-by: Rahul Chunduru <rahul.chunduru@...anix.com>
> Signed-off-by: Rahul Chunduru <rahul.chunduru@...anix.com>
> Signed-off-by: Harshit Agarwal <harshit@...anix.com>
> Tested-by: Will Ton <william.ton@...anix.com>
> ---

You can add here (after the above three dashes), how this version is
different from the last version. The text below the dashes and before the
patch is ignored by git, but is useful for reviewers. For instance:

Changes since v1: https://lore.kernel.org/all/20250211054646.23987-1-harshit@nutanix.com/

- Removed the redundant checks that task != pick_next_pushable_task() already has


Notice I added a link to the previous version. This helps find the previous
version without having to make this version a reply to it.

>  kernel/sched/rt.c | 54 +++++++++++++++++++++++------------------------
>  1 file changed, 26 insertions(+), 28 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4b8e33c615b1..4762dd3f50c5 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1885,6 +1885,27 @@ static int find_lowest_rq(struct task_struct *task)
>  	return -1;
>  }
>  

Otherwise,

Reviewed-by: Steven Rostedt (Google) <rostedt@...dmis.org>

Peter, could you pick this up?

-- Steve

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ