linux-kernel - Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55e2861e-9722-08f8-2c49-966035ff4218@bytedance.com>
Date:   Thu, 31 Aug 2023 16:48:29 +0800
From:   Hao Jia <jiahao.os@...edance.com>
To:     Benjamin Segall <bsegall@...gle.com>,
        Bagas Sanjaya <bagasdotme@...il.com>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Igor Raits <igor.raits@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Regressions <regressions@...ts.linux.dev>,
        Linux Stable <stable@...r.kernel.org>
Subject: Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at
 kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160



On 2023/8/31 Benjamin Segall wrote:
Hi,

> Bagas Sanjaya <bagasdotme@...il.com> writes:
> 
>> Hi,
>>
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>
>>> Hello, we recently got a few kernel crashes with following backtrace. Happened on 6.4.12 (and 6.4.11 I think) but did not happen (I think) on 6.4.4.
>>>
>>> [293790.928007] ------------[ cut here ]------------
>>> [293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
>>> [293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
>>> [293790.933694] Modules linked in: [...]
>>> [293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
>>> [293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G        W   E      6.4.12-1.gdc.el9.x86_64 #1
>>> [293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
>>> [293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
>>
>> See Bugzilla for the full thread.
>>
>> Anyway, I'm adding this regression to regzbot:
>>
>> #regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843
>>
>> Thanks.
>>
>> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217843
> 
> The code in question is literally "rq_lock; update_rq_clock;
> rq_clock_start_loop_update (the warning)", which suggests to me that
> RQCF_ACT_SKIP is somehow leaking from somewhere else?

If I understand correctly, rq->clock_update_flags may be set to 
RQCF_ACT_SKIP after __schedule() holds the rq lock, and sometimes the rq 
lock may be released briefly in __schedule(), such as newidle_balance(). 
At this time Other CPUs hold this rq lock, and then calling 
rq_clock_start_loop_update() may trigger this warning.

This warning check might be wrong. We need to add assert_clock_updated() 
to check that the rq clock has been updated before calling 
rq_clock_start_loop_update().

Maybe some things can be like this?

From: Hao Jia <jiahao.os@...edance.com>
Date: Thu, 31 Aug 2023 11:38:54 +0800
Subject: [PATCH] sched/core: Fix wrong warning check in 
rq_clock_start_loop_update()

Commit ebb83d84e49b54 ("sched/core: Avoid multiple
calling update_rq_clock() in __cfsb_csd_unthrottle()")
add "rq->clock_update_flags & RQCF_ACT_SKIP" warning in
rq_clock_start_loop_update().
But this warning is inaccurate and may be triggered
incorrectly in the following situations:

     CPU0                                      CPU1

__schedule()
   *rq->clock_update_flags <<= 1;*   unregister_fair_sched_group()
   pick_next_task_fair+0x4a/0x410      destroy_cfs_bandwidth()
     newidle_balance+0x115/0x3e0       for_each_possible_cpu(i) *i=0*
       rq_unpin_lock(this_rq, rf)      __cfsb_csd_unthrottle()
       raw_spin_rq_unlock(this_rq)
                                       rq_lock(*CPU0_rq*, &rf)
                                       rq_clock_start_loop_update()
                                       rq->clock_update_flags & 
RQCF_ACT_SKIP <--

       raw_spin_rq_lock(this_rq)

So we remove this wrong check. Add assert_clock_updated() to
check that rq clock has been updated before calling
rq_clock_start_loop_update(). And use the variable rq_clock_flags
in rq_clock_start_loop_update() to record the previous state of
rq->clock_update_flags. Correspondingly, restore rq->clock_update_flags
through rq_clock_flags in rq_clock_stop_loop_update() to prevent
losing its previous information.

Fixes: ebb83d84e49b ("sched/core: Avoid multiple calling 
update_rq_clock() in __cfsb_csd_unthrottle()")
Cc: stable@...r.kernel.org
Reported-by: Igor Raits <igor.raits@...il.com>
Reported-by: Bagas Sanjaya <bagasdotme@...il.com>
Signed-off-by: Hao Jia <jiahao.os@...edance.com>
---
  kernel/sched/fair.c  | 10 ++++++----
  kernel/sched/sched.h | 12 +++++++-----
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 911d0063763c..0f6557c82a4c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5679,6 +5679,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
  #ifdef CONFIG_SMP
  static void __cfsb_csd_unthrottle(void *arg)
  {
+	unsigned int rq_clock_flags;
  	struct cfs_rq *cursor, *tmp;
  	struct rq *rq = arg;
  	struct rq_flags rf;
@@ -5691,7 +5692,7 @@ static void __cfsb_csd_unthrottle(void *arg)
  	 * Do it once and skip the potential next ones.
  	 */
  	update_rq_clock(rq);
-	rq_clock_start_loop_update(rq);
+	rq_clock_start_loop_update(rq, &rq_clock_flags);

  	/*
  	 * Since we hold rq lock we're safe from concurrent manipulation of
@@ -5712,7 +5713,7 @@ static void __cfsb_csd_unthrottle(void *arg)

  	rcu_read_unlock();

-	rq_clock_stop_loop_update(rq);
+	rq_clock_stop_loop_update(rq, &rq_clock_flags);
  	rq_unlock(rq, &rf);
  }

@@ -6230,6 +6231,7 @@ static void __maybe_unused 
update_runtime_enabled(struct rq *rq)
  /* cpu offline callback */
  static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
  {
+	unsigned int rq_clock_flags;
  	struct task_group *tg;

  	lockdep_assert_rq_held(rq);
@@ -6239,7 +6241,7 @@ static void __maybe_unused 
unthrottle_offline_cfs_rqs(struct rq *rq)
  	 * set_rq_offline(), so we should skip updating
  	 * the rq clock again in unthrottle_cfs_rq().
  	 */
-	rq_clock_start_loop_update(rq);
+	rq_clock_start_loop_update(rq, &rq_clock_flags);

  	rcu_read_lock();
  	list_for_each_entry_rcu(tg, &task_groups, list) {
@@ -6264,7 +6266,7 @@ static void __maybe_unused 
unthrottle_offline_cfs_rqs(struct rq *rq)
  	}
  	rcu_read_unlock();

-	rq_clock_stop_loop_update(rq);
+	rq_clock_stop_loop_update(rq, &rq_clock_flags);
  }

  bool cfs_task_bw_constrained(struct task_struct *p)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 04846272409c..ff2864f202f5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1558,20 +1558,22 @@ static inline void 
rq_clock_cancel_skipupdate(struct rq *rq)
   * when using list_for_each_entry_*)
   * rq_clock_start_loop_update() can be called after updating the clock
   * once and before iterating over the list to prevent multiple update.
+ * And use @rq_clock_flags to record the previous state of 
rq->clock_update_flags.
   * After the iterative traversal, we need to call 
rq_clock_stop_loop_update()
- * to clear RQCF_ACT_SKIP of rq->clock_update_flags.
+ * to restore rq->clock_update_flags through @rq_clock_flags.
   */
-static inline void rq_clock_start_loop_update(struct rq *rq)
+static inline void rq_clock_start_loop_update(struct rq *rq, unsigned 
int *rq_clock_flags)
  {
  	lockdep_assert_rq_held(rq);
-	SCHED_WARN_ON(rq->clock_update_flags & RQCF_ACT_SKIP);
+	assert_clock_updated(rq);
+	*rq_clock_flags = rq->clock_update_flags;
  	rq->clock_update_flags |= RQCF_ACT_SKIP;
  }

-static inline void rq_clock_stop_loop_update(struct rq *rq)
+static inline void rq_clock_stop_loop_update(struct rq *rq, unsigned 
int *rq_clock_flags)
  {
  	lockdep_assert_rq_held(rq);
-	rq->clock_update_flags &= ~RQCF_ACT_SKIP;
+	rq->clock_update_flags = *rq_clock_flags;
  }

  struct rq_flags {
--
2.20.1