Message-ID: <CAHk-=wj33F7LKQS3wCYJtb_yCzYhbjPzFqhVS_ZPPNOWTTMHFQ@mail.gmail.com>
Date: Thu, 27 Dec 2018 17:36:47 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Tejun Heo <tj@...nel.org>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Sargun Dhillon <sargun@...gun.me>,
Xie XiuQi <xiexiuqi@...wei.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, xiezhipeng1@...wei.com,
huawei.libin@...wei.com,
linux-kernel <linux-kernel@...r.kernel.org>,
Dmitry Adamushko <dmitry.adamushko@...il.com>,
Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages
On Thu, Dec 27, 2018 at 5:15 PM Tejun Heo <tj@...nel.org> wrote:
>
> I'm pretty sure enqueue_entity() *has* to be called with rq lock.
> unthrottle_cfs_rq() is called from tg_set_cfs_bandwidth(),
> distribute_cfs_runtime() and unthrottle_offline_cfs_rqs. The first
> two grabs the rq_lock just around the calls and the last one has a
> lockdep assert on the rq_lock. What am I missing?
No, I think you're right, and I just didn't follow things deep enough,
didn't see any rq locking in the loop in unthrottle_offline_cfs_rqs(),
and didn't realize that the rq is locked by the caller.
> > But that still makes me go "how come is this only noticed 18 months
> > after the fact"?
>
> Unless I'm totally confused, which is definitely possible, I don't
> think there's a race condition and the only bug is the
> tmp_alone_branch pointer getting dangled, which maybe doesn't happen
> all that much?
Ahh. That would explain the list corruption. The next
list_add_leaf_cfs_rq() could try to add to a removed entry.
How would you reset it? Do something like

    rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;

for every removal, or make it conditional on it matching the removed entry?
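
To make the two alternatives concrete, here is a rough userspace sketch. It is not the kernel code: the list helpers, the struct layouts, and the names leaf_add(), del_leaf_reset_always() and del_leaf_reset_if_match() are hypothetical stand-ins, and only leaf_cfs_rq_list and tmp_alone_branch are taken from the discussion above. It assumes the caller holds the rq lock, as established earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry and clear its pointers so that reuse is easy to spot. */
static void list_del(struct list_head *entry)
{
	assert(entry->next && entry->prev);
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/* Simplified models; the real structs carry far more state. */
struct rq {
	struct list_head leaf_cfs_rq_list;
	struct list_head *tmp_alone_branch;
};

struct cfs_rq {
	struct rq *rq;
	struct list_head leaf_cfs_rq_list;
	bool on_list;
};

static void rq_init(struct rq *rq)
{
	list_init(&rq->leaf_cfs_rq_list);
	rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
}

/* Hypothetical helper: put cfs_rq on its rq's leaf list. */
static void leaf_add(struct rq *rq, struct cfs_rq *cfs_rq)
{
	cfs_rq->rq = rq;
	list_add(&cfs_rq->leaf_cfs_rq_list, &rq->leaf_cfs_rq_list);
	cfs_rq->on_list = true;
}

/* Option 1: reset tmp_alone_branch on every removal. */
static void del_leaf_reset_always(struct cfs_rq *cfs_rq)
{
	struct rq *rq = cfs_rq->rq;

	if (!cfs_rq->on_list)
		return;
	list_del(&cfs_rq->leaf_cfs_rq_list);
	cfs_rq->on_list = false;
	rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
}

/* Option 2: reset only if tmp_alone_branch points at the removed entry. */
static void del_leaf_reset_if_match(struct cfs_rq *cfs_rq)
{
	struct rq *rq = cfs_rq->rq;

	if (!cfs_rq->on_list)
		return;
	if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
		rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
	list_del(&cfs_rq->leaf_cfs_rq_list);
	cfs_rq->on_list = false;
}
```

In this model both variants keep tmp_alone_branch from pointing at an entry that is no longer on the list. The difference is that the unconditional one also throws away a pointer to a still-valid entry, which may or may not matter depending on what the in-progress enqueue expects.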
Linus