linux-kernel - Re: [PATCH] sched: fix infinity loop in update_blocked

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wj33F7LKQS3wCYJtb_yCzYhbjPzFqhVS_ZPPNOWTTMHFQ@mail.gmail.com>
Date:   Thu, 27 Dec 2018 17:36:47 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Sargun Dhillon <sargun@...gun.me>,
        Xie XiuQi <xiexiuqi@...wei.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>, xiezhipeng1@...wei.com,
        huawei.libin@...wei.com,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Dmitry Adamushko <dmitry.adamushko@...il.com>,
        Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages

On Thu, Dec 27, 2018 at 5:15 PM Tejun Heo <tj@...nel.org> wrote:
>
> I'm pretty sure enqueue_entity() *has* to be called with rq lock.
> unthrottle_cfs_rq() is called from tg_set_cfs_bandwidth(),
> distribute_cfs_runtime() and unthrottle_offline_cfs_rqs.  The first
> two grabs the rq_lock just around the calls and the last one has a
> lockdep assert on the rq_lock.  What am I missing?

No, I think you're right, and I just didn't follow things deep enough,
didn't see any rq locking in the loop in unthrottle_offline_cfs_rqs(),
and didn't realize that the rq is locked by the caller.

> > But that still makes me go "how come is this only noticed 18 months
> > after the fact"?
>
> Unless I'm totally confused, which is definitely possible, I don't
> think there's a race condition and the only bug is the
> tmp_alone_branch pointer getting dangled, which maybe doesn't happen
> all that much?

Ahh. That would explain the list corruption. The next
list_add_leaf_cfs_rq() could try to add to a removed entry.

How would you reset it? Do something like

       rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;

for every removal, or make it conditional on it matching the removed entry?

            Linus