linux-kernel - Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs


Message-ID: <CAM_iQpVkU9Qf_V+DvUuqxLqsiGN=Qg+Uyt1Kpiu5HuQVRi=KqQ@mail.gmail.com>
Date:   Fri, 3 Aug 2018 11:57:31 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Xunlei Pang <xlpang@...ux.alibaba.com>
Cc:     Ben Segall <bsegall@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()

On Tue, Jul 31, 2018 at 8:24 PM Xunlei Pang <xlpang@...ux.alibaba.com> wrote:
>
> Let's see the unthrottle cases.
> 1. for the periodic timer
> distribute_cfs_runtime updates the throttled cfs_rq->runtime_expires to
> be a new value, so expire_cfs_rq_runtime does nothing because of:
>   rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires < 0
>
> Afterwards assign_cfs_rq_runtime() will sync its expires_seq.

Is there any guarantee rq_clock(cfs_rq) is always ahead of
cfs_rq->runtime_expires in this case?

I doubt it, because cfs_rq->runtime_expires could be assigned
from a sched_clock() read on a different CPU running the periodic timer.

Also, rq_clock() lags behind sched_clock() even on the same CPU:
sometimes by merely hundreds of nanoseconds, sometimes by
tens of thousands of nanoseconds in my environment. (I have a
different patch to address this, but I am still not sure if it is correct.)
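
To make the concern concrete, here is a minimal userspace sketch (not
kernel code) of the signed comparison quoted above. The helper names
deadline_still_ahead() and lagging_clock() are hypothetical and exist
only for illustration; the timestamps are plain nanosecond counters.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model, not kernel code. Mirrors the quoted check
 *   rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires < 0
 * with both values held as unsigned 64-bit nanosecond timestamps.
 */
static bool deadline_still_ahead(uint64_t rq_clock_ns, uint64_t runtime_expires_ns)
{
	/* Signed difference: negative means the clock has not reached the deadline. */
	return (int64_t)(rq_clock_ns - runtime_expires_ns) < 0;
}

/*
 * Hypothetical helper: models a local clock that lags the clock used to
 * compute the deadline (for example, another CPU's sched_clock()) by lag_ns.
 */
static uint64_t lagging_clock(uint64_t reference_ns, uint64_t lag_ns)
{
	return reference_ns - lag_ns;
}
```

With a deadline of 2000 ns and a reference clock at 2500 ns, a local
clock lagging by 1000 ns reads 1500 ns, so deadline_still_ahead()
reports true even though the reference clock has already passed the
deadline. The outcome of the check depends on which clock produced
each operand.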


>
> 2. for the slack timer
> the two expires_seq should be the same, so if clock drift happens soon,
> expire_cfs_rq_runtime regards it as true clock drift:
>   cfs_rq->runtime_expires += TICK_NSEC
> If it happens that global expires_seq advances, it also doesn't matter,
> expire_cfs_rq_runtime will clear the stale expire_cfs_rq_runtime as
> expected.

Hmm, it looks like this is due to the runtime_refresh_within() check in
the slack timer.
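
For reference, the expiry decision described in the quoted text can be
modelled roughly as below. This is a simplified userspace sketch that
follows only what this thread describes; it is not the kernel's
expire_cfs_rq_runtime(). The struct names, the function name and the
TICK_NSEC_MODEL value (an assumed 1 ms tick) are illustrative
assumptions.

```c
#include <stdint.h>

#define TICK_NSEC_MODEL 1000000ULL /* assumed 1 ms tick, illustration only */

/* Global bandwidth state: only the sequence number matters here. */
struct cfs_b_model {
	int expires_seq;
};

/* Per-runqueue state, reduced to the fields discussed in this thread. */
struct cfs_rq_model {
	uint64_t runtime_expires;  /* local deadline, ns */
	int64_t runtime_remaining; /* local quota, ns */
	int expires_seq;           /* copy of the global sequence */
};

static void expire_runtime_model(struct cfs_rq_model *cfs_rq,
				 const struct cfs_b_model *cfs_b,
				 uint64_t now_ns)
{
	/* Deadline is still ahead of the local clock: nothing to do. */
	if ((int64_t)(now_ns - cfs_rq->runtime_expires) < 0)
		return;

	if (cfs_rq->expires_seq == cfs_b->expires_seq) {
		/* Same period: treat it as clock drift and extend the deadline. */
		cfs_rq->runtime_expires += TICK_NSEC_MODEL;
	} else {
		/* Global sequence advanced: the local runtime is stale. */
		cfs_rq->runtime_remaining = 0;
	}
}
```

In this model, a cfs_rq whose expires_seq was not synced when it
received runtime takes the second branch and loses that runtime as soon
as the local deadline appears to have passed.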



>
> >
> >>
> >> Nothing /important/ goes wrong because distribute_cfs_runtime only fills
> >> runtime_remaining up to 1, not a real amount.
> >
> > No, runtime_remaining is updated right before expire_cfs_rq_runtime():
> >
> > static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
> > {
> >         /* dock delta_exec before expiring quota (as it could span periods) */
> >         cfs_rq->runtime_remaining -= delta_exec;
> >         expire_cfs_rq_runtime(cfs_rq);
> >
> > so almost certainly it can't be 1.
>
> I think Ben means it first gets a distribution of 1 to run after
> unthrottling, soon it will have a negative runtime_remaining, and go
> to assign_cfs_rq_runtime().

That is obvious; being 1 in distribute_cfs_runtime() is not relevant to the
discussion here.