linux-kernel - Re: [kernel-sched-cputime] question about probable bug in cputime

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 28 Jun 2017 08:03:07 +0200
From:   Frans Klaver <fransklaver@...il.com>
To:     "Gustavo A. R. Silva" <garsilva@...eddedor.com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [kernel-sched-cputime] question about probable bug in cputime_adjust()

On Wed, Jun 28, 2017 at 7:35 AM, Frans Klaver <fransklaver@...il.com> wrote:
> On Wed, Jun 28, 2017 at 1:03 AM, Gustavo A. R. Silva
> <garsilva@...eddedor.com> wrote:
>>
>> Hello everybody,
>>
>> While looking into Coverity ID 1371643 I ran into the following piece of
>> code at kernel/sched/cputime.c:568:
>>
>> 568/*
>> 569 * Adjust tick based cputime random precision against scheduler runtime
>> 570 * accounting.
>> 571 *
>> 572 * Tick based cputime accounting depend on random scheduling timeslices
>> of a
>> 573 * task to be interrupted or not by the timer.  Depending on these
>> 574 * circumstances, the number of these interrupts may be over or
>> 575 * under-optimistic, matching the real user and system cputime with a
>> variable
>> 576 * precision.
>> 577 *
>> 578 * Fix this by scaling these tick based values against the total runtime
>> 579 * accounted by the CFS scheduler.
>> 580 *
>> 581 * This code provides the following guarantees:
>> 582 *
>> 583 *   stime + utime == rtime
>> 584 *   stime_i+1 >= stime_i, utime_i+1 >= utime_i
>> 585 *
>> 586 * Assuming that rtime_i+1 >= rtime_i.
>> 587 */
>> 588static void cputime_adjust(struct task_cputime *curr,
>> 589                           struct prev_cputime *prev,
>> 590                           u64 *ut, u64 *st)
>> 591{
>> 592        u64 rtime, stime, utime;
>> 593        unsigned long flags;
>> 594
>> 595        /* Serialize concurrent callers such that we can honour our
>> guarantees */
>> 596        raw_spin_lock_irqsave(&prev->lock, flags);
>> 597        rtime = curr->sum_exec_runtime;
>> 598
>> 599        /*
>> 600         * This is possible under two circumstances:
>> 601         *  - rtime isn't monotonic after all (a bug);
>> 602         *  - we got reordered by the lock.
>> 603         *
>> 604         * In both cases this acts as a filter such that the rest of the
>> code
>> 605         * can assume it is monotonic regardless of anything else.
>> 606         */
>> 607        if (prev->stime + prev->utime >= rtime)
>> 608                goto out;
>> 609
>> 610        stime = curr->stime;
>> 611        utime = curr->utime;
>> 612
>> 613        /*
>> 614         * If either stime or both stime and utime are 0, assume all
>> runtime is
>> 615         * userspace. Once a task gets some ticks, the monotonicy code at
>> 616         * 'update' will ensure things converge to the observed ratio.
>> 617         */
>> 618        if (stime == 0) {
>> 619                utime = rtime;
>> 620                goto update;
>> 621        }
>> 622
>> 623        if (utime == 0) {
>> 624                stime = rtime;
>> 625                goto update;
>> 626        }
>> 627
>> 628        stime = scale_stime(stime, rtime, stime + utime);
>> 629
>> 630update:
>> 631        /*
>> 632         * Make sure stime doesn't go backwards; this preserves
>> monotonicity
>> 633         * for utime because rtime is monotonic.
>> 634         *
>> 635         *  utime_i+1 = rtime_i+1 - stime_i
>> 636         *            = rtime_i+1 - (rtime_i - utime_i)
>> 637         *            = (rtime_i+1 - rtime_i) + utime_i
>> 638         *            >= utime_i
>> 639         */
>> 640        if (stime < prev->stime)
>> 641                stime = prev->stime;
>> 642        utime = rtime - stime;
>> 643
>> 644        /*
>> 645         * Make sure utime doesn't go backwards; this still preserves
>> 646         * monotonicity for stime, analogous argument to above.
>> 647         */
>> 648        if (utime < prev->utime) {
>> 649                utime = prev->utime;
>> 650                stime = rtime - utime;
>> 651        }
>> 652
>> 653        prev->stime = stime;
>> 654        prev->utime = utime;
>> 655out:
>> 656        *ut = prev->utime;
>> 657        *st = prev->stime;
>> 658        raw_spin_unlock_irqrestore(&prev->lock, flags);
>> 659}
>>
>>
>> The issue here is that the value assigned to variable utime at line 619 is
>> overwritten at line 642, which would make such variable assignment useless.
>
> It isn't completely useless, since utime is used in line 628 to calculate stime.

Oh, I missed 'goto update'. Never mind about this one.


>> But I'm suspicious that such assignment is actually correct and that line
>> 642 should be included into the IF block at line 640. Something similar to
>> the following patch:
>>
>> --- a/kernel/sched/cputime.c
>> +++ b/kernel/sched/cputime.c
>> @@ -637,9 +637,10 @@ static void cputime_adjust(struct task_cputime *curr,
>>          *            = (rtime_i+1 - rtime_i) + utime_i
>>          *            >= utime_i
>>          */
>> -       if (stime < prev->stime)
>> +       if (stime < prev->stime) {
>>                 stime = prev->stime;
>> -       utime = rtime - stime;
>> +               utime = rtime - stime;
>> +       }
>>
>>
>> If you confirm this, I will send a patch in a full and proper form.
>>
>> I'd really appreciate your comments.
>
> If you do that, how would you meet the guarantee made in line 583?
>
> Frans