linux-kernel - Re: [PATCH] sched: fair: Use the earliest break even


Message-ID: <jhjh7z40y6p.mognet@arm.com>
Date:   Wed, 04 Mar 2020 18:31:10 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        "open list\:SCHEDULER" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: fair: Use the earliest break even


On Wed, Mar 04 2020, Daniel Lezcano wrote:
>> With that said, that comment actually raises a valid point: picking
>> recently idled CPUs might give us warmer cache. So by using the break
>> even stat, we can end up picking CPUs with colder caches (have been
>> idling for longer) than the current logic would. I suppose more testing
>> will tell us where we stand.
>
> Actually I'm not sure this comment still applies. If the CPU is powered
> down, the cache is flushed, or we can end up picking CPUs whose cache
> has been totally trashed by interrupt processing.
>
>>> +++ b/kernel/sched/sched.h
>>> @@ -1015,6 +1015,7 @@ struct rq {
>>>  #ifdef CONFIG_CPU_IDLE
>>>       /* Must be inspected within a rcu lock section */
>>>       struct cpuidle_state	*idle_state;
>>> +	s64			break_even;
>>
>> Why signed? This should be purely positive, right?
>
> It should be, but s64 matches the signature of ktime_to_ns():
>
> static inline s64 ktime_to_ns(const ktime_t kt)
>

Would there be harm then in simply storing:

  ktime_get_ns() + idle_state->exit_latency_ns

(which is u64)?
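
For illustration, a userspace sketch of what that could look like. The
*_sketch types are hypothetical stand-ins for struct rq and struct
cpuidle_state, the field widths are assumptions made for the example, and
now_ns plays the role of the value returned by ktime_get_ns():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-in for struct cpuidle_state (field type assumed). */
struct cpuidle_state_sketch {
	u64 exit_latency_ns;
};

/* Hypothetical stand-in for the CONFIG_CPU_IDLE part of struct rq. */
struct rq_sketch {
	struct cpuidle_state_sketch *idle_state;
	u64 break_even;	/* absolute timestamp, in ns */
};

/* now_ns stands in for ktime_get_ns(); the sum stays unsigned. */
static inline void idle_set_break_even(struct rq_sketch *rq, u64 now_ns)
{
	assert(rq->idle_state != NULL);
	rq->break_even = now_ns + rq->idle_state->exit_latency_ns;
}

static inline u64 idle_get_break_even(const struct rq_sketch *rq)
{
	return rq->break_even;
}
```

With everything unsigned, the "can this be negative?" question goes away
at the type level.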

>>>  #endif
>>>  };
>>>
>>> @@ -1850,6 +1851,16 @@ static inline struct cpuidle_state *idle_get_state(struct rq *rq)
>>>
>>>       return rq->idle_state;
>>>  }
>>> +
>>> +static inline void idle_set_break_even(struct rq *rq, s64 break_even)
>>> +{
>>> +	rq->break_even = break_even;
>>> +}
>>> +
>>> +static inline s64 idle_get_break_even(struct rq *rq)
>>> +{
>>> +	return rq->break_even;
>>> +}
>>
>> I'm not super familiar with the callsites for setting idle states,
>> what's the locking situation there? Do we have any rq lock?
>
> It is safe, we are under rcu, this section was discussed several years
> ago when introducing the idle_set_state():
>
>  https://lkml.org/lkml/2014/9/19/332
>

Thanks for the link!

So while we (should) have the relevant barriers, there can still be
concurrent writing (from the CPU entering/leaving idle) and reading
(from the CPU gathering stats).

rcu_dereference() gives you READ_ONCE(), and rcu_assign_pointer() should
give you an smp_store_release(). What I'm thinking here is, if we have
reasons not to use the RCU primitives, we should at least slap some
READ_ONCE()/WRITE_ONCE() on the accesses. Also, can RCU even do anything
about scalar values like the break even you're storing?
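
To make the suggestion concrete, here is a rough userspace sketch of the
two accessors with the ONCE treatment. The READ_ONCE()/WRITE_ONCE()
macros below are simplified volatile-cast stand-ins (the kernel's real
ones are more elaborate), they rely on the GCC/Clang __typeof__
extension, and struct rq_sketch is a hypothetical stand-in for struct rq:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t s64;

/* Simplified stand-ins: force exactly one access via a volatile cast. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the CONFIG_CPU_IDLE part of struct rq. */
struct rq_sketch {
	s64 break_even;
};

/* Writer side: the CPU entering or leaving idle. */
static inline void idle_set_break_even(struct rq_sketch *rq, s64 break_even)
{
	assert(rq != NULL);
	WRITE_ONCE(rq->break_even, break_even);
}

/* Reader side: a remote CPU gathering stats. */
static inline s64 idle_get_break_even(struct rq_sketch *rq)
{
	assert(rq != NULL);
	return READ_ONCE(rq->break_even);
}
```

This only keeps the compiler from tearing, fusing or re-reading the
access; it provides no ordering against the idle_state pointer update.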

>> In find_idlest_group_cpu() we're in a read-side RCU section, so the
>> idle_state is protected (speaking of which, why isn't idle_get_state()
>> using rcu_dereference()?), but that doesn't cover the break even.
>>
>> IIUC at the very least we may want to give them the READ/WRITE_ONCE()
>> treatment.
>>