[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ae8942f9-fe89-e059-fe83-f2a6d133f8e0@quicinc.com>
Date: Tue, 29 Aug 2023 17:33:37 -0700
From: Elliot Berman <quic_eberman@...cinc.com>
To: Pavan Kondeti <quic_pkondeti@...cinc.com>
CC: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Pavel Machek <pavel@....cz>,
"Thomas Gleixner" <tglx@...utronix.de>, <kernel@...cinc.com>,
<linux-arm-msm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-pm@...r.kernel.org>,
Prakash Viswalingam <quic_prakashv@...cinc.com>
Subject: Re: [PATCH] freezer,sched: Use saved_state to reduce some spurious
wakeups
On 8/28/2023 10:22 PM, Pavan Kondeti wrote:
> On Mon, Aug 28, 2023 at 10:33:04AM -0700, Elliot Berman wrote:
>> After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
>> tasks that are in TASK_FREEZABLE state and end up getting frozen are
>
> TASK_FREEZABLE state and what? Pls check once.
>
>> always woken up. Prior to that commit, tasks could ask freezer to
>> consider them "frozen enough" via freezer_do_not_conut(). As described
>> in Peter's commit, the reason for this change is to prevent these tasks
>> from being woken before SMP is back. The commit introduced a
>> TASK_FREEZABLE state which allows freezer to immediately mark the task
>> as TASK_FROZEN without waking up the task. On the thaw path, the task is
>> woken up even if the task didn't need to wake up and goes back to its
>> TASK_(UN)INTERRUPTIBLE state. Although these tasks are capable of
>> handling of the wakeup, we can observe a power/perf impact from the
>> extra wakeup.
>>
>> We observed on Android many tasks wait in the TASK_FREEZABLE state
>> (particularly due to many of them being binder clients). We observed
>> nearly 4x the number of tasks and a corresponding (almost) linear increase in
>> latency and power consumption when thawing the system. The latency
>> increased from ~15ms to ~50ms.
>>
>> Save the state of TASK_FREEZABLE tasks and restore it after thawing the
>> task without waking the task up. If the task received a wake up for the
>> saved_state before thawing, then the task is still woken upon thawing.
>>
>> Re-use saved_state from RT sleeping spinlocks because freezer doesn't
>> consider TASK_RTLOCK_WAIT freezable.
>>
>> Reported-by: Prakash Viswalingam <quic_prakashv@...cinc.com>
>> Signed-off-by: Elliot Berman <quic_eberman@...cinc.com>
>> ---
>> For testing purposes, I use these commands can help see how many tasks were
>> woken during thawing:
>>
>> 1. Setup:
>> mkdir /sys/kernel/tracing/instances/freezer
>> cd /sys/kernel/tracing/instances/freezer
>> echo 0 > tracing_on ; echo > trace
>> echo power:suspend_resume > set_event
>> echo 'enable_event:sched:sched_wakeup if action == \"thaw_processes\" && start == 1' > events/power/suspend_resume/trigger
>> echo 'traceoff if action == \"thaw_processes\" && start == 0' > events/power/suspend_resume/trigger
>> echo 1 > tracing_on
>>
>> 2. Let kernel go to suspend
>>
>> 3. After kernel's back up:
>> cat /sys/kernel/tracing/instances/freezer/trace | grep sched_wakeup | grep -o "pid=[0-9]*" | sort -u | wc -l
>> ---
>> include/linux/sched.h | 4 ++--
>> kernel/freezer.c | 15 +++++++++++++--
>> kernel/sched/core.c | 21 +++++++++++++--------
>> 3 files changed, 28 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index eed5d65b8d1f..e4ade5a18df2 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -746,8 +746,8 @@ struct task_struct {
>> #endif
>> unsigned int __state;
>>
>> -#ifdef CONFIG_PREEMPT_RT
>> - /* saved state for "spinlock sleepers" */
>> +#if IS_ENABLED(CONFIG_PREEMPT_RT) || IS_ENABLED(CONFIG_FREEZER)
>> + /* saved state for "spinlock sleepers" and freezer */
>> unsigned int saved_state;
>> #endif
>>
>> diff --git a/kernel/freezer.c b/kernel/freezer.c
>> index 4fad0e6fca64..6222cbfd97ab 100644
>> --- a/kernel/freezer.c
>> +++ b/kernel/freezer.c
>> @@ -71,7 +71,11 @@ bool __refrigerator(bool check_kthr_stop)
>> for (;;) {
>> bool freeze;
>>
>> + raw_spin_lock_irq(¤t->pi_lock);
>> set_current_state(TASK_FROZEN);
>> + /* unstale saved_state so that __thaw_task() will wake us up */
>> + current->saved_state = TASK_RUNNING;
>> + raw_spin_unlock_irq(¤t->pi_lock);
>>
>> spin_lock_irq(&freezer_lock);
>> freeze = freezing(current) && !(check_kthr_stop && kthread_should_stop());
>> @@ -129,6 +133,7 @@ static int __set_task_frozen(struct task_struct *p, void *arg)
>> WARN_ON_ONCE(debug_locks && p->lockdep_depth);
>> #endif
>>
>> + p->saved_state = p->__state;
>> WRITE_ONCE(p->__state, TASK_FROZEN);
>> return TASK_FROZEN;
>> }
>> @@ -174,10 +179,16 @@ bool freeze_task(struct task_struct *p)
>> * state in p->jobctl. If either of them got a wakeup that was missed because
>> * TASK_FROZEN, then their canonical state reflects that and the below will
>> * refuse to restore the special state and instead issue the wakeup.
>> + *
>> + * Otherwise, restore the saved_state before the task entered freezer. For
>> + * typical tasks in the __refrigerator(), saved_state == 0 so nothing happens
>> + * here. For tasks which were TASK_NORMAL | TASK_FREEZABLE, their initial state
>> + * is returned unless they got an expected wakeup. Then they will be woken up as
>> + * TASK_FROZEN back in __thaw_task().
>> */
>
> Thanks for the detailed comment. The change looks good to me.
>
>> static int __set_task_special(struct task_struct *p, void *arg)
>> {
>> - unsigned int state = 0;
>> + unsigned int state = p->saved_state;
>>
>> if (p->jobctl & JOBCTL_TRACED)
>> state = TASK_TRACED;
>> @@ -188,7 +199,7 @@ static int __set_task_special(struct task_struct *p, void *arg)
>> if (state)
>> WRITE_ONCE(p->__state, state);
>>
>> - return state;
>> + return state & ~TASK_FROZEN;
>> }
>
> void __thaw_task(struct task_struct *p)
> {
>
> ...
>
> if (lock_task_sighand(p, &flags2)) {
> /* TASK_FROZEN -> TASK_{STOPPED,TRACED} */
> bool ret = task_call_func(p, __set_task_special, NULL);
> unlock_task_sighand(p, &flags2);
> if (ret)
> goto unlock;
> }
>
> wake_up_state(p, TASK_FROZEN);
> unlock:
> spin_unlock_irqrestore(&freezer_lock, flags);
> }
>
>
> The comment there about task change needs update. I feel the "ret"
> should be renamed approriately to indicate whether wakeup is needed
> or not.
>
> Now that we have saved_state capturing the previous and any state change
> while task is frozen, can that be used and remove the job control and
> associated locking here? for ex: if saved_state is running, we need to
> wakeup otherwise, simply restore the __state from saved_state.
>
>>
>> void __thaw_task(struct task_struct *p)
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index a68d1276bab0..815d955764a5 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3992,13 +3992,17 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>> * The caller holds p::pi_lock if p != current or has preemption
>> * disabled when p == current.
>> *
>> - * The rules of PREEMPT_RT saved_state:
>> + * The rules of saved_state:
>> *
>> * The related locking code always holds p::pi_lock when updating
>> * p::saved_state, which means the code is fully serialized in both cases.
>> *
>> - * The lock wait and lock wakeups happen via TASK_RTLOCK_WAIT. No other
>> - * bits set. This allows to distinguish all wakeup scenarios.
>> + * For PREEMPT_RT, the lock wait and lock wakeups happen via TASK_RTLOCK_WAIT.
>> + * No other bits set. This allows to distinguish all wakeup scenarios.
>> + *
>> + * For FREEZER, the wakeup happens via TASK_FROZEN. No other bits set. This
>> + * allows us to prevent early wakeup of tasks before they can be run on
>> + * asymmetric ISA architectures (eg ARMv9).
>> */
>> static __always_inline
>> bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
>> @@ -4013,13 +4017,14 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
>> return true;
>> }
>>
>> -#ifdef CONFIG_PREEMPT_RT
>> +#if IS_ENABLED(CONFIG_PREEMPT_RT) || IS_ENABLED(CONFIG_FREEZER)
>> /*
>> * Saved state preserves the task state across blocking on
>> - * an RT lock. If the state matches, set p::saved_state to
>> - * TASK_RUNNING, but do not wake the task because it waits
>> - * for a lock wakeup. Also indicate success because from
>> - * the regular waker's point of view this has succeeded.
>> + * an RT lock or TASK_FREEZABLE tasks. If the state matches,
>> + * set p::saved_state to TASK_RUNNING, but do not wake the task
>> + * because it waits for a lock wakeup or __thaw_task(). Also
>> + * indicate success because from the regular waker's point of
>> + * view this has succeeded.
>> *
>> * After acquiring the lock the task will restore p::__state
>> * from p::saved_state which ensures that the regular
>>
>> ---
>> base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1
>> change-id: 20230817-avoid-spurious-freezer-wakeups-9f8619680b3a
>
> Your patch seems based on v6.4. You might want to resend the patch on
> v6.5 to take the below commit into account.
>
> 1c06918788e8a ("sched: Consider task_struct::saved_state in
> wait_task_inactive()")
Thanks for pointing this out! I am pretty sure that with that change, we
can remove the checks for the jobctl and also remove the
lock_task_sighand(). I'll run some tests.
Powered by blists - more mailing lists