linux-kernel - Re: [PATCH 3/4] sched/psi: extract update

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpFMrpL16RMp5CNJUKuPM5nTUmRvSNjK+zaxhRcdArLVhg@mail.gmail.com>
Date:   Tue, 21 Mar 2023 20:40:52 -0700
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     Domenico Cerasuolo <cerasuolodomenico@...il.com>
Cc:     linux-kernel@...r.kernel.org, peterz@...radead.org,
        brauner@...nel.org, chris@...isdown.name, hannes@...xchg.org
Subject: Re: [PATCH 3/4] sched/psi: extract update_triggers side effect

On Tue, Mar 21, 2023 at 3:18 AM Domenico Cerasuolo
<cerasuolodomenico@...il.com> wrote:
>
> Hi Suren, thanks for all the feedback! This makes sense, I only have one doubt, if we set update_total flag to window_update() and update_triggers(), that flag would be ignored when the caller is psi_avgs_work(), this would be happening in the next patch in the set.

I don't see why the update_triggers part should be conceptually
different between RT and unprivileged triggers. Could you please
explain?

> What do you think if I just remove this change from the patchset and then work on a solution after the iterations on the main change are completed? This was in fact just an attempt to clean up.
> I'll apply your suggested changes on the other patches, wait a bit for comments from someone else and then send V2.
>
> On Tue, Mar 21, 2023 at 12:00 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>>
>> On Thu, Mar 9, 2023 at 9:08 AM Domenico Cerasuolo
>> <cerasuolodomenico@...il.com> wrote:
>> >
>> > The update of rtpoll_total inside update_triggers can be moved out of
>> > the function since changed_states has the same information as the
>> > update_total flag used in the function. Besides the simplification of
>> > the function, with the next patch it would become an unwanted side
>> > effect needed only for PSI_POLL.
>>
>> (changed_states & group->rtpoll_states) and update_total flag are not
>> really equivalent. update_total flag depends on the difference between
>> group->polling_total[state] and group->total[PSI_POLL][state] while
>> changed_states depends on the difference between groupc->times and
>> groupc->times_prev. groupc->times_prev is updated every time
>> collect_percpu_times() is called and there are 3 places where that
>> happens: from psi_avgs_work(), from psi_poll_work() and from
>> psi_show(). group->polling_total[state] is updated only from
>> psi_poll_work(). Therefore the deltas between these values might not
>> always be in-sync.
>>
>> Consider the following sequence as an example:
>>
>> psi_poll_work()
>> ...
>> psi_avgs_work()/psi_show()
>>   collect_percpu_times() // we detect a change in a monitored state
>> ...
>> psi_poll_work()
>>   collect_percpu_times() // this time no change in monitored states
>>   update_triggers() // group->polling_total[state] !=
>> group->total[PSI_POLL][state]
>>
>> In the last psi_poll_work() collect_percpu_times() recorded no change
>> in monitored states, so (changed_states & group->rtpoll_states) == 0,
>> however since the last time psi_poll_work() was called there was
>> actually a change in monitored states recorded by the first
>> collect_percpu_times(), therefore (group->polling_total[t->state] !=
>> total[t->state]) and we should update the totals. With your change we
>> will miss that update.
>>
>> I think you can easily fix that by introducing update_triggers as an
>> output parameter in window_update() like this:
>>
>> static u64 window_update(struct psi_window *win, u64 now, u64 value,
>> bool *update_triggers) {
>>     *update_total = false;
>> ...
>>     if (new_stall) {
>>         *update_total = true;
>> ...
>> }
>>
>> static void psi_rtpoll_work(struct psi_group *group) {
>> +       bool update_triggers;
>> ...
>> -       if (now >= group->rtpoll_next_update)
>> +       if (now >= group->rtpoll_next_update) {
>>                 group->rtpoll_next_update = update_triggers(group,
>> now, &update_triggers);
>> +               if (update_triggers)
>> +                       memcpy(group->rtpoll_total, group->total[PSI_POLL],
>> +                                  sizeof(group->rtpoll_total));
>> +       }
>> }
>>
>>
>> >
>> > Suggested-by: Johannes Weiner <hannes@...xchg.org>
>> > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@...il.com>
>> > ---
>> >  kernel/sched/psi.c | 20 +++++---------------
>> >  1 file changed, 5 insertions(+), 15 deletions(-)
>> >
>> > diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
>> > index a3d0b5cf797a..476941c1cbea 100644
>> > --- a/kernel/sched/psi.c
>> > +++ b/kernel/sched/psi.c
>> > @@ -433,7 +433,6 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
>> >  static u64 update_triggers(struct psi_group *group, u64 now)
>> >  {
>> >         struct psi_trigger *t;
>> > -       bool update_total = false;
>> >         u64 *total = group->total[PSI_POLL];
>> >
>> >         /*
>> > @@ -456,14 +455,6 @@ static u64 update_triggers(struct psi_group *group, u64 now)
>> >                  * events without dropping any).
>> >                  */
>> >                 if (new_stall) {
>> > -                       /*
>> > -                        * Multiple triggers might be looking at the same state,
>> > -                        * remember to update group->polling_total[] once we've
>> > -                        * been through all of them. Also remember to extend the
>> > -                        * polling time if we see new stall activity.
>> > -                        */
>> > -                       update_total = true;
>> > -
>> >                         /* Calculate growth since last update */
>> >                         growth = window_update(&t->win, now, total[t->state]);
>> >                         if (!t->pending_event) {
>> > @@ -484,11 +475,6 @@ static u64 update_triggers(struct psi_group *group, u64 now)
>> >                 /* Reset threshold breach flag once event got generated */
>> >                 t->pending_event = false;
>> >         }
>> > -
>> > -       if (update_total)
>> > -               memcpy(group->rtpoll_total, total,
>> > -                               sizeof(group->rtpoll_total));
>> > -
>> >         return now + group->rtpoll_min_period;
>> >  }
>> >
>> > @@ -686,8 +672,12 @@ static void psi_rtpoll_work(struct psi_group *group)
>> >                 goto out;
>> >         }
>> >
>> > -       if (now >= group->rtpoll_next_update)
>> > +       if (now >= group->rtpoll_next_update) {
>> >                 group->rtpoll_next_update = update_triggers(group, now);
>> > +               if (changed_states & group->rtpoll_states)
>> > +                       memcpy(group->rtpoll_total, group->total[PSI_POLL],
>> > +                                  sizeof(group->rtpoll_total));
>> > +       }
>> >
>> >         psi_schedule_rtpoll_work(group,
>> >                 nsecs_to_jiffies(group->rtpoll_next_update - now) + 1,
>> > --
>> > 2.34.1
>> >