Message-ID: <CAJuCfpF4pdREUYvhU6zDB67fjZ2R-wn9XQbA3k7u7_e_jMr7xw@mail.gmail.com>
Date: Tue, 17 May 2022 10:35:46 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Chen Wandun <chenwandun@...wei.com>
Cc: Alex Shi <seakeel@...il.com>, LKML <linux-kernel@...r.kernel.org>,
Johannes Weiner <hannes@...xchg.org>,
Alex Shi <alexs@...nel.org>, Jonathan Corbet <corbet@....net>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH 1/2] psi: add support for multi level pressure stall trigger
On Tue, May 17, 2022 at 5:46 AM Chen Wandun <chenwandun@...wei.com> wrote:
>
>
>
> On 2022/5/16 16:43, Suren Baghdasaryan wrote:
> > On Mon, May 16, 2022 at 1:21 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> >> On Sun, May 15, 2022 at 11:20 PM Alex Shi <seakeel@...il.com> wrote:
> >>>
> >>>
> >>> On 5/16/22 11:35, Chen Wandun wrote:
> >>>> Currently, psi events are triggered when the stall time exceeds
> >>>> the stall threshold, but nothing distinguishes these events.
> >>>>
> >>>> Events can be divided into multiple levels, each level
> >>>> representing a different stall pressure, which helps to identify
> >>>> pressure information more accurately.
> >> IIUC by defining min and max, you want the trigger to activate when
> >> the stall is between min and max thresholds. But I don't see why you
> >> would need that. If you want to have several levels, you can create
> >> multiple triggers and monitor them separately. For your example, that
> >> would be:
> >>
> >> echo "some 150000 1000000" > /proc/pressure/memory
> >> echo "some 350000 1000000" > /proc/pressure/memory
> >>
> >> Your first trigger will fire whenever the stall exceeds 150ms within
> >> each 1sec and the second one will trigger when it exceeds 350ms. It is
> >> true that if the stall jumps sharply above 350ms, you would get both
> >> triggers firing. I'm guessing that's why you want this functionality
> >> so that 150ms trigger does not fire when 350ms one is firing but why
> >> is that a problem? Can't userspace pick the highest level one and
> >> ignore all the lower ones when this happens? Or are you addressing
> >> some other requirement?
> >>
> >>>> echo "some 150000 350000 1000000" > /proc/pressure/memory would
> >>> This breaks the old ABI. And why do you need this new function?
> >> Both great points.
> > BTW, I think the additional max_threshold parameter could be
> > implemented in a backward compatible way so that the old API is not
> > broken:
> >
> > arg_count = sscanf(buf, "some %u %u %u", &min_threshold_us, &arg2, &arg3);
> > if (arg_count < 2) return ERR_PTR(-EINVAL);
> > if (arg_count < 3) {
> >     max_threshold_us = INT_MAX;
> >     window_us = arg2;
> > } else {
> >     max_threshold_us = arg2;
> >     window_us = arg3;
> > }
> OK
>
> Thanks.
> > But again, the motivation still needs to be explained.
> We want to perform different operations for different stall levels.
> As the previous email explained, multiple triggers in the old way
> would also work, but it is a little complex.
Ok, so the issue can be dealt with in userspace, but it would be
simpler if max_threshold were supported by the kernel. I can buy this
argument if the kernel implementation is not complex and max_threshold
is added in a way that does not break current users. I believe both
conditions can be met.
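
To make the backward-compatibility point concrete, here is a rough userspace sketch of the parsing I have in mind. It is not kernel code and not a proposed patch: struct trigger_spec and parse_trigger() are hypothetical names used only for illustration, and UINT_MAX as the "no upper bound" value is my assumption here (the snippet quoted above uses INT_MAX). The only validation shown is the min/max ordering check from the quoted diff; window and threshold limits are left out.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical userspace mirror of the trigger parameters. */
struct trigger_spec {
	int full;                      /* 0 = "some", 1 = "full" */
	unsigned int min_threshold_us;
	unsigned int max_threshold_us; /* UINT_MAX: no upper bound (assumption) */
	unsigned int window_us;
};

/*
 * Parse "<some|full> <min> [<max>] <window>".
 * Returns 0 on success, -1 on malformed or inconsistent input.
 */
static int parse_trigger(const char *buf, struct trigger_spec *spec)
{
	unsigned int arg1, arg2, arg3;
	int n;

	n = sscanf(buf, "some %u %u %u", &arg1, &arg2, &arg3);
	if (n >= 2) {
		spec->full = 0;
	} else {
		n = sscanf(buf, "full %u %u %u", &arg1, &arg2, &arg3);
		if (n < 2)
			return -1;
		spec->full = 1;
	}

	spec->min_threshold_us = arg1;
	if (n == 2) {
		/* Legacy two-argument form: threshold and window only. */
		spec->max_threshold_us = UINT_MAX;
		spec->window_us = arg2;
	} else {
		/* Extended form: min threshold, max threshold, window. */
		spec->max_threshold_us = arg2;
		spec->window_us = arg3;
	}

	/* Same ordering check as in the quoted diff: [min, max) must be non-empty. */
	if (spec->min_threshold_us >= spec->max_threshold_us)
		return -1;

	return 0;
}
```

With this shape, "some 150000 1000000" keeps its current meaning (no upper bound), while "some 150000 350000 1000000" selects the [150ms, 350ms) range, so existing users are not affected.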
>
> >
> >>> Thanks
> >>>
> >>>> add [150ms, 350ms) threshold for partial memory stall measured
> >>>> within 1sec time window.
> >>>>
> >>>> Signed-off-by: Chen Wandun <chenwandun@...wei.com>
> >>>> ---
> >>>> include/linux/psi_types.h | 3 ++-
> >>>> kernel/sched/psi.c | 19 +++++++++++++------
> >>>> 2 files changed, 15 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
> >>>> index c7fe7c089718..2b1393c8bf90 100644
> >>>> --- a/include/linux/psi_types.h
> >>>> +++ b/include/linux/psi_types.h
> >>>> @@ -119,7 +119,8 @@ struct psi_trigger {
> >>>> enum psi_states state;
> >>>>
> >>>> /* User-spacified threshold in ns */
> >>>> - u64 threshold;
> >>>> + u64 min_threshold;
> >>>> + u64 max_threshold;
> >>>>
> >>>> /* List node inside triggers list */
> >>>> struct list_head node;
> >>>> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> >>>> index 6f9533c95b0a..17dd233b533a 100644
> >>>> --- a/kernel/sched/psi.c
> >>>> +++ b/kernel/sched/psi.c
> >>>> @@ -541,7 +541,7 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> >>>>
> >>>> /* Calculate growth since last update */
> >>>> growth = window_update(&t->win, now, total[t->state]);
> >>>> - if (growth < t->threshold)
> >>>> + if (growth < t->min_threshold || growth >= t->max_threshold)
> >>>> continue;
> >>>>
> >>>> t->pending_event = true;
> >>>> @@ -1087,15 +1087,18 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
> >>>> {
> >>>> struct psi_trigger *t;
> >>>> enum psi_states state;
> >>>> - u32 threshold_us;
> >>>> + u32 min_threshold_us;
> >>>> + u32 max_threshold_us;
> >>>> u32 window_us;
> >>>>
> >>>> if (static_branch_likely(&psi_disabled))
> >>>> return ERR_PTR(-EOPNOTSUPP);
> >>>>
> >>>> - if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2)
> >>>> + if (sscanf(buf, "some %u %u %u", &min_threshold_us,
> >>>> + &max_threshold_us, &window_us) == 3)
> >>>> state = PSI_IO_SOME + res * 2;
> >>>> - else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2)
> >>>> + else if (sscanf(buf, "full %u %u %u", &min_threshold_us,
> >>>> + &max_threshold_us, &window_us) == 3)
> >>>> state = PSI_IO_FULL + res * 2;
> >>>> else
> >>>> return ERR_PTR(-EINVAL);
> >>>> @@ -1107,8 +1110,11 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
> >>>> window_us > WINDOW_MAX_US)
> >>>> return ERR_PTR(-EINVAL);
> >>>>
> >>>> + if (min_threshold_us >= max_threshold_us)
> >>>> + return ERR_PTR(-EINVAL);
> >>>> +
> >>>> /* Check threshold */
> >>>> - if (threshold_us == 0 || threshold_us > window_us)
> >>>> + if (max_threshold_us > window_us)
> >>>> return ERR_PTR(-EINVAL);
> >>>>
> >>>> t = kmalloc(sizeof(*t), GFP_KERNEL);
> >>>> @@ -1117,7 +1123,8 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
> >>>>
> >>>> t->group = group;
> >>>> t->state = state;
> >>>> - t->threshold = threshold_us * NSEC_PER_USEC;
> >>>> + t->min_threshold = min_threshold_us * NSEC_PER_USEC;
> >>>> + t->max_threshold = max_threshold_us * NSEC_PER_USEC;
> >>>> t->win.size = window_us * NSEC_PER_USEC;
> >>>> window_reset(&t->win, 0, 0, 0);
> >>>>
> > .
>