lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0jx44heAgXPwCDriGo-xyjX2pUP+PG1cQwjme_=EHo3rg@mail.gmail.com>
Date: Tue, 15 Apr 2025 19:05:42 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Lukasz Luba <lukasz.luba@....com>
Cc: Changwoo Min <changwoo@...lia.com>, rafael@...nel.org, christian.loehle@....com, 
	tj@...nel.org, len.brown@...el.com, kernel-dev@...lia.com, 
	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org, pavel@...nel.org
Subject: Re: [PATCH] PM: EM: Add inotify support when the energy model is updated.

On Tue, Apr 15, 2025 at 10:31 AM Lukasz Luba <lukasz.luba@....com> wrote:
>
> Hi Changwoo,
>
> Thanks for the patch.
>
> On 4/15/25 01:42, Changwoo Min wrote:
> > The sched_ext schedulers [1] currently access the energy model through the
> > debugfs to make energy-aware scheduling decisions [2]. The userspace part
> > of a sched_ext scheduler feeds the necessary (post-processed) energy-model
> > information to the BPF part of the scheduler.
>
> This is very interesting use case!
>
> >
> > However, there is a limitation in the current debugfs support of the energy
> > model. When the energy model is updated (em_dev_update_perf_domain), there
> > is no way for the userspace part to know such changes (besides polling the
> > debugfs files).
> >
> > Therefore, add inotify support (IN_MODIFY) when the energy model is
> > updated. With this inotify support, the sched_ext scheduler can monitor the
> > energy model change in userspace using the regular inotify interface and
> > feed the updated energy model information to make energy-aware scheduling
> > decisions.
> >
> > [1] https://lwn.net/Articles/922405/
> > [2] https://github.com/sched-ext/scx/pull/1624
> >
> > Signed-off-by: Changwoo Min <changwoo@...lia.com>
> > ---
> >   kernel/power/energy_model.c | 47 +++++++++++++++++++++++++++++++++++++
> >   1 file changed, 47 insertions(+)
> >
> > diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> > index d9b7e2b38c7a..0c06e0278df6 100644
> > --- a/kernel/power/energy_model.c
> > +++ b/kernel/power/energy_model.c
> > @@ -14,6 +14,7 @@
> >   #include <linux/cpumask.h>
> >   #include <linux/debugfs.h>
> >   #include <linux/energy_model.h>
> > +#include <linux/fsnotify.h>
> >   #include <linux/sched/topology.h>
> >   #include <linux/slab.h>
> >
> > @@ -156,9 +157,53 @@ static int __init em_debug_init(void)
> >       return 0;
> >   }
> >   fs_initcall(em_debug_init);
> > +
> > +static void em_debug_update_ps(struct em_perf_domain *em_pd, int i,
> > +                            struct dentry *pd)
> > +{
> > +     static const char *names[] = {
> > +             "frequency",
> > +             "power",
> > +             "cost",
> > +             "performance",
> > +             "inefficient",
> > +     };
> > +     struct em_perf_state *table;
> > +     unsigned long freq;
> > +     struct dentry *d, *cd;
> > +     char name[24];
> > +     int j;
> > +
> > +     rcu_read_lock();
> > +     table = em_perf_state_from_pd(em_pd);
> > +     freq = table[i].frequency;
> > +     rcu_read_unlock();
> > +
> > +     snprintf(name, sizeof(name), "ps:%lu", freq);
> > +     d = debugfs_lookup(name, pd);
> > +
> > +     for (j = 0; j < ARRAY_SIZE(names); j++) {
> > +             cd = debugfs_lookup(names[j], d);
> > +             if (!cd)
> > +                     return;
> > +             fsnotify_dentry(cd, FS_MODIFY);
> > +             cond_resched();
> > +     }
> > +}
> > +
> > +static void em_debug_update(struct device *dev)
> > +{
> > +     struct dentry *d;
> > +     int i;
> > +
> > +     d = debugfs_lookup(dev_name(dev), rootdir);
> > +     for (i = 0; i < dev->em_pd->nr_perf_states; i++)
> > +             em_debug_update_ps(dev->em_pd, i, d);
> > +}
> >   #else /* CONFIG_DEBUG_FS */
> >   static void em_debug_create_pd(struct device *dev) {}
> >   static void em_debug_remove_pd(struct device *dev) {}
> > +static void em_debug_update(struct device *dev) {}
> >   #endif
> >
> >   static void em_release_table_kref(struct kref *kref)
> > @@ -323,6 +368,8 @@ int em_dev_update_perf_domain(struct device *dev,
> >
> >       em_table_free(old_table);
> >
> > +     em_debug_update(dev);
> > +
>
> I would move this out of the locked section, below the mutex
> unlock. Looking at the code in em_debug_update() you are trying
> to send such notification for each EM's table entry * number of
> fields, which is heavy. The RCU copy that you get will make sure
> you have consistent view on the data and you don't have to
> be under the mutex lock.
>
> A different question would be if the notification has to be
> that heavy?
> Can we just 'ping' the user-space that there is a change and ask to read
> the new values?
>
> Another question, but this time to Rafael would be if for such use case
> we can use debugfs, or we need a sysfs?

debugfs is not really suitable for this IMV and the problem at hand is
a symptom of that (but there will be more issues in the future
AFAICS).

Before starting to invent new interfaces for user space, though, I'm
wondering why the BPF code cannot obtain the energy model information
from the kernel?

> >       mutex_unlock(&em_pd_mutex);
> >       return 0;
> >   }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ