[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YUAvSx42abg5S2ym@kroah.com>
Date: Tue, 14 Sep 2021 07:12:43 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Dave Chinner <david@...morbit.com>
Cc: Christoph Hellwig <hch@....de>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>,
linux-block@...r.kernel.org, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 13/13] xfs: convert xfs_sysfs attrs to use ->seq_show
On Tue, Sep 14, 2021 at 11:20:29AM +1000, Dave Chinner wrote:
> On Mon, Sep 13, 2021 at 08:27:50AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Sep 13, 2021 at 07:41:21AM +0200, Christoph Hellwig wrote:
> > > Trivial conversion to the seq_file based sysfs attributes.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@....de>
> > > ---
> > > fs/xfs/xfs_stats.c | 24 +++++-------
> > > fs/xfs/xfs_stats.h | 2 +-
> > > fs/xfs/xfs_sysfs.c | 96 +++++++++++++++++++++++-----------------------
> > > 3 files changed, 58 insertions(+), 64 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
> > > index 20e0534a772c9..71e7a84ba0403 100644
> > > --- a/fs/xfs/xfs_stats.c
> > > +++ b/fs/xfs/xfs_stats.c
> > > @@ -16,10 +16,9 @@ static int counter_val(struct xfsstats __percpu *stats, int idx)
> > > return val;
> > > }
> > >
> > > -int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > +void xfs_stats_format(struct xfsstats __percpu *stats, struct seq_file *sf)
> > > {
> > > int i, j;
> > > - int len = 0;
> > > uint64_t xs_xstrat_bytes = 0;
> > > uint64_t xs_write_bytes = 0;
> > > uint64_t xs_read_bytes = 0;
> > > @@ -58,13 +57,12 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > /* Loop over all stats groups */
> > >
> > > for (i = j = 0; i < ARRAY_SIZE(xstats); i++) {
> > > - len += scnprintf(buf + len, PATH_MAX - len, "%s",
> > > - xstats[i].desc);
> > > + seq_printf(sf, "%s", xstats[i].desc);
> > > +
> > > /* inner loop does each group */
> > > for (; j < xstats[i].endpoint; j++)
> > > - len += scnprintf(buf + len, PATH_MAX - len, " %u",
> > > - counter_val(stats, j));
> > > - len += scnprintf(buf + len, PATH_MAX - len, "\n");
> > > + seq_printf(sf, " %u", counter_val(stats, j));
> > > + seq_printf(sf, "\n");
> > > }
> > > /* extra precision counters */
> > > for_each_possible_cpu(i) {
> > > @@ -74,18 +72,14 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > defer_relog += per_cpu_ptr(stats, i)->s.defer_relog;
> > > }
> > >
> > > - len += scnprintf(buf + len, PATH_MAX-len, "xpc %Lu %Lu %Lu\n",
> > > + seq_printf(sf, "xpc %Lu %Lu %Lu\n",
> > > xs_xstrat_bytes, xs_write_bytes, xs_read_bytes);
> > > - len += scnprintf(buf + len, PATH_MAX-len, "defer_relog %llu\n",
> > > - defer_relog);
> > > - len += scnprintf(buf + len, PATH_MAX-len, "debug %u\n",
> > > + seq_printf(sf, "defer_relog %llu\n", defer_relog);
> > > #if defined(DEBUG)
> > > - 1);
> > > + seq_printf(sf, "debug 1\n");
> > > #else
> > > - 0);
> > > + seq_printf(sf, "debug 0\n");
> > > #endif
> > > -
> > > - return len;
> > > }
> >
> > That is a sysfs file? What happened to the "one value per file" rule
> > here?
>
>
> There is no "rule" that says syfs files must contain one value per
> file; the documentation says that one value per file is the
> "preferred" format. Documentation/filesystems/sysfs.rst:
>
> [...]
> Attributes
> ...
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only one
> value per file, so it is socially acceptable to express an array of
> values of the same type.
> [...]
>
An array of values is one thing like "what is the power states for this
device". A list of different key/value pairs is a totally different
thing entirely.
> We are exposing a large array of integer values here, so multiple
> values per file are explicitly considered an acceptible format.
Not really, that was not the goal of sysfs at all.
> Further, as there are roughly 200 individual stats in this file and
> calculating each stat requires per-cpu aggregation, the the cost of
> calculating and reading each stat individually is prohibitive, not
> just inefficient.
Have you measured it? How often does the file get read and by what
tools?
We have learned from our past mistakes in /proc where we did this in the
past and required keeping obsolete values and constantly tweaking
userspace parsers. That is why we made sysfs one-value-per-file. If
the file is not there, the value is not there, much easier to handle
future changes.
> So, yes, we might have multiple lines in the file that you can frown
> about, but OTOH the file format has been exposed as a kernel ABI for
> a couple of decades via /proc/fs/xfs/stat.
proc had no such rules, but we have learned :)
> Hence exposing it in
> sysfs to provide a more fine-grained breakdown of the stats (per
> mount instead of global) is a no-brainer. We don't have to rewrite
> the parsing engines in multiple userspace monitoring programs to
> extract this information from the kernel - they just create a new
> instance and read a different file and it all just works.
But then you run into the max size restriction on sysfs files
(PAGE_SIZE) and things break down.
Please don't do this.
> Indeed, there's precedence for such /proc file formats in more
> fine-grained sysfs files. e.g. /sys/bus/node/devices/node<n>/vmstat
> and /sys/bus/node/devices/node<n>/meminfo retain the same format
> (and hence userspace parsers) for the per-node stats as /proc/vmstat
> and /proc/meminfo use for the global stats...
And I have complained about those files in the past many times. And
they are running into problems in places dealing with them too.
> tl;dr: the file contains arrays of values, it's inefficient to read
> values one at a time, it's a pre-existing ABI-constrainted file
> format, there's precedence in core kernel statistics
> implementations and the documented guidelines allow this sort of
> usage in these cases.
I would prefer not to do this, and I will not take core sysfs changes to
make this any easier.
Which is one big reason why I don't like just making sysfs use the seq
file api, it would allow stuff like this to propagate to other places in
the kernel.
Maybe I should cut the file size of a sysfs file down to PAGE_SIZE/4 or
less, that might be better :)
thanks,
greg k-h
Powered by blists - more mailing lists