Message-ID: <CAKPOu+82j52NwUV3JUwwtWjcJsDktoMGnG_Sr5JstrPm8qhicQ@mail.gmail.com>
Date:   Wed, 27 Sep 2023 13:22:01 +0200
From:   Max Kellermann <max.kellermann@...os.com>
To:     Ilya Dryomov <idryomov@...il.com>
Cc:     Xiubo Li <xiubli@...hat.com>, Jeff Layton <jlayton@...nel.org>,
        ceph-devel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Venky Shankar <vshankar@...hat.com>,
        Gregory Farnum <gfarnum@...hat.com>
Subject: Re: [PATCH 1/2] fs/ceph/debugfs: make all files world-readable
On Wed, Sep 27, 2023 at 12:53 PM Ilya Dryomov <idryomov@...il.com> wrote:
> > This "ceph" tool requires installing 90 MB of additional Debian
> > packages, which I just tried on a test cluster, and "ceph fs top"
> > fails with "Error initializing cluster client: ObjectNotFound('RADOS
> > object not found (error calling conf_read_file)')". Okay, so I have to
> > configure something.... but .... I don't get why I would want to do
> > that, when I can get the same information from the kernel without
> > installing or configuring anything. This sounds like overcomplexifying
> > the thing for no reason.
>
> I have relayed my understanding of this feature (or rather how it was
> presented to me).  I see where you are coming from, so adding more
> CephFS folks to chime in.
Let me show these folks how badly "ceph fs perf stats" performs:
 # time ceph fs perf stats
 {"version": 2, "global_counters": ["cap_hit", "read_latency", "write_latency"[...]
 real    0m0.502s
 user    0m0.393s
 sys    0m0.053s
Now my debugfs-based solution:
 # time cat /sys/kernel/debug/ceph/*/metrics/latency
 item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)
 [...]
 real    0m0.002s
 user    0m0.002s
 sys    0m0.001s
Reading debugfs is more than 200 times faster (0.002 s vs. 0.502 s of
wall-clock time). It is so fast that "time" can hardly measure it, and
most of those 2 ms are the overhead of executing /bin/cat, not of
actually reading the debugfs file.
Our kernel-exporter is a daemon process that needs only a single
pread() system call per iteration, so it has even less overhead.
Integrating the "ceph" tool instead would require forking a new process
each time, starting a new Python VM, and so on.
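A minimal sketch of that polling pattern follows. It is written in Python only for brevity and is not the actual kernel-exporter code. The file is opened once, and each sample then costs one pread() at offset 0, which re-reads the current contents without reopening the file or moving the file position. The interpreter starts once per daemon, not once per sample. The names read_snapshot() and poll(), the 64 KiB buffer size and the handle callback are hypothetical. The sketch also assumes the whole file fits in one buffer and that the process has read access to debugfs.

```python
import os
import time


def read_snapshot(fd: int, bufsize: int = 65536) -> str:
    """Return the current contents of an already-open file.

    A single os.pread() at offset 0 re-reads the file from the start
    without lseek() and without reopening it. Assumes (hypothetically)
    that the whole file fits in bufsize bytes.
    """
    return os.pread(fd, bufsize, 0).decode()


def poll(path: str, interval: float, iterations: int, handle) -> None:
    """Open path once, then take one snapshot per iteration.

    handle is a hypothetical caller-supplied callback that receives the
    text of each snapshot, e.g. to parse it and export the values.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        for i in range(iterations):
            handle(read_snapshot(fd))  # one pread() system call per sample
            if i + 1 < iterations:
                time.sleep(interval)
    finally:
        os.close(fd)


if __name__ == "__main__":
    import glob

    # Same glob as the "cat" example; the directory name varies per mount.
    for p in glob.glob("/sys/kernel/debug/ceph/*/metrics/latency"):
        poll(p, interval=1.0, iterations=5, handle=print)
```

The point of the design is that no process is forked and nothing is reopened inside the loop. Everything except the pread() itself is paid once at startup.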
For obtaining real-time latency statistics, the "ceph" script is the
wrong tool for the job.
Max