Message-ID: <CAMuHMdWUtb_A-uhXrBg6kC9L2zbC_q3m8oCZoq80ZSJvk6mUAA@mail.gmail.com>
Date: Thu, 4 May 2023 19:26:11 +0200
From: Geert Uytterhoeven <geert@...ux-m68k.org>
To: Nhat Pham <nphamcs@...il.com>
Cc: akpm@...ux-foundation.org, hannes@...xchg.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, bfoster@...hat.com,
willy@...radead.org, linux-api@...r.kernel.org,
kernel-team@...a.com
Subject: Re: [PATCH v13 2/3] cachestat: implement cachestat syscall

Hi Nhat,

On Wed, May 3, 2023 at 3:38 AM Nhat Pham <nphamcs@...il.com> wrote:
> There is currently no good way to query the page cache state of large
> file sets and directory trees. There is mincore(), but it scales poorly:
> the kernel writes out a lot of bitmap data that userspace has to
> aggregate, when the user really does not care about per-page
> information in that case. The user also needs to mmap and unmap each
> file as it goes along, which can be quite slow as well.
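
For comparison, the mincore() approach described above can be sketched
roughly as follows. This is only an illustration and not code from the
patch; the helper name resident_pages_mincore is made up for the example.
It shows the three costs mentioned: a mapping per file, a one-byte-per-page
vector written by the kernel, and a userspace loop to aggregate it.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the resident page cache pages of a file
 * with mmap() + mincore(). Returns the count, or -1 on error.
 */
static long resident_pages_mincore(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;               /* nothing to map */

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;

    /* The file has to be mapped before mincore() can inspect it. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    /* The kernel fills in one byte per page; bit 0 means "resident". */
    unsigned char *vec = malloc(npages);
    long resident = -1;

    if (vec && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;
    }

    free(vec);
    munmap(map, len);
    return resident;
}
```

Doing this for every file in a directory tree repeats the mmap/munmap
pair and the per-page aggregation each time, and it only yields residency,
not dirty or writeback counts.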
>
> Some use cases where this information could come in handy:
> * Allowing a database to decide whether to perform an index scan or
> direct table queries based on the in-memory cache state of the
> index.
> * Visibility into the writeback algorithm, for diagnosing performance
> issues.
> * Workload-aware writeback pacing: estimating IO fulfilled by page
> cache (and IO to be done) within a range of a file, allowing for
> more frequent syncing when and where there is IO capacity, and
> batching when there is not.
> * Computing memory usage of large files/directory trees, analogous to
> the du tool for disk usage.
>
> More information about these use cases can be found in the following
> thread:
>
> https://lore.kernel.org/lkml/20230315170934.GA97793@cmpxchg.org/
>
> This patch implements a new syscall that queries cache state of a file
> and summarizes the number of cached pages, number of dirty pages, number
> of pages marked for writeback, number of (recently) evicted pages, etc.
> in a given range. Currently, the syscall is only wired up for the x86
> architecture.
>
> NAME
> cachestat - query the page cache statistics of a file.
>
> SYNOPSIS
> #include <sys/mman.h>
>
> struct cachestat_range {
>         __u64 off;
>         __u64 len;
> };
>
> struct cachestat {
>         __u64 nr_cache;
>         __u64 nr_dirty;
>         __u64 nr_writeback;
>         __u64 nr_evicted;
>         __u64 nr_recently_evicted;
> };
>
> int cachestat(unsigned int fd, struct cachestat_range *cstat_range,
>               struct cachestat *cstat, unsigned int flags);
>
> DESCRIPTION
> cachestat() queries the number of cached pages, number of dirty
> pages, number of pages marked for writeback, number of evicted
> pages, and number of recently evicted pages, in the byte range given
> by `off` and `len`.
>
> An evicted page is a page that was previously in the page cache but
> has been evicted since. A page is recently evicted if its last
> eviction was recent enough that its reentry to the cache would
> indicate that it is actively being used by the system, and that
> there is memory pressure on the system.
>
> These values are returned in a cachestat struct, whose address is
> given by the `cstat` argument.
>
> The `off` and `len` arguments must be non-negative integers. If
> `len` > 0, the queried range is [`off`, `off` + `len`]. If `len` ==
> 0, we will query in the range from `off` to the end of the file.
>
> The `flags` argument is unused for now, but is included for future
> extensibility. Users should pass 0 (i.e., no flags specified).
>
> Currently, hugetlbfs is not supported.
>
> Because the status of a page can change after cachestat() checks it
> but before it returns to the application, the returned values may
> contain stale information.
>
> RETURN VALUE
> On success, cachestat returns 0. On error, -1 is returned, and errno
> is set to indicate the error.
>
> ERRORS
> EFAULT cstat or cstat_range points to an invalid address.
>
> EINVAL invalid flags.
>
> EBADF invalid file descriptor.
>
> EOPNOTSUPP file descriptor is of a hugetlbfs file
>
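
A caller without libc support could invoke the interface described above
through syscall(2), roughly as sketched below. This is a hedged
illustration, not part of the patch. The struct and function names ending
in _sketch are made up to avoid clashing with any uapi headers, and the
fallback number 451 is an assumption for x86 that only applies when the
installed headers do not define __NR_cachestat.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback syscall number for x86 if headers predate cachestat. */
#ifndef __NR_cachestat
#define __NR_cachestat 451
#endif

/* Local mirrors of the structs in the SYNOPSIS (hypothetical names). */
struct cachestat_range_sketch {
    uint64_t off;
    uint64_t len;
};

struct cachestat_sketch {
    uint64_t nr_cache;
    uint64_t nr_dirty;
    uint64_t nr_writeback;
    uint64_t nr_evicted;
    uint64_t nr_recently_evicted;
};

/*
 * Query page cache statistics for the byte range starting at `off`.
 * Per the description, len == 0 means "from off to the end of the file".
 * Returns 0 on success, or -1 with errno set (ENOSYS on kernels or
 * architectures where the syscall is not wired up).
 */
static int cachestat_sketch_call(unsigned int fd, uint64_t off, uint64_t len,
                                 struct cachestat_sketch *out)
{
    struct cachestat_range_sketch range = { .off = off, .len = len };

    /* flags must be 0 for now. */
    return (int)syscall(__NR_cachestat, fd, &range, out, 0U);
}
```

Calling cachestat_sketch_call(fd, 0, 0, &cs) on an open file would
summarize the whole file in one call; callers should treat ENOSYS as "not
available" and fall back to another method.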
> Signed-off-by: Nhat Pham <nphamcs@...il.com>
> ---
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +

This should be wired up on each and every architecture.
Currently we're getting

    <stdin>:1567:2: warning: #warning syscall cachestat not implemented [-Wcpp]

in linux-next for all the missing architectures.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds