Message-ID: <CAP-5=fXV6b2g9QxPe1EsSTcHZpSPq+EAR71jvpBGW7ehydN+Uw@mail.gmail.com>
Date: Thu, 6 Feb 2025 09:54:07 -0800
From: Ian Rogers <irogers@...gle.com>
To: Krzysztof Łopatowski <krzysztof.m.lopatowski@...il.com>
Cc: namhyung@...nel.org, acme@...nel.org, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf: Improve startup time by reducing unnecessary stat() calls
On Thu, Feb 6, 2025 at 3:35 AM Krzysztof Łopatowski
<krzysztof.m.lopatowski@...il.com> wrote:
>
> When testing perf trace on NixOS, I noticed significant startup delays:
> - `ls`: ~2ms
> - `strace ls`: ~10ms
> - `perf trace ls`: ~550ms
>
> Profiling showed that 51% of the time is spent reading files,
> 26% in loading BPF programs, and 11% in `newfstatat`.
>
> This patch optimizes module path exploration by avoiding `stat()` calls
> unless necessary. For filesystems that do not implement `d_type`
> (DT_UNKNOWN), it falls back to the old behavior.
> See `readdir(3)` for details.
>
> This reduces `perf trace ls` time to ~500ms.
>
> A more thorough startup optimization based on command parameters would
> be ideal, but that is a larger effort.
Hi Krzysztof,
Thanks for the contribution! I previously posted a series that added a
new set of io_dir primitives. The last version is:
https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
It did something very similar, along with optimizations aimed mainly at
memory usage. From patch 2:
```
+static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *dent)
+{
+	if (dent->d_type == DT_UNKNOWN) {
+		struct stat st;
+
+		if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
+			return false;
+
+		if (S_ISDIR(st.st_mode)) {
+			dent->d_type = DT_DIR;
+			return true;
+		}
+	}
+	return dent->d_type == DT_DIR;
+}
```
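For anyone who wants to try the idea outside the perf tree, here is a
rough standalone sketch of the same d_type-first check using plain
readdir(3) instead of the io_dir primitives. The helper names
entry_is_dir() and count_subdirs() are made up for illustration and
appear in neither patch; this is an approximation of the technique, not
the code from the series:
```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: trust d_type when the filesystem fills it in and
 * only pay for fstatat() when it reports DT_UNKNOWN.
 */
static bool entry_is_dir(int dfd, const struct dirent *dent)
{
	struct stat st;

	if (dent->d_type != DT_UNKNOWN)
		return dent->d_type == DT_DIR;

	/* Fallback for filesystems that do not implement d_type. */
	if (fstatat(dfd, dent->d_name, &st, /*flags=*/0))
		return false;

	return S_ISDIR(st.st_mode);
}

/*
 * Hypothetical helper: count the subdirectories of path, skipping "."
 * and "..". Returns -1 if the directory cannot be opened.
 */
static int count_subdirs(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *dent;
	int count = 0;

	if (!dir)
		return -1;

	while ((dent = readdir(dir)) != NULL) {
		if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
			continue;
		if (entry_is_dir(dirfd(dir), dent))
			count++;
	}

	closedir(dir);
	return count;
}
```
On filesystems that populate d_type, the loop above makes no stat-family
calls at all, which is where the saving comes from.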
I stopped pursuing the series because the maintainers were complaining
that unpopular libcs/platforms lack system call definitions (getdents),
so the series broke on those platforms. I tried to go the usual feature
testing route, etc., but we seemed to have entered into whack-a-mole wrt
those libcs/platforms and my patience had worn thin. I carry the changes
in Google's tree, where the libc/platform issue isn't a concern.
I mention this because I think that series may be a better route than
this change, as it addresses a little more of the performance issue. I
can and do rebase the changes for Google in this tree:
https://github.com/googleprodkernel/linux-perf
I don't mind this patch as an expedient, obvious performance win.
Thanks,
Ian