Message-ID: <CAC2o3DLWwag6FCrVVppKF=_VRqiCcabrbXycdfsE_c9mfVQ_Vw@mail.gmail.com>
Date:   Thu, 19 Nov 2020 16:25:27 +0800
From:   Fox Chen <foxhlchen@...il.com>
To:     Brice Goglin <brice.goglin@...il.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface

On Fri, Nov 13, 2020 at 2:15 PM Brice Goglin <brice.goglin@...il.com> wrote:
>
>
> On 12/11/2020 at 11:49, Greg Kroah-Hartman wrote:
>
> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>
> On 12/11/2020 at 07:42, Greg Kroah-Hartman wrote:
>
> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>
> On 07/10/2020 at 07:15, Greg Kroah-Hartman wrote:
>
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>
> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>
> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>
> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> same die. Thus, even though the instruction set is compatible among all
> CPUs, there may still be differences in features (e.g., some CPUs may
> have counters that other CPUs do not). There may be applications
> interested in knowing the type of micro-architecture topology of the
> system to make decisions about process affinity.
>
> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> cpu_capacity) may be used to infer the types of micro-architecture of the
> CPUs in the platform, it may not be entirely accurate. For instance, two
> subsets of CPUs with different types of micro-architecture may have the
> same capacity due to power or thermal constraints.
>
> Create the new directory /sys/devices/system/cpu/types. Under such
> directory, create individual subdirectories for each type of CPU micro-
> architecture. Each subdirectory will have cpulist and cpumap files. This
> makes it convenient for user space to read all the CPUs of the same type
> at once without having to inspect each CPU individually.
>
> Implement a generic interface using weak functions that architectures can
> override to indicate a) support for CPU types, b) the CPU type number, and
> c) a string to identify the CPU vendor and type.
>
> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> would look like this (other architectures have the hooks to use whatever
> directory naming convention below "types" that meets their needs):
>
> user@...t:~$ ls /sys/devices/system/cpu/types
> intel_atom_0  intel_core_0
>
> user@...t:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> cpulist cpumap
>
> user@...t:~$ ls /sys/devices/system/cpu/types/intel_core_0
> cpulist cpumap
>
> user@...t:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> 0f
>
> user@...t:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> 0-3
>
> user@...st:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> 10
>
> user@...t:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> 4
>
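For readers who want to consume files in the format shown above, here is a
minimal Python sketch. It assumes the usual sysfs conventions: cpulist is a
comma-separated list of ids and ranges, and cpumap is a hexadecimal bitmask
that may be split into comma-separated 32-bit groups. The helper names
(parse_cpulist, parse_cpumap, read_cpu_types) are illustrative only, and the
"types" directory exists only on a kernel carrying the proposed patch.

```python
import os


def parse_cpulist(text):
    """Parse a cpulist string such as "0-3,8" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list (no CPUs) or stray comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def parse_cpumap(text):
    """Parse a hex cpumap string such as "0f" into a sorted list of CPU ids."""
    # Assumption: groups are most-significant first, so dropping the commas
    # yields one hexadecimal number.
    mask = int(text.strip().replace(",", ""), 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def read_cpu_types(types_dir="/sys/devices/system/cpu/types"):
    """Return {type_name: [cpu ids]} with one cpulist read per CPU type."""
    result = {}
    for name in sorted(os.listdir(types_dir)):
        path = os.path.join(types_dir, name, "cpulist")
        if os.path.isfile(path):
            with open(path) as f:
                result[name] = parse_cpulist(f.read())
    return result
```

With the example system above, this needs two file reads (one per type)
rather than one read per CPU.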
> Thank you for the quick and detailed review, Greg!
>
> The output of 'tree' sometimes makes it easier to see here, or:
> grep -R . *
> also works well.
>
> Indeed, this would definitely make it more readable.
>
> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> created. Add a hook for this purpose.
>
> Why should these not show up if the system is not "hybrid"?
>
> My thinking was that on a non-hybrid system, it does not make sense to
> create this interface, as all the CPUs will be of the same type.
>
> Why not just make this a type attribute in the existing cpuX directory?
> Why does this have to be a totally separate directory, so that userspace
> has to figure out to look in two different spots for the same cpu to
> determine what it is?
>
> But if the type is located under cpuX, userspace would need to traverse
> all the CPUs and create its own cpu masks. Under the types directory it
> would only need to look once for each type of CPU, IMHO.
>
> What does a "mask" do?  What does userspace care about this?  You would
> have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?
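To make the trade-off in this exchange concrete, here is a rough Python
sketch of what userspace would do if the type were exposed as a per-CPU
attribute instead. The attribute name "type" and the function name
masks_from_per_cpu_attr are hypothetical; no such file is defined in this
thread. The point is only that the masks are rebuilt with one open+read+close
per CPU.

```python
import os
import re
from collections import defaultdict


def masks_from_per_cpu_attr(cpu_root="/sys/devices/system/cpu", attr="type"):
    """Build {type_name: bitmask} by reading one attribute file per cpuX dir.

    `attr` is a hypothetical per-CPU file name used for illustration.
    """
    masks = defaultdict(int)
    for entry in os.listdir(cpu_root):
        match = re.fullmatch(r"cpu(\d+)", entry)
        if not match:
            continue  # skip cpufreq, cpuidle, online, ...
        path = os.path.join(cpu_root, entry, attr)
        try:
            with open(path) as f:
                cpu_type = f.read().strip()
        except OSError:
            continue  # attribute missing for this CPU
        masks[cpu_type] |= 1 << int(match.group(1))
    return dict(masks)
```

On the five-CPU example from the patch description this costs five reads
instead of two, and the gap grows with the CPU count.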
>
> Hello
>
> Sorry for the late reply. As the first userspace consumer of this
> interface [1], I can confirm that reading a single file to get the mask
> would be better, at least for performance reasons. On large platforms, we
> already have to read thousands of sysfs files to get CPU topology and
> cache information; I'd be happy not to read one more file per cpu.
>
> Reading these sysfs files is slow, and it does not scale well when
> multiple processes read them in parallel.
>
> Really?  Where is the slowdown?  Would something like readfile() work
> better for you for that?
> https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
>
> I guess readfile would improve the sequential case by avoiding syscalls
> but it would not improve the parallel case since syscalls shouldn't have
> any parallel issue?
>
> syscalls should not have parallel issues at all.
>
> We've been watching the status of readfile() since it was posted on LKML
> 6 months ago, but we were actually wondering if it would end up being
> included at some point.
>
> It needs a solid reason to be merged.  My "test" benchmarks are fun to
> run, but I have yet to find a real need for it anywhere as the
> open/read/close syscall overhead seems to be lost in the noise on any
> real application workload that I can find.
>
> If you have a real need, and it reduces overhead and cpu usage, I'm more
> than willing to update the patchset and resubmit it.
>
>
> Good, I'll give it a try.
>
>
> How do multiple processes slow anything down? There shouldn't be any
> shared locks here.
>
> When I benchmarked this in 2016, reading a single (small) sysfs file was
> 41x slower when running 64 processes simultaneously on a 64-core Knights
> Landing than reading from a single process. On a SGI Altix UV with 12x
> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> (which could mean NUMA affinity matters), and reading from one process
> per core (96 total) was 491x slower.
>
> I will try to find some time to dig further on recent kernels with perf
> and readfile (both machines were running RHEL7).
>
> 2016 was a long time ago in kernel-land, please retest on a kernel.org
> release, not a RHEL monstrosity.
>
>
> Quick test on 5.8.14 from Debian (fairly close to mainline) on a server with 2x20 cores.
>
> I am measuring the time to do open+read+close of /sys/devices/system/cpu/cpu15/topology/die_id 1000 times
>
> With a single process, it takes 2ms (2us per open+read+close, looks OK).
>
> With one process per core (with careful binding, etc), it jumps from 2ms to 190ms (without much variation).
>
> It looks like locks in kernfs_iop_permission and kernfs_dop_revalidate are causing the issue.
>
> I am attaching the perf report callgraph output below.
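The measurement described above can be approximated with a short Python
sketch. This is not the fops_overhead program that appears in the perf
output (its source is not part of this message); the function names are
made up for illustration, and unlike the original test the workers are not
bound to specific cores.

```python
import os
import time
from multiprocessing import Pool


def time_open_read_close(path, iterations=1000):
    """Return the seconds spent on `iterations` open+read+close cycles."""
    start = time.perf_counter()
    for _ in range(iterations):
        fd = os.open(path, os.O_RDONLY)
        try:
            os.read(fd, 4096)  # small sysfs attributes fit in one read
        finally:
            os.close(fd)
    return time.perf_counter() - start


def time_parallel(path, nprocs, iterations=1000):
    """Run the same loop in `nprocs` processes; return each worker's time."""
    # Assumption: no CPU binding and no start barrier, so the workers only
    # roughly overlap in time.
    with Pool(nprocs) as pool:
        return pool.starmap(time_open_read_close, [(path, iterations)] * nprocs)
```

A run would call time_open_read_close() on a sysfs file such as the die_id
path quoted above, then time_parallel() with one process per core (from under
an `if __name__ == "__main__":` guard), and compare the per-worker times.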
>
>
>
> There are ways to avoid these
> multiple discoveries by sharing hwloc info through XML or shmem, but it
> will take years before the developers of all the different runtimes
> implement this :)
>
> I don't understand, what exactly are you suggesting we do here instead?
>
> I was just saying userspace has ways to mitigate the issue but it will
> take time because many different projects are involved.
>
> I still don't understand, what issue are you referring to?
>
>
> Reading many sysfs files causes application startup to take many seconds when launching multiple processes at the same time.
>
> Brice
>
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 7K of event 'cycles'
> # Event count (approx.): 5291578622
> #
> # Children      Self  Command        Shared Object      Symbol
> # ........  ........  .............  .................  .......................................
> #
>     99.91%     0.00%  fops_overhead  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>             |
>             ---entry_SYSCALL_64_after_hwframe
>                do_syscall_64
>                |
>                |--98.69%--__x64_sys_openat
>                |          |
>                |           --98.67%--do_sys_openat2
>                |                     |
>                |                      --98.57%--do_filp_open
>                |                                path_openat
>                |                                |
>                |                                |--81.83%--link_path_walk.part.0
>                |                                |          |
>                |                                |          |--52.19%--inode_permission.part.0
>                |                                |          |          |
>                |                                |          |           --51.86%--kernfs_iop_permission
>                |                                |          |                     |
>                |                                |          |                     |--50.92%--__mutex_lock.constprop.0
>                |                                |          |                     |          |
>                |                                |          |                     |           --49.58%--osq_lock
>                |                                |          |                     |
>                |                                |          |                      --0.59%--mutex_unlock
>                |                                |          |
>                |                                |           --29.47%--walk_component
>                |                                |                     |
>                |                                |                      --29.10%--lookup_fast
>                |                                |                                |
>                |                                |                                 --28.76%--kernfs_dop_revalidate
>                |                                |                                           |
>                |                                |                                            --28.29%--__mutex_lock.constprop.0
>                |                                |                                                      |
>                |                                |                                                       --27.65%--osq_lock
>                |                                |
>                |                                |--9.60%--lookup_fast
>                |                                |          |
>                |                                |           --9.50%--kernfs_dop_revalidate
>                |                                |                     |
>                |                                |                      --9.35%--__mutex_lock.constprop.0
>                |                                |                                |
>                |                                |                                 --9.18%--osq_lock
>                |                                |
>                |                                |--6.17%--may_open
>                |                                |          |
>                |                                |           --6.13%--inode_permission.part.0
>                |                                |                     |
>                |                                |                      --6.10%--kernfs_iop_permission
>                |                                |                                |
>                |                                |                                 --5.90%--__mutex_lock.constprop.0
>                |                                |                                           |
>                |                                |                                            --5.80%--osq_lock
>                |                                |
>                |                                 --0.52%--do_dentry_open
>                |
>                 --0.63%--__prepare_exit_to_usermode
>                           |
>                            --0.58%--task_work_run
>
>     99.91%     0.01%  fops_overhead  [kernel.kallsyms]  [k] do_syscall_64
>             |
>              --99.89%--do_syscall_64
>                        |
>                        |--98.69%--__x64_sys_openat
>                        |          |
>                        |           --98.67%--do_sys_openat2
>                        |                     |
>                        |                      --98.57%--do_filp_open
>                        |                                path_openat
>                        |                                |
>                        |                                |--81.83%--link_path_walk.part.0
>                        |                                |          |
>                        |                                |          |--52.19%--inode_permission.part.0
>                        |                                |          |          |
>                        |                                |          |           --51.86%--kernfs_iop_permission
>                        |                                |          |                     |
>                        |                                |          |                     |--50.92%--__mutex_lock.constprop.0
>                        |                                |          |                     |          |
>                        |                                |          |                     |           --49.58%--osq_lock
>                        |                                |          |                     |
>                        |                                |          |                      --0.59%--mutex_unlock
>                        |                                |          |
>                        |                                |           --29.47%--walk_component
>                        |                                |                     |
>                        |                                |                      --29.10%--lookup_fast
>                        |                                |                                |
>                        |                                |                                 --28.76%--kernfs_dop_revalidate
>                        |                                |                                           |
>                        |                                |                                            --28.29%--__mutex_lock.constprop.0
>                        |                                |                                                      |
>                        |                                |                                                       --27.65%--osq_lock
>                        |                                |
>                        |                                |--9.60%--lookup_fast
>                        |                                |          |
>                        |                                |           --9.50%--kernfs_dop_revalidate
>                        |                                |                     |
>                        |                                |                      --9.35%--__mutex_lock.constprop.0
>                        |                                |                                |
>                        |                                |                                 --9.18%--osq_lock
>                        |                                |
>                        |                                |--6.17%--may_open
>                        |                                |          |
>                        |                                |           --6.13%--inode_permission.part.0
>                        |                                |                     |
>                        |                                |                      --6.10%--kernfs_iop_permission
>                        |                                |                                |
>                        |                                |                                 --5.90%--__mutex_lock.constprop.0
>                        |                                |                                           |
>                        |                                |                                            --5.80%--osq_lock
>                        |                                |
>                        |                                 --0.52%--do_dentry_open
>                        |
>                         --0.63%--__prepare_exit_to_usermode
>                                   |
>                                    --0.58%--task_work_run
>
>     98.72%     0.00%  fops_overhead  [unknown]          [k] 0x7379732f73656369
>             |
>             ---0x7379732f73656369
>                __GI___libc_open
>                |
>                 --98.70%--entry_SYSCALL_64_after_hwframe
>                           do_syscall_64
>                           |
>                            --98.66%--__x64_sys_openat
>                                      |
>                                       --98.65%--do_sys_openat2
>                                                 |
>                                                  --98.55%--do_filp_open
>                                                            path_openat
>                                                            |
>                                                            |--81.80%--link_path_walk.part.0
>                                                            |          |
>                                                            |          |--52.16%--inode_permission.part.0
>                                                            |          |          |
>                                                            |          |           --51.86%--kernfs_iop_permission
>                                                            |          |                     |
>                                                            |          |                     |--50.92%--__mutex_lock.constprop.0
>                                                            |          |                     |          |
>                                                            |          |                     |           --49.58%--osq_lock
>                                                            |          |                     |
>                                                            |          |                      --0.59%--mutex_unlock
>                                                            |          |
>                                                            |           --29.47%--walk_component
>                                                            |                     |
>                                                            |                      --29.10%--lookup_fast
>                                                            |                                |
>                                                            |                                 --28.76%--kernfs_dop_revalidate
>                                                            |                                           |
>                                                            |                                            --28.29%--__mutex_lock.constprop.0
>                                                            |                                                      |
>                                                            |                                                       --27.65%--osq_lock
>                                                            |
>                                                            |--9.60%--lookup_fast
>                                                            |          |
>                                                            |           --9.50%--kernfs_dop_revalidate
>                                                            |                     |
>                                                            |                      --9.35%--__mutex_lock.constprop.0
>                                                            |                                |
>                                                            |                                 --9.18%--osq_lock
>                                                            |
>                                                            |--6.17%--may_open
>                                                            |          |
>                                                            |           --6.13%--inode_permission.part.0
>                                                            |                     |
>                                                            |                      --6.10%--kernfs_iop_permission
>                                                            |                                |
>                                                            |                                 --5.90%--__mutex_lock.constprop.0
>                                                            |                                           |
>                                                            |                                            --5.80%--osq_lock
>                                                            |
>                                                             --0.52%--do_dentry_open
>
>     98.72%     0.00%  fops_overhead  libc-2.31.so       [.] __GI___libc_open
>             |
>             ---__GI___libc_open
>                |
>                 --98.70%--entry_SYSCALL_64_after_hwframe
>                           do_syscall_64
>                           |
>                            --98.66%--__x64_sys_openat
>                                      |
>                                       --98.65%--do_sys_openat2
>                                                 |
>                                                  --98.55%--do_filp_open
>                                                            path_openat
>                                                            |
>                                                            |--81.80%--link_path_walk.part.0
>                                                            |          |
>                                                            |          |--52.16%--inode_permission.part.0
>                                                            |          |          |
>                                                            |          |           --51.86%--kernfs_iop_permission
>                                                            |          |                     |
>                                                            |          |                     |--50.92%--__mutex_lock.constprop.0
>                                                            |          |                     |          |
>                                                            |          |                     |           --49.58%--osq_lock
>                                                            |          |                     |
>                                                            |          |                      --0.59%--mutex_unlock
>                                                            |          |
>                                                            |           --29.47%--walk_component
>                                                            |                     |
>                                                            |                      --29.10%--lookup_fast
>                                                            |                                |
>                                                            |                                 --28.76%--kernfs_dop_revalidate
>                                                            |                                           |
>                                                            |                                            --28.29%--__mutex_lock.constprop.0
>                                                            |                                                      |
>                                                            |                                                       --27.65%--osq_lock
>                                                            |
>                                                            |--9.60%--lookup_fast
>                                                            |          |
>                                                            |           --9.50%--kernfs_dop_revalidate
>                                                            |                     |
>                                                            |                      --9.35%--__mutex_lock.constprop.0
>                                                            |                                |
>                                                            |                                 --9.18%--osq_lock
>                                                            |
>                                                            |--6.17%--may_open
>                                                            |          |
>                                                            |           --6.13%--inode_permission.part.0
>                                                            |                     |
>                                                            |                      --6.10%--kernfs_iop_permission
>                                                            |                                |
>                                                            |                                 --5.90%--__mutex_lock.constprop.0
>                                                            |                                           |
>                                                            |                                            --5.80%--osq_lock
>                                                            |
>                                                             --0.52%--do_dentry_open
>
>     98.69%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_openat
>             |
>              --98.67%--__x64_sys_openat
>                        do_sys_openat2
>                        |
>                         --98.57%--do_filp_open
>                                   path_openat
>                                   |
>                                   |--81.83%--link_path_walk.part.0
>                                   |          |
>                                   |          |--52.19%--inode_permission.part.0
>                                   |          |          |
>                                   |          |           --51.86%--kernfs_iop_permission
>                                   |          |                     |
>                                   |          |                     |--50.92%--__mutex_lock.constprop.0
>                                   |          |                     |          |
>                                   |          |                     |           --49.58%--osq_lock
>                                   |          |                     |
>                                   |          |                      --0.59%--mutex_unlock
>                                   |          |
>                                   |           --29.47%--walk_component
>                                   |                     |
>                                   |                      --29.10%--lookup_fast
>                                   |                                |
>                                   |                                 --28.76%--kernfs_dop_revalidate
>                                   |                                           |
>                                   |                                            --28.29%--__mutex_lock.constprop.0
>                                   |                                                      |
>                                   |                                                       --27.65%--osq_lock
>                                   |
>                                   |--9.60%--lookup_fast
>                                   |          |
>                                   |           --9.50%--kernfs_dop_revalidate
>                                   |                     |
>                                   |                      --9.35%--__mutex_lock.constprop.0
>                                   |                                |
>                                   |                                 --9.18%--osq_lock
>                                   |
>                                   |--6.17%--may_open
>                                   |          |
>                                   |           --6.13%--inode_permission.part.0
>                                   |                     |
>                                   |                      --6.10%--kernfs_iop_permission
>                                   |                                |
>                                   |                                 --5.90%--__mutex_lock.constprop.0
>                                   |                                           |
>                                   |                                            --5.80%--osq_lock
>                                   |
>                                    --0.52%--do_dentry_open
>
>     98.67%     0.03%  fops_overhead  [kernel.kallsyms]  [k] do_sys_openat2
>             |
>              --98.65%--do_sys_openat2
>                        |
>                         --98.57%--do_filp_open
>                                   path_openat
>                                   |
>                                   |--81.83%--link_path_walk.part.0
>                                   |          |
>                                   |          |--52.19%--inode_permission.part.0
>                                   |          |          |
>                                   |          |           --51.86%--kernfs_iop_permission
>                                   |          |                     |
>                                   |          |                     |--50.92%--__mutex_lock.constprop.0
>                                   |          |                     |          |
>                                   |          |                     |           --49.58%--osq_lock
>                                   |          |                     |
>                                   |          |                      --0.59%--mutex_unlock
>                                   |          |
>                                   |           --29.47%--walk_component
>                                   |                     |
>                                   |                      --29.10%--lookup_fast
>                                   |                                |
>                                   |                                 --28.76%--kernfs_dop_revalidate
>                                   |                                           |
>                                   |                                            --28.29%--__mutex_lock.constprop.0
>                                   |                                                      |
>                                   |                                                       --27.65%--osq_lock
>                                   |
>                                   |--9.60%--lookup_fast
>                                   |          |
>                                   |           --9.50%--kernfs_dop_revalidate
>                                   |                     |
>                                   |                      --9.35%--__mutex_lock.constprop.0
>                                   |                                |
>                                   |                                 --9.18%--osq_lock
>                                   |
>                                   |--6.17%--may_open
>                                   |          |
>                                   |           --6.13%--inode_permission.part.0
>                                   |                     |
>                                   |                      --6.10%--kernfs_iop_permission
>                                   |                                |
>                                   |                                 --5.90%--__mutex_lock.constprop.0
>                                   |                                           |
>                                   |                                            --5.80%--osq_lock
>                                   |
>                                    --0.52%--do_dentry_open
>
>     98.57%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_filp_open
>             |
>             ---do_filp_open
>                path_openat
>                |
>                |--81.83%--link_path_walk.part.0
>                |          |
>                |          |--52.19%--inode_permission.part.0
>                |          |          |
>                |          |           --51.86%--kernfs_iop_permission
>                |          |                     |
>                |          |                     |--50.92%--__mutex_lock.constprop.0
>                |          |                     |          |
>                |          |                     |           --49.58%--osq_lock
>                |          |                     |
>                |          |                      --0.59%--mutex_unlock
>                |          |
>                |           --29.47%--walk_component
>                |                     |
>                |                      --29.10%--lookup_fast
>                |                                |
>                |                                 --28.76%--kernfs_dop_revalidate
>                |                                           |
>                |                                            --28.29%--__mutex_lock.constprop.0
>                |                                                      |
>                |                                                       --27.65%--osq_lock
>                |
>                |--9.60%--lookup_fast
>                |          |
>                |           --9.50%--kernfs_dop_revalidate
>                |                     |
>                |                      --9.35%--__mutex_lock.constprop.0
>                |                                |
>                |                                 --9.18%--osq_lock
>                |
>                |--6.17%--may_open
>                |          |
>                |           --6.13%--inode_permission.part.0
>                |                     |
>                |                      --6.10%--kernfs_iop_permission
>                |                                |
>                |                                 --5.90%--__mutex_lock.constprop.0
>                |                                           |
>                |                                            --5.80%--osq_lock
>                |
>                 --0.52%--do_dentry_open
>
>     98.57%     0.01%  fops_overhead  [kernel.kallsyms]  [k] path_openat
>             |
>              --98.56%--path_openat
>                        |
>                        |--81.83%--link_path_walk.part.0
>                        |          |
>                        |          |--52.19%--inode_permission.part.0
>                        |          |          |
>                        |          |           --51.86%--kernfs_iop_permission
>                        |          |                     |
>                        |          |                     |--50.92%--__mutex_lock.constprop.0
>                        |          |                     |          |
>                        |          |                     |           --49.58%--osq_lock
>                        |          |                     |
>                        |          |                      --0.59%--mutex_unlock
>                        |          |
>                        |           --29.47%--walk_component
>                        |                     |
>                        |                      --29.10%--lookup_fast
>                        |                                |
>                        |                                 --28.76%--kernfs_dop_revalidate
>                        |                                           |
>                        |                                            --28.29%--__mutex_lock.constprop.0
>                        |                                                      |
>                        |                                                       --27.65%--osq_lock
>                        |
>                        |--9.60%--lookup_fast
>                        |          |
>                        |           --9.50%--kernfs_dop_revalidate
>                        |                     |
>                        |                      --9.35%--__mutex_lock.constprop.0
>                        |                                |
>                        |                                 --9.18%--osq_lock
>                        |
>                        |--6.17%--may_open
>                        |          |
>                        |           --6.13%--inode_permission.part.0
>                        |                     |
>                        |                      --6.10%--kernfs_iop_permission
>                        |                                |
>                        |                                 --5.90%--__mutex_lock.constprop.0
>                        |                                           |
>                        |                                            --5.80%--osq_lock
>                        |
>                         --0.52%--do_dentry_open
>
>     94.52%     1.30%  fops_overhead  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
>             |
>             |--93.23%--__mutex_lock.constprop.0
>             |          |
>             |          |--92.23%--osq_lock
>             |          |
>             |           --0.55%--mutex_spin_on_owner
>             |
>              --1.30%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                         --1.09%--link_path_walk.part.0
>                                   |
>                                    --0.75%--inode_permission.part.0
>                                              kernfs_iop_permission
>                                              __mutex_lock.constprop.0
>
>     92.24%    92.22%  fops_overhead  [kernel.kallsyms]  [k] osq_lock
>             |
>              --92.22%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                        |--77.21%--link_path_walk.part.0
>                        |          |
>                        |          |--49.57%--inode_permission.part.0
>                        |          |          kernfs_iop_permission
>                        |          |          __mutex_lock.constprop.0
>                        |          |          osq_lock
>                        |          |
>                        |           --27.64%--walk_component
>                        |                     lookup_fast
>                        |                     kernfs_dop_revalidate
>                        |                     __mutex_lock.constprop.0
>                        |                     osq_lock
>                        |
>                        |--9.18%--lookup_fast
>                        |          kernfs_dop_revalidate
>                        |          __mutex_lock.constprop.0
>                        |          osq_lock
>                        |
>                         --5.80%--may_open
>                                   inode_permission.part.0
>                                   kernfs_iop_permission
>                                   __mutex_lock.constprop.0
>                                   osq_lock
>
>     81.83%     0.03%  fops_overhead  [kernel.kallsyms]  [k] link_path_walk.part.0
>             |
>              --81.80%--link_path_walk.part.0
>                        |
>                        |--52.19%--inode_permission.part.0
>                        |          |
>                        |           --51.86%--kernfs_iop_permission
>                        |                     |
>                        |                     |--50.92%--__mutex_lock.constprop.0
>                        |                     |          |
>                        |                     |           --49.58%--osq_lock
>                        |                     |
>                        |                      --0.59%--mutex_unlock
>                        |
>                         --29.47%--walk_component
>                                   |
>                                    --29.10%--lookup_fast
>                                              |
>                                               --28.76%--kernfs_dop_revalidate
>                                                         |
>                                                          --28.29%--__mutex_lock.constprop.0
>                                                                    |
>                                                                     --27.65%--osq_lock
>
>     58.32%     0.24%  fops_overhead  [kernel.kallsyms]  [k] inode_permission.part.0
>             |
>              --58.08%--inode_permission.part.0
>                        |
>                         --57.97%--kernfs_iop_permission
>                                   |
>                                   |--56.81%--__mutex_lock.constprop.0
>                                   |          |
>                                   |           --55.39%--osq_lock
>                                   |
>                                    --0.73%--mutex_unlock
>
>     57.97%     0.00%  fops_overhead  [kernel.kallsyms]  [k] kernfs_iop_permission
>             |
>             ---kernfs_iop_permission
>                |
>                |--56.81%--__mutex_lock.constprop.0
>                |          |
>                |           --55.39%--osq_lock
>                |
>                 --0.73%--mutex_unlock
>
>     38.71%     0.03%  fops_overhead  [kernel.kallsyms]  [k] lookup_fast
>             |
>              --38.68%--lookup_fast
>                        |
>                         --38.26%--kernfs_dop_revalidate
>                                   |
>                                    --37.64%--__mutex_lock.constprop.0
>                                              |
>                                               --36.83%--osq_lock
>
>     38.26%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kernfs_dop_revalidate
>             |
>              --38.22%--kernfs_dop_revalidate
>                        |
>                         --37.64%--__mutex_lock.constprop.0
>                                   |
>                                    --36.83%--osq_lock
>
>     29.47%     0.03%  fops_overhead  [kernel.kallsyms]  [k] walk_component
>             |
>              --29.44%--walk_component
>                        |
>                         --29.10%--lookup_fast
>                                   |
>                                    --28.76%--kernfs_dop_revalidate
>                                              |
>                                               --28.29%--__mutex_lock.constprop.0
>                                                         |
>                                                          --27.65%--osq_lock
>
>      6.17%     0.03%  fops_overhead  [kernel.kallsyms]  [k] may_open
>             |
>              --6.14%--may_open
>                        |
>                         --6.13%--inode_permission.part.0
>                                   |
>                                    --6.10%--kernfs_iop_permission
>                                              |
>                                               --5.90%--__mutex_lock.constprop.0
>                                                         |
>                                                          --5.80%--osq_lock
>
>      1.22%     0.00%  fops_overhead  [unknown]          [k] 0x5541d68949564100
>             |
>             ---0x5541d68949564100
>                __libc_start_main
>                |
>                |--0.68%--__close
>                |          |
>                |           --0.66%--entry_SYSCALL_64_after_hwframe
>                |                     do_syscall_64
>                |                     |
>                |                      --0.61%--__prepare_exit_to_usermode
>                |                                |
>                |                                 --0.58%--task_work_run
>                |
>                 --0.54%--read
>
>      1.22%     0.00%  fops_overhead  libc-2.31.so       [.] __libc_start_main
>             |
>             ---__libc_start_main
>                |
>                |--0.68%--__close
>                |          |
>                |           --0.66%--entry_SYSCALL_64_after_hwframe
>                |                     do_syscall_64
>                |                     |
>                |                      --0.61%--__prepare_exit_to_usermode
>                |                                |
>                |                                 --0.58%--task_work_run
>                |
>                 --0.54%--read
>
>      1.06%     1.05%  fops_overhead  [kernel.kallsyms]  [k] mutex_unlock
>             |
>              --1.02%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                         --0.80%--link_path_walk.part.0
>                                   |
>                                    --0.60%--inode_permission.part.0
>                                              kernfs_iop_permission
>                                              |
>                                               --0.59%--mutex_unlock
>
>      0.88%     0.79%  fops_overhead  [kernel.kallsyms]  [k] mutex_lock
>             |
>              --0.68%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>
>      0.68%     0.01%  fops_overhead  libc-2.31.so       [.] __close
>             |
>              --0.67%--__close
>                        |
>                         --0.66%--entry_SYSCALL_64_after_hwframe
>                                   do_syscall_64
>                                   |
>                                    --0.61%--__prepare_exit_to_usermode
>                                              |
>                                               --0.58%--task_work_run
>
>      0.63%     0.05%  fops_overhead  [kernel.kallsyms]  [k] __prepare_exit_to_usermode
>             |
>              --0.58%--__prepare_exit_to_usermode
>                        task_work_run
>
>      0.58%     0.00%  fops_overhead  [kernel.kallsyms]  [k] task_work_run
>             |
>             ---task_work_run
>
>      0.58%     0.10%  fops_overhead  [kernel.kallsyms]  [k] dput
>      0.56%     0.55%  fops_overhead  [kernel.kallsyms]  [k] mutex_spin_on_owner
>             |
>              --0.55%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>
>      0.54%     0.00%  fops_overhead  libc-2.31.so       [.] read
>             |
>             ---read
>
>      0.52%     0.12%  fops_overhead  [kernel.kallsyms]  [k] do_dentry_open
>      0.50%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ksys_read
>      0.50%     0.03%  fops_overhead  [kernel.kallsyms]  [k] vfs_read
>      0.46%     0.05%  fops_overhead  [kernel.kallsyms]  [k] __fput
>      0.45%     0.45%  fops_overhead  [kernel.kallsyms]  [k] lockref_put_return
>      0.43%     0.43%  fops_overhead  [kernel.kallsyms]  [k] osq_unlock
>      0.41%     0.08%  fops_overhead  [kernel.kallsyms]  [k] step_into
>      0.41%     0.08%  fops_overhead  [kernel.kallsyms]  [k] __d_lookup
>      0.37%     0.35%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_lock
>      0.35%     0.03%  fops_overhead  [kernel.kallsyms]  [k] seq_read
>      0.28%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_open
>      0.27%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_release
>      0.16%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_put_open_node
>      0.16%     0.00%  fops_overhead  [kernel.kallsyms]  [k] terminate_walk
>      0.12%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __alloc_file
>      0.12%     0.00%  fops_overhead  [kernel.kallsyms]  [k] alloc_empty_file
>      0.12%     0.01%  fops_overhead  [kernel.kallsyms]  [k] unlazy_walk
>      0.12%     0.05%  fops_overhead  [kernel.kallsyms]  [k] _cond_resched
>      0.12%     0.07%  fops_overhead  [kernel.kallsyms]  [k] call_rcu
>      0.10%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __legitimize_path
>      0.09%     0.05%  fops_overhead  [kernel.kallsyms]  [k] sysfs_kf_seq_show
>      0.09%     0.09%  fops_overhead  [kernel.kallsyms]  [k] generic_permission
>      0.09%     0.07%  fops_overhead  [kernel.kallsyms]  [k] rcu_all_qs
>      0.09%     0.01%  fops_overhead  [kernel.kallsyms]  [k] security_file_open
>      0.08%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_alloc
>      0.08%     0.08%  fops_overhead  [kernel.kallsyms]  [k] lockref_get_not_dead
>      0.08%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kmem_cache_alloc
>      0.08%     0.08%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_open
>      0.07%     0.05%  fops_overhead  [kernel.kallsyms]  [k] kfree
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_read
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] set_nlink
>      0.05%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_start
>      0.05%     0.03%  fops_overhead  [kernel.kallsyms]  [k] path_init
>      0.05%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_close
>      0.05%     0.00%  fops_overhead  [kernel.kallsyms]  [k] filp_close
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] syscall_return_via_sysret
>      0.05%     0.03%  fops_overhead  [kernel.kallsyms]  [k] __kmalloc_node
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] rcu_segcblist_enqueue
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] vfs_open
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>      0.04%     0.03%  fops_overhead  [kernel.kallsyms]  [k] sprintf
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] dev_attr_show
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] die_id_show
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kmem_cache_free
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] fsnotify_parent
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] security_inode_permission
>      0.04%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __check_object_size
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_alloc_security
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] seq_release
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] memset_erms
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kernfs_get_active
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] try_to_wake_up
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vsnprintf
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] mntput_no_expire
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] lockref_get
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kernfs_put_active
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] fsnotify
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] locks_remove_posix
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_permission
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] rw_verify_area
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] set_root
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] nd_jump_root
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] wake_up_q
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] getname_flags.part.0
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] task_work_add
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] fput_many
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] __legitimize_mnt
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_stop
>      0.02%     0.02%  fops_overhead  [nfs]              [k] nfs_do_access
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] _dl_map_object
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] open_path
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] __GI___open64_nocancel
>      0.02%     0.00%  fops_overhead  [nfs]              [k] nfs_permission
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_next
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] available_idle_cpu
>      0.01%     0.00%  fops_overhead  [unknown]          [k] 0x3931206e69207364
>      0.01%     0.00%  fops_overhead  libc-2.31.so       [.] __GI___libc_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ksys_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vfs_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] n_tty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] pty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] queue_work_on
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __queue_work
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] select_task_rq_fair
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] slab_free_freelist_hook
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __list_del_entry_valid
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcg_kmem_put_cache
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __syscall_return_slowpath
>      0.01%     0.01%  fops_overhead  libc-2.31.so       [.] _dl_addr
>      0.01%     0.00%  fops_overhead  [unknown]          [.] 0x756e696c2d34365f
>      0.01%     0.00%  fops_overhead  [unknown]          [.] 0x00007f4b1ca1e000
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __virt_addr_valid
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] locks_remove_file
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcpy_erms
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] update_rq_clock
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] entry_SYSCALL_64
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __check_heap_object
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_free_security
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_free
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __d_lookup_rcu
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] mntput
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] get_unused_fd_flags
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] alloc_slab_page
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __slab_alloc
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ___slab_alloc
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] allocate_slab
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __alloc_fd
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] legitimize_root
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] strncpy_from_user
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_refresh_inode
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] build_open_flags
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] strcmp
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcg_kmem_get_cache
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] asm_sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] asm_call_sysvec_on_stack
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] hrtimer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __hrtimer_run_queues
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tick_sched_timer
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tick_sched_handle
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] update_process_times
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] scheduler_tick
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] perf_iterate_ctx
>      0.01%     0.00%  fops_overhead  [unknown]          [k] 0x00007fd34e3a0627
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_execve
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_execve
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __do_execve_file
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] load_elf_binary
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] elf_map
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vm_mmap_pgoff
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_mmap
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] mmap_region
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] perf_event_mmap
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] perf_iterate_sb
>      0.00%     0.00%  perf_5.8       [unknown]          [k] 0x00007fd34e3a0627
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] perf_event_exec
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] do_syscall_64
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __x64_sys_execve
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] do_execve
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __do_execve_file
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] load_elf_binary
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] begin_new_exec
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] native_write_msr
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __intel_pmu_enable_all.constprop.0
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] acpi_os_read_memory
>
>
> #
> # (Tip: To count events in every 1000 msec: perf stat -I 1000)
> #
>

Hi Brice,

I wrote a benchmark that does open+read+close on
/sys/devices/system/cpu/cpu0/topology/die_id:
https://github.com/foxhlchen/sysfs_benchmark/blob/main/main.c
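
For reference, here is a minimal sketch of the kind of open+read+close loop
described above. It is an illustration only, not the code in the linked
main.c; the function name bench_open_read_close and the 64-byte buffer are
assumptions made for the example.

```c
/* Hypothetical sketch of an open+read+close loop; not the linked main.c. */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path`, read from it once, and close it, `iterations` times.
 * Returns the number of iterations in which both open() and read()
 * succeeded.
 */
long bench_open_read_close(const char *path, long iterations)
{
    char buf[64];   /* assumed to be large enough for a small sysfs attribute */
    long ok = 0;

    assert(path != NULL);

    for (long i = 0; i < iterations; i++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            continue;   /* failed opens are not counted */
        if (read(fd, buf, sizeof(buf)) >= 0)
            ok++;
        close(fd);
    }
    return ok;
}
```

Such a function could be called from main() with
"/sys/devices/system/cpu/cpu0/topology/die_id" and a large iteration count,
and the resulting binary profiled with perf. A single caller has nobody to
contend with on the kernfs mutex, so getting close to the osq_lock time in
the quoted profile would presumably require running the loop from many
threads or processes at once.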


perf report on my side shows these mutex entries:

+    3.39%     3.37%  a.out  [kernel.kallsyms]  [k] mutex_unlock
+    2.76%     2.74%  a.out  [kernel.kallsyms]  [k] mutex_lock
+    0.92%     0.42%  a.out  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
     0.38%     0.37%  a.out  [kernel.kallsyms]  [k] mutex_spin_on_owner
     0.05%     0.05%  a.out  [kernel.kallsyms]  [k] __mutex_init
     0.01%     0.01%  a.out  [kernel.kallsyms]  [k] __mutex_lock_slowpath
     0.01%     0.00%  a.out  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0

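But I could not reproduce your result: the mutex functions take only a few
percent here, while osq_lock alone accounts for about 92% in your report.

If possible, would you mind sharing your benchmark code? :)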


thanks,
fox
