Message-ID: <CAOQ4uxgA8OAp5Htv9qBtW7S9J-YhyJeatiXTtzyw-1maraRZrA@mail.gmail.com>
Date: Fri, 26 Jan 2024 13:40:40 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Vinicius Costa Gomes <vinicius.gomes@...el.com>
Cc: brauner@...nel.org, hu1.chen@...el.com, miklos@...redi.hu, 
	malini.bhandaru@...el.com, tim.c.chen@...el.com, mikko.ylinen@...el.com, 
	lizhen.you@...el.com, linux-unionfs@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [RFC v2 0/4] overlayfs: Optimize override/revert creds

cc: fsdevel

On Fri, Jan 26, 2024 at 1:57 AM Vinicius Costa Gomes
<vinicius.gomes@...el.com> wrote:
>
> Hi,
>

Hi Vinicius,

I have some specific comments about the overlayfs patch,
but first I would like to give some higher-level feedback on the series.

> It was noticed that some workloads suffer from contention on
> increasing/decrementing the ->usage counter in their credentials,
> those refcount operations are associated with overriding/reverting the
> current task credentials. (the linked thread adds more context)
>
> In some specialized cases, overlayfs is one of them, the credentials
> in question have a longer lifetime than the override/revert "critical
> section". In the overlayfs case, the credentials are created when the
> fs is mounted and destroyed when it's unmounted. In this case of long
> lived credentials, the usage counter doesn't need to be
> incremented/decremented.
>
> Add a lighter version of credentials override/revert to be used in
> these specialized cases. To make sure that the override/revert calls
> are paired, add a cleanup guard macro. This was suggested here:
>
> https://lore.kernel.org/all/20231219-marken-pochen-26d888fb9bb9@brauner/
>
> With a small number of tweaks:
>  - Used inline functions instead of macros;
>  - A small change to store the credentials into the passed argument,
>    the guard is now defined as (note the added '_T ='):
>
>       DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T),
>                   revert_creds_light(_T));
>
>  - Allow "const" arguments to be used with these kind of guards;
>
> Some comments:
>  - If patch 1/4 is not a good idea (adding the cast), the alternative
>    I can see is using some kind of container for the credentials;
>  - The only user for the backing file ops is overlayfs, so these
>    changes make sense, but may not make sense in the most general
>    case;
>
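
To illustrate the idea described above, here is a minimal user-space sketch,
not the code from the series. It assumes GCC/Clang's
__attribute__((cleanup)), which is what the kernel's guard macros build on.
The struct layout, current_cred, the *_counted helpers, CRED_GUARD,
cred_guard_exit, fsuid_as and light_guard_demo are all hypothetical
stand-ins; only the override_creds_light/revert_creds_light names come from
the cover letter, and their bodies here are simplified guesses.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's struct cred. */
struct cred {
    int usage;  /* plain int here; the kernel uses an atomic counter */
    int fsuid;
};

/* Stand-in for the current task's credentials pointer. */
static const struct cred *current_cred;

/* Counted variant: takes a reference for the duration of the override. */
static const struct cred *override_creds_counted(struct cred *new_cred)
{
    const struct cred *old = current_cred;

    new_cred->usage++;          /* the contended write in the real kernel */
    current_cred = new_cred;
    return old;
}

static void revert_creds_counted(const struct cred *old)
{
    struct cred *override = (struct cred *)current_cred;

    current_cred = old;
    override->usage--;          /* drop the reference taken on override */
}

/* Light variant: the caller guarantees new_cred outlives the section. */
static const struct cred *override_creds_light(const struct cred *new_cred)
{
    const struct cred *old = current_cred;

    current_cred = new_cred;    /* no refcount traffic at all */
    return old;
}

static void revert_creds_light(const struct cred *old)
{
    current_cred = old;
}

/* Runs automatically when the guard variable goes out of scope. */
static void cred_guard_exit(const struct cred **saved)
{
    revert_creds_light(*saved);
}

/* The guard variable stores the OLD creds, like the '_T =' in the cover letter. */
#define CRED_GUARD(new_cred) \
    const struct cred *cred_guard_saved \
        __attribute__((cleanup(cred_guard_exit), unused)) = \
        override_creds_light(new_cred)

/* Reads fsuid while the mounter's creds are in effect, then auto-reverts. */
static int fsuid_as(const struct cred *mounter)
{
    CRED_GUARD(mounter);
    return current_cred->fsuid;
}

/* Returns 1 if both variants behave as described. */
int light_guard_demo(void)
{
    struct cred task = { .usage = 1, .fsuid = 1000 };
    struct cred mounter = { .usage = 1, .fsuid = 0 };
    const struct cred *old;
    int ok = 1;

    current_cred = &task;

    old = override_creds_counted(&mounter);
    ok &= (mounter.usage == 2);                 /* counter was touched */
    revert_creds_counted(old);
    ok &= (mounter.usage == 1 && current_cred == &task);

    ok &= (fsuid_as(&mounter) == 0);            /* ran with mounter creds */
    ok &= (mounter.usage == 1 && current_cred == &task);
    return ok;
}
```

The light pair is only safe when something else pins the credentials for
longer than the guarded scope, as the mount-lifetime credentials do in the
overlayfs case described above.
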
> For the numbers, some from 'perf c2c', before this series:
> (edited to fit)
>
> #
> #        ----- HITM -----                                        Shared
> #   Num  RmtHitm  LclHitm                      Symbol            Object         Source:Line  Node
> # .....  .......  .......  ..........................  ................  .................  ....
> #
>   -------------------------
>       0      412     1028
>   -------------------------
>           41.50%   42.22%  [k] revert_creds            [kernel.vmlinux]  atomic64_64.h:39     0  1
>           15.05%   10.60%  [k] override_creds          [kernel.vmlinux]  atomic64_64.h:25     0  1
>            0.73%    0.58%  [k] init_file               [kernel.vmlinux]  atomic64_64.h:25     0  1
>            0.24%    0.10%  [k] revert_creds            [kernel.vmlinux]  cred.h:266           0  1
>           32.28%   37.16%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81   0  1
>            9.47%    8.75%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81   0  1
>            0.49%    0.58%  [k] inode_owner_or_capable  [kernel.vmlinux]  mnt_idmapping.h:81   0  1
>            0.24%    0.00%  [k] generic_permission      [kernel.vmlinux]  namei.c:354          0
>
>   -------------------------
>       1       50      103
>   -------------------------
>          100.00%  100.00%  [k] update_cfs_group  [kernel.vmlinux]  atomic64_64.h:15   0  1
>
>   -------------------------
>       2       50       98
>   -------------------------
>           96.00%   96.94%  [k] update_cfs_group  [kernel.vmlinux]  atomic64_64.h:15   0  1
>            2.00%    1.02%  [k] update_load_avg   [kernel.vmlinux]  atomic64_64.h:25   0  1
>            0.00%    2.04%  [k] update_load_avg   [kernel.vmlinux]  fair.c:4118        0
>            2.00%    0.00%  [k] update_cfs_group  [kernel.vmlinux]  fair.c:3932        0  1
>
> after this series:
>
> #
> #        ----- HITM -----                                   Shared
> #   Num  RmtHitm  LclHitm                 Symbol            Object       Source:Line  Node
> # .....  .......  .......   ....................  ................  ...............  ....
> #
>   -------------------------
>       0       54       88
>   -------------------------
>          100.00%  100.00%   [k] update_cfs_group  [kernel.vmlinux]  atomic64_64.h:15   0  1
>
>   -------------------------
>       1       48       83
>   -------------------------
>           97.92%   97.59%   [k] update_cfs_group  [kernel.vmlinux]  atomic64_64.h:15   0  1
>            2.08%    1.20%   [k] update_load_avg   [kernel.vmlinux]  atomic64_64.h:25   0  1
>            0.00%    1.20%   [k] update_load_avg   [kernel.vmlinux]  fair.c:4118        0  1
>
>   -------------------------
>       2       28       44
>   -------------------------
>           85.71%   79.55%   [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81   0  1
>           14.29%   20.45%   [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81   0  1
>
>
> The contention is practically gone.

That is very impressive.
Can you say which workloads were running during this test?
Specifically, I am wondering how much of the improvement came from
backing_file.c and how much from overlayfs/*.c.

The reason I am asking is that the overlayfs patch is quite large and can
take more time to review, so I am wondering out loud whether we would be
better off with this course of action:

1. convert backing_file.c to use new helpers/guards
2. convert overlayfs to use new helpers/guards

#1 should definitely go in via Christian's tree and should get a wider review
from fsdevel (please CC fsdevel next time)

#2 is contained for overlayfs reviewers. Once the helpers are merged
and used by backing_file helpers, overlayfs can be converted independently.

#1 and #2 could both be merged in the same merge cycle, or not; it does not
matter. Most likely, #2 will go through Christian's tree as well, but I think we
need to work according to this merge order.

We can also work on the review in parallel, and you may keep the overlayfs
patch in the following postings. I just wanted us to be on the same page
w.r.t. the process.

Thanks,
Amir.
