Message-ID: <CAO+kfxQgXOsx6u+xLKGJe0KDiFsRAGstSpnrwxjQF6udgz5HFQ@mail.gmail.com>
Date:   Wed, 6 Sep 2023 17:03:48 -0400
From:   Brian Pardy <brian.pardy@...il.com>
To:     Bagas Sanjaya <bagasdotme@...il.com>,
        Linux CIFS <linux-cifs@...r.kernel.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Regressions <regressions@...ts.linux.dev>,
        Steve French <sfrench@...ba.org>,
        Ronnie Sahlberg <lsahlber@...hat.com>
Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share
 is mounted (cifsd-cfid-laundromat in "D" state)

Added committer Ronnie Sahlberg to CC.

On Tue, Sep 5, 2023 at 9:01 PM Bagas Sanjaya <bagasdotme@...il.com> wrote:
> On Tue, Sep 05, 2023 at 01:09:05PM -0400, Brian Pardy wrote:
> > I've noticed an issue with the CIFS client in kernel 6.5.0/6.5.1 that
> > does not exist in 6.4.12 or other previous kernels (I have not tested
> > 6.4.13). Almost immediately after mounting a CIFS share, the reported
> > load average on my system goes up by 2. At the time this occurs, I see
> > two [cifsd-cfid-laundromat] kernel threads running in the "D" state,
> > where they remain for the entire time the CIFS share is mounted. The
> > load will remain stable at 2 (otherwise idle) until the share is
> > unmounted, at which point the [cifsd-cfid-laundromat] threads
> > disappear and load drops back down to 0. This is easily reproducible
> > on my system, but I am not sure what to do to retrieve more useful
> > debugging information. If I mount two shares from this server, I get
> > four laundromat threads in "D" state and a sustained load average of
> > 4.
> >
> > The client is running Gentoo Linux; the server is a Seagate Personal
> > Cloud NAS running Samba 4.6.5. Mount options used are
> > "noperm,guest,vers=3.02". The CPUs do not actually appear to be
> > spinning; the reported load average looks inflated relative to
> > actual CPU use.
>
> Thanks for the regression report. But if you want to get it fixed,
> you have to do your part: perform bisection. See
> Documentation/admin-guide/bug-bisect.rst in the kernel sources for
> how to do that.
>
> Anyway, I'm adding it to regzbot:
>
> #regzbot ^introduced: v6.4..v6.5
> #regzbot title: incorrect CPU utilization report (multiplied) when mounting CIFS

Thank you for directing me to the bug-bisect documentation. Results below:

# git bisect bad
d14de8067e3f9653cdef5a094176d00f3260ab20 is the first bad commit
commit d14de8067e3f9653cdef5a094176d00f3260ab20
Author: Ronnie Sahlberg <lsahlber@...hat.com>
Date:   Thu Jul 6 12:32:24 2023 +1000

    cifs: Add a laundromat thread for cached directories

    and drop cached directories after 30 seconds

    Signed-off-by: Ronnie Sahlberg <lsahlber@...hat.com>
    Signed-off-by: Steve French <stfrench@...rosoft.com>

 fs/smb/client/cached_dir.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/smb/client/cached_dir.h |  1 +
 2 files changed, 68 insertions(+)

I do not know what other debug info may be useful, but here is
/proc/[pid]/stack output for one of these threads in D state:

# cat /proc/17314/stack
[<0>] msleep+0x24/0x40
[<0>] cifs_cfids_laundromat_thread+0x5e/0x1c0 [cifs]
[<0>] kthread+0xc4/0xf0
[<0>] ret_from_fork+0x28/0x40
[<0>] ret_from_fork_asm+0x1b/0x30
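
If I am reading the trace right, the thread spends nearly all of its
time in msleep(), which sleeps in TASK_UNINTERRUPTIBLE (the "D" state
shown above), and the load average counts uninterruptible tasks, so
each laundromat thread adds 1 to the load even though it uses no CPU.
A minimal sketch of the pattern as I understand it (a made-up demo
module for illustration, not the actual cifs code):

/*
 * Hypothetical demo module, not the cifs laundromat code: a kthread
 * that sleeps via msleep() sits in TASK_UNINTERRUPTIBLE ("D" state),
 * and D-state tasks are counted in the load average, so the load
 * rises by 1 per such thread even with zero CPU use.
 */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/err.h>

static struct task_struct *demo_task;

static int demo_thread_fn(void *data)
{
	while (!kthread_should_stop()) {
		/*
		 * msleep() sleeps uninterruptibly, so ps shows this
		 * thread in "D" and it is added to the load average.
		 */
		msleep(1000);
	}
	return 0;
}

static int __init dstate_demo_init(void)
{
	demo_task = kthread_run(demo_thread_fn, NULL, "dstate-demo");
	return PTR_ERR_OR_ZERO(demo_task);
}

static void __exit dstate_demo_exit(void)
{
	kthread_stop(demo_task);
}

module_init(dstate_demo_init);
module_exit(dstate_demo_exit);
MODULE_LICENSE("GPL");

An interruptible sleep (msleep_interruptible(), or moving the work to
a delayed workqueue item) would presumably leave the task in "S" state
instead and keep it out of the load average, but I will defer to
people who know the cifs code.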

I will provide any other details requested. Thank you.

#regzbot introduced: d14de8067e3
