linux-kernel - Buggy rwsem locking code in fs/smb/client/file.c

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230703114318.1576ea24@rorschach.local.home>
Date:   Mon, 3 Jul 2023 11:43:18 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     Dave Wysochanski <dwysocha@...hat.com>,
        Ronnie Sahlberg <lsahlber@...hat.com>,
        Pavel Shilovsky <pshilov@...rosoft.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ajay Kaher <akaher@...are.com>
Subject: Buggy rwsem locking code in fs/smb/client/file.c

I just reviewed a patch that copied a solution from
fs/smb/client/file.c (original was fs/cifs/file.c), which is really
just hiding a bug. And because this code could have possibly caused
this buggy solution to be repeated, I believe it should be fixed,
before others use it as precedent in other areas of the kernel.

Commit d46b0da7a33dd ("cifs: Fix cifsInodeInfo lock_sem deadlock when
reconnect occurs") has in its change log:

    There's a deadlock that is possible and can easily be seen with
    a test where multiple readers open/read/close of the same file
    and a disruption occurs causing reconnect.  The deadlock is due
    a reader thread inside cifs_strict_readv calling down_read and
    obtaining lock_sem, and then after reconnect inside
    cifs_reopen_file calling down_read a second time.  If in
    between the two down_read calls, a down_write comes from
    another process, deadlock occurs.

            CPU0                    CPU1
            ----                    ----
    cifs_strict_readv()
     down_read(&cifsi->lock_sem);
                                   _cifsFileInfo_put
                                      OR
                                   cifs_new_fileinfo
                                    down_write(&cifsi->lock_sem);
    cifs_reopen_file()
     down_read(&cifsi->lock_sem);

    Fix the above by changing all down_write(lock_sem) calls to
    down_write_trylock(lock_sem)/msleep() loop, which in turn
    makes the second down_read call benign since it will never
    block behind the writer while holding lock_sem.

And hides the bug by wrapping the down_write() with:

+void
+cifs_down_write(struct rw_semaphore *sem)
+{
+       while (!down_write_trylock(sem))
+               msleep(10);
+}
+

The comment above down_read_nested() has:

/*
 * nested locking. NOTE: rwsems are not allowed to recurse
 * (which occurs if the same task tries to acquire the same
 * lock instance multiple times), but multiple locks of the
 * same lock class might be taken, if the order of the locks
 * is always the same. This ordering rule can be expressed
 * to lockdep via the _nested() APIs, but enumerating the
 * subclasses that are used. (If the nesting relationship is
 * static then another method for expressing nested locking is
 * the explicit definition of lock class keys and the use of
 * lockdep_set_class() at lock initialization time.
 * See Documentation/locking/lockdep-design.rst for more details.)
 */

As the NOTE above states, down_read() is not a recursive lock, which
appears to be what cifs is using it for. I wonder if it could be
converted to using RCU instead.

I'm just bringing this to everyone's attention because that code really
needs to be fixed.

-- Steve