[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230703114318.1576ea24@rorschach.local.home>
Date: Mon, 3 Jul 2023 11:43:18 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Dave Wysochanski <dwysocha@...hat.com>,
Ronnie Sahlberg <lsahlber@...hat.com>,
Pavel Shilovsky <pshilov@...rosoft.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ajay Kaher <akaher@...are.com>
Subject: Buggy rwsem locking code in fs/smb/client/file.c
I just reviewed a patch that copied a solution from
fs/smb/client/file.c (original was fs/cifs/file.c), which is really
just hiding a bug. And because this code could have possibly caused
this buggy solution to be repeated, I believe it should be fixed,
before others use it as precedent in other areas of the kernel.
Commit d46b0da7a33dd ("cifs: Fix cifsInodeInfo lock_sem deadlock when
reconnect occurs") has in its change log:
There's a deadlock that is possible and can easily be seen with
a test where multiple readers open/read/close of the same file
and a disruption occurs causing reconnect. The deadlock is due
a reader thread inside cifs_strict_readv calling down_read and
obtaining lock_sem, and then after reconnect inside
cifs_reopen_file calling down_read a second time. If in
between the two down_read calls, a down_write comes from
another process, deadlock occurs.
CPU0 CPU1
---- ----
cifs_strict_readv()
down_read(&cifsi->lock_sem);
_cifsFileInfo_put
OR
cifs_new_fileinfo
down_write(&cifsi->lock_sem);
cifs_reopen_file()
down_read(&cifsi->lock_sem);
Fix the above by changing all down_write(lock_sem) calls to
down_write_trylock(lock_sem)/msleep() loop, which in turn
makes the second down_read call benign since it will never
block behind the writer while holding lock_sem.
And hides the bug by wrapping the down_write() with:
+void
+cifs_down_write(struct rw_semaphore *sem)
+{
+ while (!down_write_trylock(sem))
+ msleep(10);
+}
+
The comment above down_read_nested() has:
/*
* nested locking. NOTE: rwsems are not allowed to recurse
* (which occurs if the same task tries to acquire the same
* lock instance multiple times), but multiple locks of the
* same lock class might be taken, if the order of the locks
* is always the same. This ordering rule can be expressed
* to lockdep via the _nested() APIs, but enumerating the
* subclasses that are used. (If the nesting relationship is
* static then another method for expressing nested locking is
* the explicit definition of lock class keys and the use of
* lockdep_set_class() at lock initialization time.
* See Documentation/locking/lockdep-design.rst for more details.)
*/
As the NOTE above states, down_read() is not a recursive lock, which
appears to be what cifs is using it for. I wonder if it could be
converted to using RCU instead.
I'm just bringing this to everyone's attention because that code really
needs to be fixed.
-- Steve
Powered by blists - more mailing lists