netdev - Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1959032.1698873608@warthog.procyon.org.uk>
Date: Wed, 01 Nov 2023 21:20:08 +0000
From: David Howells <dhowells@...hat.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: dhowells@...hat.com, Marc Dionne <marc.dionne@...istor.com>,
    Alexander Viro <viro@...iv.linux.org.uk>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
    Paolo Abeni <pabeni@...hat.com>,
    Chuck Lever <chuck.lever@...cle.com>, linux-afs@...ts.infradead.org,
    netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()

Oleg Nesterov <oleg@...hat.com> wrote:

> > 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 	1	1	read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do

Yeah, you're right.  I missed the fact that I got in the second example that
read_seqbegin_or_lock() spins until it sees a positive seq.

However, I think just changing all of these to always-lockless isn't
necessarily the most optimal way.  Yes, it will work... eventually.  But the
point is to limit the number of iterations.

So I have the following:

 (1) rxrpc_find_service_conn_rcu().

     I want to move the first part of the reaper to the I/O thread at some
     point, then the locking here can go away entirely.  However, this is
     drivable by external events, so I would prefer to limit the number of
     passes to just two and take a lock on the second pass.  Holding up the
     reaper thread for the moment is fine; holding up the I/O thread is not.

 (2) afs_lookup_volume_rcu().

     There can be a lot of volumes known by a system.  A thousand would
     require a 10-step walk and this is drivable by remote operation, so I
     think this should probably take a lock on the second pass too.

 (3) afs_check_validity().
 (4) afs_get_attr().

     These are both pretty short, so your solution is probably good for them.
     That said, afs_vnode_commit_status() can spend a long time under the
     write lock - and pretty much every file RPC op returns a status update.

 (5) afs_find_server().

     There could be a lot of servers in the list and each server can have
     multiple addresses, so I think this would be better with an exclusive
     second pass.

     The server list isn't likely to change all that often, but when it does
     change, there's a good chance several servers are going to be
     added/removed one after the other.  Further, this is only going to be
     used for incoming cache management/callback requests from the server,
     which hopefully aren't going to happen too often - but it is remotely
     drivable.

 (6) afs_find_server_by_uuid().

     Similarly to (5), there could be a lot of servers to search through, but
     they are in a tree not a flat list, so it should be faster to process.
     Again, it's not likely to change that often and, again, when it does
     change it's likely to involve multiple changes.  This can be driven
     remotely by an incoming cache management request but is mostly going to
     be driven by setting up or reconfiguring a volume's server list -
     something that also isn't likely to happen often.

I wonder if struct seqlock would make more sense with an rwlock rather than a
spinlock.  As it is, it does an exclusive spinlock for the readpath which is
kind of overkill.

David