lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 01 Nov 2023 21:20:08 +0000
From: David Howells <dhowells@...hat.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: dhowells@...hat.com, Marc Dionne <marc.dionne@...istor.com>,
    Alexander Viro <viro@...iv.linux.org.uk>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
    Paolo Abeni <pabeni@...hat.com>,
    Chuck Lever <chuck.lever@...cle.com>, linux-afs@...ts.infradead.org,
    netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()

Oleg Nesterov <oleg@...hat.com> wrote:

> > 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 	1	1	read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do

Yeah, you're right.  I missed the fact that I got in the second example that
read_seqbegin_or_lock() spins until it sees a positive seq.

However, I think just changing all of these to always-lockless isn't
necessarily the most optimal way.  Yes, it will work... eventually.  But the
point is to limit the number of iterations.

So I have the following:

 (1) rxrpc_find_service_conn_rcu().

     I want to move the first part of the reaper to the I/O thread at some
     point, then the locking here can go away entirely.  However, this is
     drivable by external events, so I would prefer to limit the number of
     passes to just two and take a lock on the second pass.  Holding up the
     reaper thread for the moment is fine; holding up the I/O thread is not.

 (2) afs_lookup_volume_rcu().

     There can be a lot of volumes known by a system.  A thousand would
     require a 10-step walk and this is drivable by remote operation, so I
     think this should probably take a lock on the second pass too.

 (3) afs_check_validity().
 (4) afs_get_attr().

     These are both pretty short, so your solution is probably good for them.
     That said, afs_vnode_commit_status() can spend a long time under the
     write lock - and pretty much every file RPC op returns a status update.

 (5) afs_find_server().

     There could be a lot of servers in the list and each server can have
     multiple addresses, so I think this would be better with an exclusive
     second pass.

     The server list isn't likely to change all that often, but when it does
     change, there's a good chance several servers are going to be
     added/removed one after the other.  Further, this is only going to be
     used for incoming cache management/callback requests from the server,
     which hopefully aren't going to happen too often - but it is remotely
     drivable.

 (6) afs_find_server_by_uuid().

     Similarly to (5), there could be a lot of servers to search through, but
     they are in a tree not a flat list, so it should be faster to process.
     Again, it's not likely to change that often and, again, when it does
     change it's likely to involve multiple changes.  This can be driven
     remotely by an incoming cache management request but is mostly going to
     be driven by setting up or reconfiguring a volume's server list -
     something that also isn't likely to happen often.

I wonder if struct seqlock would make more sense with an rwlock rather than a
spinlock.  As it is, it does an exclusive spinlock for the readpath which is
kind of overkill.

David


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ