lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <167308517118.1538866.3440481802366869065.stgit@warthog.procyon.org.uk>
Date:   Sat, 07 Jan 2023 09:52:51 +0000
From:   David Howells <dhowells@...hat.com>
To:     netdev@...r.kernel.org
Cc:     Marc Dionne <marc.dionne@...istor.com>,
        syzbot+c22650d2844392afdcfd@...kaller.appspotmail.com,
        linux-afs@...ts.infradead.org, dhowells@...hat.com,
        linux-afs@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: [PATCH net 00/19] rxrpc: Fix race between call connection,
 data transmit and call disconnect


Here are patches to fix an oops[1] caused by a race between call
connection, initial packet transmission and call disconnection which
results in something like:

	kernel BUG at net/rxrpc/peer_object.c:413!

when the syzbot test is run.  The problem is that the connection procedure
is effectively split across two threads and can get expanded by taking an
interrupt, thereby adding the call to the peer error distribution list
*after* it has been disconnected (say by the rxrpc socket shutting down).

The easiest solution is to look at the fourth set of I/O thread
conversion/SACK table expansion patches that didn't get applied[2] and take
from it those patches that move call connection and disconnection into the
I/O thread.  Moving these things into the I/O thread means that the
sequencing is managed by all being done in the same thread - and the race
can no longer happen.

This is preferable to introducing an extra lock as adding an extra lock
would make the I/O thread have to wait for the app thread in yet another
place.

The changes can be considered as a number of logical parts:

 (1) Move all of the call state changes into the I/O thread.

 (2) Make client connection ID space per-local endpoint so that the I/O
     thread doesn't need locks to access it.

 (3) Move actual abort generation into the I/O thread and clean it up.  If
     sendmsg or recvmsg want to cause an abort, they have to delegate it.

 (4) Offload the setting up of the security context on a connection to the
     thread of one of the apps that's starting a call.  We don't want to be
     doing any sort of crypto in the I/O thread.

 (5) Connect calls (ie. assign them to channel slots on connections) in the
     I/O thread.  Calls are set up by sendmsg/kafs and passed to the I/O
     thread to connect.  Connections are allocated in the I/O thread after
     this.

 (6) Disconnect calls in the I/O thread.

I've also added a patch for an unrelated bug that cropped up during
testing, whereby a race can occur between an incoming call and socket
shutdown.


Note that whilst this fixes the original syzbot bug, another bug may get
triggered if this one is fixed:

	INFO: rcu detected stall in corrupted
	rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5792 } 2657 jiffies s: 2825 root: 0x0/T
	rcu: blocking rcu_node structures (internal RCU debug):

It doesn't look this should be anything to do with rxrpc, though, as I've
tested an additional patch[3] that removes practically all the RCU usage
from rxrpc and it still occurs.  It seems likely that it is being caused by
something in the tunnelling setup that the syzbot test does, but there's
not enough info to go on.  It also seems unlikely to be anything to do with
the afs driver as the test doesn't use that.


The patches are tagged here:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-fixes-20230107

and can also be found on the following branch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

David

Link: https://syzkaller.appspot.com/bug?extid=c22650d2844392afdcfd [1]
Link: https://lore.kernel.org/r/167034231605.1105287.1693064952174322878.stgit@warthog.procyon.org.uk/ [2]
Link: https://lore.kernel.org/r/1278570.1673042093@warthog.procyon.org.uk/ [3]

---
David Howells (19):
      rxrpc: Stash the network namespace pointer in rxrpc_local
      rxrpc: Make the local endpoint hold a ref on a connected call
      rxrpc: Separate call retransmission from other conn events
      rxrpc: Only set/transmit aborts in the I/O thread
      rxrpc: Only disconnect calls in the I/O thread
      rxrpc: Implement a mechanism to send an event notification to a connection
      rxrpc: Clean up connection abort
      rxrpc: Tidy up abort generation infrastructure
      rxrpc: Make the set of connection IDs per local endpoint
      rxrpc: Offload the completion of service conn security to the I/O thread
      rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parameters
      rxrpc: Split out the call state changing functions into their own file
      rxrpc: Wrap accesses to get call state to put the barrier in one place
      rxrpc: Move call state changes from sendmsg to I/O thread
      rxrpc: Move call state changes from recvmsg to I/O thread
      rxrpc: Remove call->state_lock
      rxrpc: Move the client conn cache management to the I/O thread
      rxrpc: Move client call connection to the I/O thread
      rxrpc: Fix incoming call setup race


 Documentation/networking/rxrpc.rst |   4 +-
 fs/afs/cmservice.c                 |   6 +-
 fs/afs/rxrpc.c                     |  24 +-
 include/net/af_rxrpc.h             |   3 +-
 include/trace/events/rxrpc.h       | 160 +++++--
 net/rxrpc/Makefile                 |   1 +
 net/rxrpc/af_rxrpc.c               |  27 +-
 net/rxrpc/ar-internal.h            | 212 ++++++---
 net/rxrpc/call_accept.c            |  57 ++-
 net/rxrpc/call_event.c             |  86 +++-
 net/rxrpc/call_object.c            | 116 +++--
 net/rxrpc/call_state.c             |  69 +++
 net/rxrpc/conn_client.c            | 709 ++++++++---------------------
 net/rxrpc/conn_event.c             | 382 ++++++----------
 net/rxrpc/conn_object.c            |  67 ++-
 net/rxrpc/conn_service.c           |   1 -
 net/rxrpc/input.c                  | 175 +++----
 net/rxrpc/insecure.c               |  20 +-
 net/rxrpc/io_thread.c              | 204 +++++----
 net/rxrpc/local_object.c           |  35 +-
 net/rxrpc/net_ns.c                 |  17 -
 net/rxrpc/output.c                 |  60 ++-
 net/rxrpc/peer_object.c            |  23 +-
 net/rxrpc/proc.c                   |  17 +-
 net/rxrpc/recvmsg.c                | 256 +++--------
 net/rxrpc/rxkad.c                  | 356 ++++++---------
 net/rxrpc/rxperf.c                 |  17 +-
 net/rxrpc/security.c               |  53 +--
 net/rxrpc/sendmsg.c                | 195 ++++----
 29 files changed, 1592 insertions(+), 1760 deletions(-)
 create mode 100644 net/rxrpc/call_state.c


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ