Message-Id: <f907fca960f6f7393f3393330941621721efb2cc.1670609077.git.bcodding@redhat.com>
Date: Fri, 9 Dec 2022 13:19:24 -0500
From: Benjamin Coddington <bcodding@...hat.com>
To: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: Philipp Reisner <philipp.reisner@...bit.com>,
Lars Ellenberg <lars.ellenberg@...bit.com>,
Christoph Böhmwalder
<christoph.boehmwalder@...bit.com>, Jens Axboe <axboe@...nel.dk>,
Josef Bacik <josef@...icpanda.com>,
Keith Busch <kbusch@...nel.org>,
Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>,
Lee Duncan <lduncan@...e.com>, Chris Leech <cleech@...hat.com>,
Mike Christie <michael.christie@...cle.com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Valentina Manea <valentina.manea.m@...il.com>,
Shuah Khan <shuah@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
David Howells <dhowells@...hat.com>,
Marc Dionne <marc.dionne@...istor.com>,
Steve French <sfrench@...ba.org>,
Christine Caulfield <ccaulfie@...hat.com>,
David Teigland <teigland@...hat.com>,
Mark Fasheh <mark@...heh.com>,
Joel Becker <jlbec@...lplan.org>,
Joseph Qi <joseph.qi@...ux.alibaba.com>,
Eric Van Hensbergen <ericvh@...il.com>,
Latchesar Ionkov <lucho@...kov.net>,
Dominique Martinet <asmadeus@...ewreck.org>,
Ilya Dryomov <idryomov@...il.com>,
Xiubo Li <xiubli@...hat.com>,
Chuck Lever <chuck.lever@...cle.com>,
Jeff Layton <jlayton@...nel.org>,
Trond Myklebust <trond.myklebust@...merspace.com>,
Anna Schumaker <anna@...nel.org>,
Steffen Klassert <steffen.klassert@...unet.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
netdev@...r.kernel.org
Subject: [PATCH net v2 2/3] Treewide: Stop corrupting socket's task_frag
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation, which the networking system uses to decide
when it is safe to use current->task_frag. The result is unexpected
corruption in task_frag when SUNRPC is involved in memory reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain because a crashing machine's stack trace will show
no evidence of being near NFS or SUNRPC code. I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false. Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions, which
would be sure to follow, unlikely to show up in testing, and difficult to
bisect.
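
For readers unfamiliar with the mechanism, the selection that this change
influences can be modeled in user space roughly as below. This is only an
illustrative sketch, not the kernel code: the struct layouts,
model_current, and model_sk_page_frag() are hypothetical stand-ins, and it
assumes that the sk_use_task_frag field set throughout this patch is what
the stack consults when choosing between the per-task fragment and the
socket's own sk_frag.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct page_frag {
	unsigned int offset;
};

struct model_task {
	struct page_frag task_frag;	/* models current->task_frag */
};

struct sock {
	bool sk_use_task_frag;		/* field set to false by this patch */
	struct page_frag sk_frag;	/* per-socket fragment */
};

/* Models "current": the task that is executing the send path. */
static struct model_task model_current;

/*
 * Assumed selection logic: a socket that may be used from memory reclaim
 * opts out of the shared per-task fragment, so a send issued during
 * reclaim cannot modify a fragment that the interrupted context was
 * already using.
 */
static struct page_frag *model_sk_page_frag(struct sock *sk)
{
	if (sk->sk_use_task_frag)
		return &model_current.task_frag;
	return &sk->sk_frag;
}
```

In this model, a socket left at sk_use_task_frag = true shares
model_current.task_frag with everything else the task does, while a socket
that clears the flag, as each hunk below does, only ever touches its own
sk_frag.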
CC: Philipp Reisner <philipp.reisner@...bit.com>
CC: Lars Ellenberg <lars.ellenberg@...bit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@...bit.com>
CC: Jens Axboe <axboe@...nel.dk>
CC: Josef Bacik <josef@...icpanda.com>
CC: Keith Busch <kbusch@...nel.org>
CC: Christoph Hellwig <hch@....de>
CC: Sagi Grimberg <sagi@...mberg.me>
CC: Lee Duncan <lduncan@...e.com>
CC: Chris Leech <cleech@...hat.com>
CC: Mike Christie <michael.christie@...cle.com>
CC: "James E.J. Bottomley" <jejb@...ux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@...cle.com>
CC: Valentina Manea <valentina.manea.m@...il.com>
CC: Shuah Khan <shuah@...nel.org>
CC: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
CC: David Howells <dhowells@...hat.com>
CC: Marc Dionne <marc.dionne@...istor.com>
CC: Steve French <sfrench@...ba.org>
CC: Christine Caulfield <ccaulfie@...hat.com>
CC: David Teigland <teigland@...hat.com>
CC: Mark Fasheh <mark@...heh.com>
CC: Joel Becker <jlbec@...lplan.org>
CC: Joseph Qi <joseph.qi@...ux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@...il.com>
CC: Latchesar Ionkov <lucho@...kov.net>
CC: Dominique Martinet <asmadeus@...ewreck.org>
CC: "David S. Miller" <davem@...emloft.net>
CC: Eric Dumazet <edumazet@...gle.com>
CC: Jakub Kicinski <kuba@...nel.org>
CC: Paolo Abeni <pabeni@...hat.com>
CC: Ilya Dryomov <idryomov@...il.com>
CC: Xiubo Li <xiubli@...hat.com>
CC: Chuck Lever <chuck.lever@...cle.com>
CC: Jeff Layton <jlayton@...nel.org>
CC: Trond Myklebust <trond.myklebust@...merspace.com>
CC: Anna Schumaker <anna@...nel.org>
CC: Steffen Klassert <steffen.klassert@...unet.com>
CC: Herbert Xu <herbert@...dor.apana.org.au>
CC: netdev@...r.kernel.org
Suggested-by: Guillaume Nault <gnault@...hat.com>
Signed-off-by: Benjamin Coddington <bcodding@...hat.com>
---
 drivers/block/drbd/drbd_receiver.c | 3 +++
 drivers/block/nbd.c                | 1 +
 drivers/nvme/host/tcp.c            | 1 +
 drivers/scsi/iscsi_tcp.c           | 1 +
 drivers/usb/usbip/usbip_common.c   | 1 +
 fs/afs/rxrpc.c                     | 1 +
 fs/cifs/connect.c                  | 1 +
 fs/dlm/lowcomms.c                  | 2 ++
 fs/ocfs2/cluster/tcp.c             | 1 +
 net/9p/trans_fd.c                  | 1 +
 net/ceph/messenger.c               | 1 +
 net/sunrpc/xprtsock.c              | 3 +++
 net/xfrm/espintcp.c                | 1 +
 13 files changed, 18 insertions(+)
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index ee69d50ba4fd..0d3f910ae347 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1030,6 +1030,9 @@ static int conn_connect(struct drbd_connection *connection)
sock.socket->sk->sk_allocation = GFP_NOIO;
msock.socket->sk->sk_allocation = GFP_NOIO;
+ sock.socket->sk->sk_use_task_frag = false;
+ msock.socket->sk->sk_use_task_frag = false;
+
sock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE_BULK;
msock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE;
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5cffd96ef2d7..3a46b776354d 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -512,6 +512,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, int send,
noreclaim_flag = memalloc_noreclaim_save();
do {
sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC;
+ sock->sk->sk_use_task_frag = false;
msg.msg_name = NULL;
msg.msg_namelen = 0;
msg.msg_control = NULL;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 9b47dcb2a7d9..fe772d6c4c96 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1537,6 +1537,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
queue->sock->sk->sk_rcvtimeo = 10 * HZ;
queue->sock->sk->sk_allocation = GFP_ATOMIC;
+ queue->sock->sk->sk_use_task_frag = false;
nvme_tcp_set_queue_io_cpu(queue);
queue->request = NULL;
queue->data_remaining = 0;
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 5fb1f364e815..1d1cf641937c 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -738,6 +738,7 @@ iscsi_sw_tcp_conn_bind(struct iscsi_cls_session *cls_session,
sk->sk_reuse = SK_CAN_REUSE;
sk->sk_sndtimeo = 15 * HZ; /* FIXME: make it configurable */
sk->sk_allocation = GFP_ATOMIC;
+ sk->sk_use_task_frag = false;
sk_set_memalloc(sk);
sock_no_linger(sk);
diff --git a/drivers/usb/usbip/usbip_common.c b/drivers/usb/usbip/usbip_common.c
index 053a2bca4c47..e15ae6ca95ea 100644
--- a/drivers/usb/usbip/usbip_common.c
+++ b/drivers/usb/usbip/usbip_common.c
@@ -315,6 +315,7 @@ int usbip_recv(struct socket *sock, void *buf, int size)
do {
sock->sk->sk_allocation = GFP_NOIO;
+ sock->sk->sk_use_task_frag = false;
result = sock_recvmsg(sock, &msg, MSG_WAITALL);
if (result <= 0)
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index eccc3cd0cb70..ac75ad18db83 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -46,6 +46,7 @@ int afs_open_socket(struct afs_net *net)
goto error_1;
socket->sk->sk_allocation = GFP_NOFS;
+ socket->sk->sk_use_task_frag = false;
/* bind the callback manager's address to make this a server socket */
memset(&srx, 0, sizeof(srx));
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 9db9527c61cf..d84f1660cacb 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2944,6 +2944,7 @@ generic_ip_connect(struct TCP_Server_Info *server)
cifs_dbg(FYI, "Socket created\n");
server->ssocket = socket;
socket->sk->sk_allocation = GFP_NOFS;
+ socket->sk->sk_use_task_frag = false;
if (sfamily == AF_INET6)
cifs_reclassify_socket6(socket);
else
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 59f64c596233..120be782edbc 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -699,6 +699,7 @@ static void add_listen_sock(struct socket *sock, struct listen_connection *con)
sk->sk_user_data = con;
sk->sk_allocation = GFP_NOFS;
+ sk->sk_use_task_frag = false;
/* Install a data_ready callback */
sk->sk_data_ready = lowcomms_listen_data_ready;
release_sock(sk);
@@ -718,6 +719,7 @@ static void add_sock(struct socket *sock, struct connection *con)
sk->sk_write_space = lowcomms_write_space;
sk->sk_state_change = lowcomms_state_change;
sk->sk_allocation = GFP_NOFS;
+ sk->sk_use_task_frag = false;
sk->sk_error_report = lowcomms_error_report;
release_sock(sk);
}
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index f660c0dbdb63..3eaafa5e5ec4 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1604,6 +1604,7 @@ static void o2net_start_connect(struct work_struct *work)
sc->sc_sock = sock; /* freed by sc_kref_release */
sock->sk->sk_allocation = GFP_ATOMIC;
+ sock->sk->sk_use_task_frag = false;
myaddr.sin_family = AF_INET;
myaddr.sin_addr.s_addr = mynode->nd_ipv4_address;
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 07db2f436d44..d9120f14684b 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -868,6 +868,7 @@ static int p9_socket_open(struct p9_client *client, struct socket *csocket)
}
csocket->sk->sk_allocation = GFP_NOIO;
+ csocket->sk->sk_use_task_frag = false;
file = sock_alloc_file(csocket, 0, NULL);
if (IS_ERR(file)) {
pr_err("%s (%d): failed to map fd\n",
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index dfa237fbd5a3..1d06e114ba3f 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -446,6 +446,7 @@ int ceph_tcp_connect(struct ceph_connection *con)
if (ret)
return ret;
sock->sk->sk_allocation = GFP_NOFS;
+ sock->sk->sk_use_task_frag = false;
#ifdef CONFIG_LOCKDEP
lockdep_set_class(&sock->sk->sk_lock, &socket_class);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 915b9902f673..41ffc2169743 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1882,6 +1882,7 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
sk->sk_write_space = xs_udp_write_space;
sk->sk_state_change = xs_local_state_change;
sk->sk_error_report = xs_error_report;
+ sk->sk_use_task_frag = false;
xprt_clear_connected(xprt);
@@ -2082,6 +2083,7 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
sk->sk_user_data = xprt;
sk->sk_data_ready = xs_data_ready;
sk->sk_write_space = xs_udp_write_space;
+ sk->sk_use_task_frag = false;
xprt_set_connected(xprt);
@@ -2249,6 +2251,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
sk->sk_state_change = xs_tcp_state_change;
sk->sk_write_space = xs_tcp_write_space;
sk->sk_error_report = xs_error_report;
+ sk->sk_use_task_frag = false;
/* socket options */
sock_reset_flag(sk, SOCK_LINGER);
diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index 29a540dcb5a7..4ca2c5927ace 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -489,6 +489,7 @@ static int espintcp_init_sk(struct sock *sk)
/* avoid using task_frag */
sk->sk_allocation = GFP_ATOMIC;
+ sk->sk_use_task_frag = false;
return 0;
--
2.31.1