Message-ID: <9ef28d50-dad0-4dc6-8a6d-b3f82521fba1@redhat.com>
Date: Mon, 24 Feb 2025 11:09:17 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Krister Johansen <kjlx@...pleofstupid.com>,
 Matthieu Baerts <matttbe@...nel.org>, Mat Martineau <martineau@...nel.org>
Cc: Geliang Tang <geliang@...nel.org>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Simon Horman <horms@...nel.org>,
 netdev@...r.kernel.org, mptcp@...ts.linux.dev
Subject: Re: [PATCH mptcp] mptcp: fix 'scheduling while atomic' in
 mptcp_pm_nl_append_new_local_addr

Hi,

On 2/21/25 11:21 PM, Krister Johansen wrote:
> If multiple connection requests attempt to create an implicit mptcp
> endpoint in parallel, more than one caller may end up in
> mptcp_pm_nl_append_new_local_addr because none found the address in
> local_addr_list during their call to mptcp_pm_nl_get_local_id.  In this
> case, the concurrent new_local_addr calls may delete the address entry
> created by the previous caller.  These deletes use synchronize_rcu, but
> this is not permitted in some of the contexts where this function may be
> called.  During packet receive, the caller may be in an RCU read-side
> critical section with preemption disabled.
> 
> An example stack:
> 
>    BUG: scheduling while atomic: swapper/2/0/0x00000302
> 
>    Call Trace:
>    <IRQ>
>    dump_stack_lvl+0x76/0xa0
>    dump_stack+0x10/0x20
>    __schedule_bug+0x64/0x80
>    schedule_debug.constprop.0+0xdb/0x130
>    __schedule+0x69/0x6a0
>    schedule+0x33/0x110
>    schedule_timeout+0x157/0x170
>    wait_for_completion+0x88/0x150
>    __wait_rcu_gp+0x150/0x160
>    synchronize_rcu+0x12d/0x140
>    mptcp_pm_nl_append_new_local_addr+0x1bd/0x280
>    mptcp_pm_nl_get_local_id+0x121/0x160
>    mptcp_pm_get_local_id+0x9d/0xe0
>    subflow_check_req+0x1a8/0x460
>    subflow_v4_route_req+0xb5/0x110
>    tcp_conn_request+0x3a4/0xd00
>    subflow_v4_conn_request+0x42/0xa0
>    tcp_rcv_state_process+0x1e3/0x7e0
>    tcp_v4_do_rcv+0xd3/0x2a0
>    tcp_v4_rcv+0xbb8/0xbf0
>    ip_protocol_deliver_rcu+0x3c/0x210
>    ip_local_deliver_finish+0x77/0xa0
>    ip_local_deliver+0x6e/0x120
>    ip_sublist_rcv_finish+0x6f/0x80
>    ip_sublist_rcv+0x178/0x230
>    ip_list_rcv+0x102/0x140
>    __netif_receive_skb_list_core+0x22d/0x250
>    netif_receive_skb_list_internal+0x1a3/0x2d0
>    napi_complete_done+0x74/0x1c0
>    igb_poll+0x6c/0xe0 [igb]
>    __napi_poll+0x30/0x200
>    net_rx_action+0x181/0x2e0
>    handle_softirqs+0xd8/0x340
>    __irq_exit_rcu+0xd9/0x100
>    irq_exit_rcu+0xe/0x20
>    common_interrupt+0xa4/0xb0
>    </IRQ>
> 
> This problem seems particularly prevalent if the user advertises an
> endpoint that has a different external vs internal address.  In the case
> where the external address is advertised and multiple connections
> already exist, multiple subflow SYNs arrive in parallel which tends to
> trigger the race during creation of the first local_addr_list entries
> which have the internal address instead.
> 
> Fix this problem by switching mptcp_pm_nl_append_new_local_addr to use
> call_rcu.  As part of plumbing this up, make
> __mptcp_pm_release_addr_entry take an rcu_head, which is used by all
> callers regardless of cleanup method.
> 
> Cc: stable@...r.kernel.org
> Fixes: d045b9eb95a9 ("mptcp: introduce implicit endpoints")
> Signed-off-by: Krister Johansen <kjlx@...pleofstupid.com>

The proposed patch looks functionally correct to me, but I think it
would be better to avoid adding new fields to mptcp_pm_addr_entry, if
not strictly needed.

What about the following? (Completely untested!) When implicit endpoint
creations race with each other, we don't need to replace the existing
entry; we can simply reuse it.

That would additionally prevent an implicit endpoint created from a
subflow from overriding the flags set by a racing user-space endpoint add.

If that works and fits, feel free to take/use it.
---
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 572d160edca3..dcb27b479824 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -977,7 +977,7 @@ static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry)

 static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
 					     struct mptcp_pm_addr_entry *entry,
-					     bool needs_id)
+					     bool needs_id, bool replace)
 {
 	struct mptcp_pm_addr_entry *cur, *del_entry = NULL;
 	unsigned int addr_max;
@@ -1017,6 +1017,12 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
 			if (entry->addr.id)
 				goto out;

+			if (!replace) {
+				kfree(entry);
+				ret = cur->addr.id;
+				goto out;
+			}
+
 			pernet->addrs--;
 			entry->addr.id = cur->addr.id;
 			list_del_rcu(&cur->list);
@@ -1165,7 +1171,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc
 	entry->ifindex = 0;
 	entry->flags = MPTCP_PM_ADDR_FLAG_IMPLICIT;
 	entry->lsk = NULL;
-	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true);
+	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true, false);
 	if (ret < 0)
 		kfree(entry);

@@ -1433,7 +1439,8 @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
 		}
 	}
 	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry,
-						!mptcp_pm_has_addr_attr_id(attr, info));
+						!mptcp_pm_has_addr_attr_id(attr, info),
+						true);
 	if (ret < 0) {
 		GENL_SET_ERR_MSG_FMT(info, "too many addresses or duplicate one: %d", ret);
 		goto out_free;


