[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
<PH7PR02MB10039A099EB6D53B3DD2EB9ABCBD32@PH7PR02MB10039.namprd02.prod.outlook.com>
Date: Mon, 1 Jul 2024 20:45:50 +0000
From: "Sean Tranchetti (QUIC)" <quic_stranche@...cinc.com>
To: "jmaloy@...hat.com" <jmaloy@...hat.com>,
"ying.xue@...driver.com"
<ying.xue@...driver.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: "Subash Abhinov Kasiviswanathan (QUIC)" <quic_subashab@...cinc.com>
Subject: RE: Permanent TIPC Broadcast Failure When a Member Joins
> Hi everyone,
>
> We've been trying to track down the cause of a strange TIPC failure over the last few weeks and have gotten stuck. Hoping that opening this up to a wider audience might help a bit.
>
> The scenario is as follows. We have two programs, a sever and a client, that use TIPC group communication to exchange messages.
> - Server opens a TIPC socket and creates the group via setsockopt()
> - Once every second, the server performs a group broadcast via send()
> - Client opens a TIPC socket and joins the group via setsockopt()
> - Client then waits for messages to arrive via recv()
>
> What we see happening is that as soon as the client joins the TIPC group, the server's broadcast will fail from then on (either blocking on the send() or returning EAGAIN/EWOULDBLOCK if set to non-blocking).
>
> After adding several debug messages into the kernel code, we can see that the server-side send is failing because tipc_group_bc_cong() is always returning true, instructing the broadcast code to wait as the group is congested.
> If we compare logs taken on the same machine where these programs work (more on that later, though), we see that tipc_group_bc_cong() always returns false because of the following condition:
> if (list_empty(&grp->small_win))
> return false;
I was able to track down the actual race condition that is causing this behavior. In short, there is a race between TIPC socket binding and the socket joining a group as a member. Slightly updated scenario with better terminology and the complete flow which turned out to be VERY important.
1) Sender TIPC socket opens with socket(), binds it to a service with bind(), and creates the group via setsockopt()
2) Once a second, this sending socket performs a groupcast send() to all members
3) Receiver TIPC socket opens with socket(), binds it to the service via bind(), and joins the group via setsockopt()
4) Receiver socket then waits for messages to arrive via recv()
>From the kernel side, the race shows up between the bind() call and the setsockopt() call for the receiving socket.
As part of the bind() processing, the socket/port combo is added to the TIPC nametable. This addition results in an event being sent out via the TIPC topology server in the kernel. This event message arrives at the sender-side socket as a group member event and is processed. The sender-side socket will transmit a GRP_JOIN message to the receiver-side socket as a response.
[ T1382] tipc: tipc_nametbl_publish(): publishing sk [0x46228388 : 0x5660004]
[ T1382] tipc: tipc_nametbl_insert_publ(): inserting sk [0x46228388 : 0x5660004] into service 0x46228388
[ T1382] tipc: tipc_service_insert_publ(): publishing report for service type 0x46228388, sub lower 0x0
[ T1382] tipc: tipc_sub_send_event(): pushing event 0x6286e55c for lower 0x5660004, node 0x0
[ T1382] tipc: tipc_topsrv_queue_evt(): queuing work for conn 0x6286e3c0, evt 0x62af5504/0x6286e55c
[ T21] tipc: tipc_conn_send_work(): handling conn 0x6286e3c0
[ T21] tipc: tipc_conn_send_to_sock(): passing TOP_SRV event 0x62af5504 from conn 0x6286e3c0
[ T21] tipc: tipc_topsrv_kern_evt(): pushing TOP_SRV message 0x62af94b0 on skb 0x6305fa00 for port 0x757d95d5 from event 0x62af5504
[ T21] tipc: tipc_sk_filter_rcv(): processing RCV skb 0x6305fa00
[ T21] tipc: tipc_sk_filter_rcv(): handling tipc protocol message. sk 0x623b7680, skb 0x6305fa00
[ T21] tipc: tipc_sk_proto_rcv(): passing TOP_SRV tipc message 0x62af94b0 from skb 0x6305fa00 for group 0x6286e9c0
[ T21] tipc: tipc_group_member_evt(): handling group member event 0x62af94b0 for group type 0x46228388 ptr 0x6286e9c0, port 0xf826a8ad, and instance 0x5660004
[ T21] tipc: tipc_group_member_evt(): no member with port 0xf826a8ad, instance 0x5660004 yet for group 0x6286e9c0. creating 0x63112680 and posting GRP_JOIN
[ T21] tipc: tipc_group_update_member(): adding member instance 0x5660004 ptr 0x63112680 to group type 0x46228388 ptr 0x6286e9c0 small_win
[ T21] tipc: tipc_group_proto_xmit(): handling protocol tx for group type 0x46228388 ptr 0x6286e9c0, port 0x757d95d5
[ T21] tipc: tipc_group_proto_xmit(): member instance 0x5660004 adv adjustment for GRP_JOIN. ADV_IDLE: false, ADV_ACTIVE: false
[ T21] tipc: tipc_group_proto_xmit(): queuing msg 0x622048b0 on XMIT skb 0x6305fe00
Now the race can occur. In the good scenario, the receiver-side socket joins the TIPC group in the kernel via setsockopt()/tipc_sk_join(), creates its local group structure, processes the notification from the send-side socket and all is good. However, if the GRP_JOIN message from the sender-side socket arrives before this happens, we have a problem.
Because the tsk->group element is in the receive-side socket is still NULL in tipc_sk_proto_rcv(), tipc_group_proto_rcv() will drop this incoming message from the send-side socket as the receive-side socket does not yet belong to any group.
[ T21] tipc: tipc_sk_filter_rcv(): handling tipc group processing. sk 0x623b7680, skb 0x6305fa00
[ T21] tipc: tipc_sk_filter_rcv(): processing RCV skb 0x6305fe00
[ T21] tipc: tipc_sk_filter_rcv(): handling tipc protocol message. sk 0x623b6140, skb 0x6305fe00
[ T21] tipc: tipc_sk_proto_rcv(): passing GROUP_PROTOCOL tipc msg 0x622048b0 from skb 0x6305fe00
[ T21] tipc: tipc_group_proto_rcv(): no group for msg 0x622048b0
static void tipc_sk_proto_rcv(struct sock *sk,
struct sk_buff_head *inputq,
struct sk_buff_head *xmitq)
{
struct sk_buff *skb = __skb_dequeue(inputq);
struct tipc_sock *tsk = tipc_sk(sk);
struct tipc_msg *hdr = buf_msg(skb);
struct tipc_group *grp = tsk->group;
bool wakeup = false;
...
case GROUP_PROTOCOL:
tipc_group_proto_rcv(grp, &wakeup, hdr, inputq, xmitq);
void tipc_group_proto_rcv(struct tipc_group *grp, bool *usr_wakeup,
struct tipc_msg *hdr, struct sk_buff_head *inputq,
struct sk_buff_head *xmitq)
{
u32 node = msg_orignode(hdr);
u32 port = msg_origport(hdr);
struct tipc_member *m, *pm;
u16 remitted, in_flight;
if (!grp)
return;
The receive-side socket will then join the group via setsockopt()/tipc_sk_join(). It will broadcast the GRP_JOIN message to the send-side socket which is processed normally and receive the GRP_ADV response from it. As such, from the receive-side socket, everything looks fine.
[ T1382] tipc: tipc_sk_join(): group 0x6286e000 type 0x46228388 created for sk 0x623b6140 instance 0x5660004
[ T1382] tipc: tipc_sk_join(): publishing socket 0x623b6140 instance 0x5660004
[ T1382] tipc: tipc_sk_join(): joining group 0x6286e000 type 0x46228388 sk 0x623b6140 instance 0x5660004
[ T1382] tipc: tipc_group_proto_xmit(): handling protocol tx for group type 0x46228388 ptr 0x6286e000, port 0xf826a8ad
[ T1382] tipc: tipc_group_proto_xmit(): member instance 0x5660004 adv adjustment for GRP_JOIN. ADV_IDLE: false, ADV_ACTIVE: false
[ T1382] tipc: tipc_group_proto_xmit(): queuing msg 0x622048b0 on XMIT skb 0x6305fe00
[ T1382] tipc: tipc_group_update_member(): adding member instance 0x5660004 ptr 0x63112f00 to group type 0x46228388 ptr 0x6286e000 small_win
[ T1382] tipc: tipc_group_proto_xmit(): handling protocol tx for group type 0x46228388 ptr 0x6286e000, port 0xf826a8ad
[ T1382] tipc: tipc_group_proto_xmit(): member instance 0x5610004 adv adjustment for GRP_JOIN. ADV_IDLE: false, ADV_ACTIVE: false
[ T1382] tipc: tipc_group_proto_xmit(): queuing msg 0x62af94b0 on XMIT skb 0x6305fa00
[ T1382] tipc: tipc_group_update_member(): adding member instance 0x5610004 ptr 0x63112700 to group type 0x46228388 ptr 0x6286e000 small_win
...
[ T1382] tipc: tipc_sk_filter_rcv(): handling tipc protocol message. sk 0x623b7680, skb 0x6305fa00
[ T1382] tipc: tipc_sk_proto_rcv(): passing GROUP_PROTOCOL tipc msg 0x62af94b0 from skb 0x6305fa00
[ T1382] tipc: tipc_group_proto_rcv(): handling group message 0x62af94b0 for group type 0x46228388 ptr 0x6286e9c0 and port 0xf826a8ad
[ T1382] tipc: tipc_group_proto_rcv(): GRP_JOIN event for group type 0x46228388, port 0xf826a8ad, instance 0x5660004, ptr 0x63112680
[ T1382] tipc: tipc_group_proto_rcv(): setting member instance 0x5660004 to state MBR_JOINED
[ T1382] tipc: tipc_group_open(): removing member instance 0x5660004 ptr 0x63112680 from grp->small_win
[ T1382] tipc: tipc_group_update_member(): adding member instance 0x5660004 ptr 0x63112680 to group type 0x46228388 ptr 0x6286e9c0 small_win
[ T1382] tipc: tipc_group_proto_xmit(): handling protocol tx for group type 0x46228388 ptr 0x6286e9c0, port 0x757d95d5
[ T1382] tipc: tipc_group_proto_xmit(): member instance 0x5660004 in MBR_JOINED/PENDING state for adv adjustment
[ T1382] tipc: tipc_group_proto_xmit(): member instance 0x5660004 adv adjustment for GRP_ADV. ADV_IDLE: true, ADV_ACTIVE: false
[ T1382] tipc: tipc_group_proto_xmit(): queuing msg 0x628770b0 on XMIT skb 0x6305fc00
...
[ T1382] tipc: tipc_sk_filter_rcv(): handling tipc protocol message. sk 0x623b6140, skb 0x6305fc00
[ T1382] tipc: tipc_sk_proto_rcv(): passing GROUP_PROTOCOL tipc msg 0x628770b0 from skb 0x6305fc00
[ T1382] tipc: tipc_group_proto_rcv(): handling group message 0x628770b0 for group type 0x46228388 ptr 0x6286e000 and port 0x757d95d5
[ T1382] tipc: tipc_group_proto_rcv(): GRP_ADV for member instance 0x5610004
[ T1382] tipc: tipc_group_open(): removing member instance 0x5610004 ptr 0x63112700 from grp->small_win
However, the send-side socket is stuck in a bad state. Because it never received a GRP_ADV message from the receive-side socket as its GRP_JOIN message was dropped, the sender-side socket still sees the receiver-side socket as having a small window, and thus is it is still present on the grp->small_win list it maintains. As such, any groupcast messages made by the send socket will always fail, despite the receiver-side socket having joined the group properly.
[ T1380] tipc: tipc_group_bc_cong(): member instance 0x5660004 ptr 0x63112680 is marked as having a small window
...
[ T1380] tipc: tipc_group_bc_cong(): member instance 0x5660004 ptr 0x63112680 is marked as having a small window
Thanks,
Sean
Powered by blists - more mailing lists