Message-ID: <1016b317-d521-4787-80dc-3b92320f2d19@bytedance.com>
Date: Fri, 8 Nov 2024 10:28:22 -0800
From: Zijian Zhang <zijianzhang@...edance.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: bpf@...r.kernel.org, john.fastabend@...il.com, jakub@...udflare.com,
 davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, dsahern@...nel.org, netdev@...r.kernel.org,
 cong.wang@...edance.com
Subject: Re: [External] Re: [PATCH bpf 2/2] tcp_bpf: add sk_rmem_alloc related
 logic for ingress redirection


On 11/7/24 8:04 PM, Cong Wang wrote:
> On Thu, Oct 17, 2024 at 12:57:42AM +0000, zijianzhang@...edance.com wrote:
>> From: Zijian Zhang <zijianzhang@...edance.com>
>>
>> Although we call sk_rmem_schedule() and add the sk_msg to the
>> ingress_msg of sk_redir in bpf_tcp_ingress, we do not update
>> sk_rmem_alloc. As a result, apart from the global memory limit, the
>> rmem of sk_redir is effectively unlimited.
>>
>> Thus, add sk_rmem_alloc related logic to limit the recv buffer.
>>
>> Signed-off-by: Zijian Zhang <zijianzhang@...edance.com>
>> ---
>>   include/linux/skmsg.h | 11 ++++++++---
>>   net/core/skmsg.c      |  6 +++++-
>>   net/ipv4/tcp_bpf.c    |  4 +++-
>>   3 files changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
>> index d9b03e0746e7..2cbe0c22a32f 100644
>> --- a/include/linux/skmsg.h
>> +++ b/include/linux/skmsg.h
>> @@ -317,17 +317,22 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
>>   	kfree_skb(skb);
>>   }
>>   
>> -static inline void sk_psock_queue_msg(struct sk_psock *psock,
>> +static inline bool sk_psock_queue_msg(struct sk_psock *psock,
>>   				      struct sk_msg *msg)
>>   {
>> +	bool ret;
>> +
>>   	spin_lock_bh(&psock->ingress_lock);
>> -	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
>> +	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
>>   		list_add_tail(&msg->list, &psock->ingress_msg);
>> -	else {
>> +		ret = true;
>> +	} else {
>>   		sk_msg_free(psock->sk, msg);
>>   		kfree(msg);
>> +		ret = false;
>>   	}
>>   	spin_unlock_bh(&psock->ingress_lock);
>> +	return ret;
>>   }
>>   
>>   static inline struct sk_msg *sk_psock_dequeue_msg(struct sk_psock *psock)
>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> index b1dcbd3be89e..110ee0abcfe0 100644
>> --- a/net/core/skmsg.c
>> +++ b/net/core/skmsg.c
>> @@ -445,8 +445,10 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
>>   			if (likely(!peek)) {
>>   				sge->offset += copy;
>>   				sge->length -= copy;
>> -				if (!msg_rx->skb)
>> +				if (!msg_rx->skb) {
>>   					sk_mem_uncharge(sk, copy);
>> +					atomic_sub(copy, &sk->sk_rmem_alloc);
>> +				}
>>   				msg_rx->sg.size -= copy;
>>   
>>   				if (!sge->length) {
>> @@ -772,6 +774,8 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock)
>>   
>>   	list_for_each_entry_safe(msg, tmp, &psock->ingress_msg, list) {
>>   		list_del(&msg->list);
>> +		if (!msg->skb)
>> +			atomic_sub(msg->sg.size, &psock->sk->sk_rmem_alloc);
>>   		sk_msg_free(psock->sk, msg);
> 
> Why not calling this atomic_sub() in sk_msg_free_elem()?
> 
> Thanks.

sk_msg_free_elem, called by sk_msg_free or sk_msg_free_no_charge, is
invoked in multiple locations, including the TX, RX, and error paths.
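
For reference, sk_msg_free_elem reads roughly as follows in
net/core/skmsg.c (paraphrased; details may vary by kernel version):

    static int sk_msg_free_elem(struct sock *sk, struct sk_msg *msg, u32 i,
                                bool charge)
    {
            struct scatterlist *sge = sk_msg_elem(msg, i);
            u32 len = sge->length;

            /* When the skb owns the memory we free it from consume_skb path. */
            if (!msg->skb) {
                    if (charge)
                            sk_mem_uncharge(sk, len);
                    put_page(sg_page(sge));
            }
            memset(sge, 0, sizeof(*sge));
            return len;
    }

An unconditional atomic_sub(len, &sk->sk_rmem_alloc) in there would
underflow the counter for egress sk_msgs that were never added to
sk_rmem_alloc in the first place.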

We should call atomic_sub on sk->sk_rmem_alloc only for sk_msgs that
were previously charged with atomic_add. In other words, atomic_sub is
needed only for the sk_msgs on ingress_msg.
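
The charge side lives in the net/ipv4/tcp_bpf.c hunk from the diffstat,
which is not quoted above. Roughly, the intended pairing inside
bpf_tcp_ingress looks like the sketch below (illustrative only; variable
names such as size/copied stand in for that function's locals):

    /* For each scatterlist entry moved to the ingress queue, charge
     * sk_redir's receive memory alongside the existing sk_mem_charge().
     */
    sk_mem_charge(sk, size);
    atomic_add(size, &sk->sk_rmem_alloc);
    sk_msg_xfer(tmp, msg, i, size);

    /* ... and on the queueing path, roll the charge back if
     * sk_psock_queue_msg() (now returning bool) had to drop the msg.
     */
    if (!sk_psock_queue_msg(psock, tmp))
            atomic_sub(copied, &sk->sk_rmem_alloc);

Every atomic_add is then matched either by the atomic_sub in
sk_msg_recvmsg/__sk_psock_purge_ingress_msg above or by this rollback.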

As for "!msg->skb" check here, I want to make sure the sk_msg is not
from function sk_psock_skb_ingress_enqueue, because these sk_msgs'
rmem accounting has already handled by skb_set_owner_r in function
sk_psock_skb_ingress.
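
For context, skb_set_owner_r in include/net/sock.h does the equivalent
accounting on the skb path and installs sock_rfree as the destructor,
which subtracts the same amount again when the skb is freed:

    static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
    {
            skb_orphan(skb);
            skb->sk = sk;
            skb->destructor = sock_rfree;
            atomic_add(skb->truesize, &sk->sk_rmem_alloc);
            sk_mem_charge(sk, skb->truesize);
    }

So subtracting again in __sk_psock_purge_ingress_msg for skb-backed
sk_msgs would double-uncharge sk_rmem_alloc; the "!msg->skb" check
avoids exactly that.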


