Message-ID: <1285060993.2617.163.camel@edumazet-laptop>
Date:	Tue, 21 Sep 2010 11:23:13 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Amit Salecha <amit.salecha@...gic.com>
Cc:	David Miller <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Ameen Rahman <ameen.rahman@...gic.com>,
	Anirban Chakraborty <anirban.chakraborty@...gic.com>
Subject: RE: [PATCH] qlcnic: dont assume NET_IP_ALIGN is 2

On Tuesday, 21 September 2010 at 03:41 -0500, Amit Salecha wrote:

> > Amit, if you believe this is a problem, you should address it for all
> > NICS, not only qlcnic.
> > 
> > Qlcnic was lying to stack, because it consumed 2Kbytes blocs and
> > pretended they were consuming skb->len bytes.
> > (assuming MTU=1500, problem is worse if MTU is bigger)
> > 
> > So in order to improve "throughput", you were allowing for memory
> > exhaust and freeze of the _machine_ ?
> >
> This won't lead to such problem. truesize is used for accounting only.

You must have big machines in your lab and never hit OOM?

You really should take a look at various files in the net/core and
net/ipv4 trees, and at files like "/proc/sys/net/ipv4/tcp_mem" and
"/proc/sys/net/ipv4/udp_mem".
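
For readers who have not looked at these limits: tcp_mem and udp_mem each
hold three numbers (min, pressure, max), counted in pages, and on a typical
Linux system they live under /proc/sys/net/ipv4/. A minimal user-space
sketch for reading them follows. The struct and function names are
hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

/* Hypothetical container for the three page counts in tcp_mem/udp_mem. */
struct mem_limits {
	long min_pages;
	long pressure_pages;
	long max_pages;
};

/* Parse "min pressure max" from a text line; 0 on success, -1 otherwise. */
int parse_mem_limits(const char *text, struct mem_limits *out)
{
	int n = sscanf(text, "%ld %ld %ld", &out->min_pages,
		       &out->pressure_pages, &out->max_pages);

	return n == 3 ? 0 : -1;
}

/* Read and parse the first line of a file such as
 * /proc/sys/net/ipv4/tcp_mem; 0 on success, -1 on any failure. */
int read_mem_limits(const char *path, struct mem_limits *out)
{
	char buf[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_mem_limits(buf, out);
}
```

Calling read_mem_limits("/proc/sys/net/ipv4/tcp_mem", &m) and multiplying
m.max_pages by the page size gives the memory budget that the truesize
accounting is checked against.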

In fact, truesize is _underestimated_: we don't account for struct
skb_shared_info or for kmalloc() rounding.

We should probably use the patch below (so that we don't have to check
every possible net driver!)

The problem is that this would slow down alloc_skb(), so this patch is
not for inclusion.

A cheap alternative would be to use:

size + sizeof(struct sk_buff) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
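
To make the difference concrete, here is a rough user-space model of the
three accounting choices: the current one, the cheap alternative above, and
the ksize() based one from the patch. All constants are made-up stand-ins
(only the "close to 256" for struct sk_buff comes from this mail), the
power-of-two rounding is a crude assumption about kmalloc(), and the model
assumes the data area is allocated as size + sizeof(struct skb_shared_info).
None of the function names exist in the kernel.

```c
#include <stddef.h>

/* Illustrative stand-ins, NOT real kernel values. */
#define MODEL_SKB_SIZE     256u	/* sizeof(struct sk_buff), "close to 256" */
#define MODEL_SHINFO_SIZE  320u	/* hypothetical sizeof(struct skb_shared_info) */
#define MODEL_CACHE_LINE    64u	/* hypothetical SMP_CACHE_BYTES */

/* Same idea as SKB_DATA_ALIGN(): round up to a cache-line multiple. */
size_t model_data_align(size_t n)
{
	return (n + MODEL_CACHE_LINE - 1) & ~(size_t)(MODEL_CACHE_LINE - 1);
}

/* Crude ksize() model: assume the allocator rounds up to a power of two. */
size_t model_ksize(size_t n)
{
	size_t r = 32;

	while (r < n)
		r <<= 1;
	return r;
}

/* Current accounting: size + sizeof(struct sk_buff). */
size_t truesize_current(size_t size)
{
	return size + MODEL_SKB_SIZE;
}

/* Cheap alternative: also charge the aligned shared info, no ksize() call. */
size_t truesize_cheap(size_t size)
{
	return size + MODEL_SKB_SIZE + model_data_align(MODEL_SHINFO_SIZE);
}

/* ksize()-based accounting: charge what the allocator really handed out. */
size_t truesize_ksize(size_t size)
{
	return model_ksize(size + MODEL_SHINFO_SIZE) + MODEL_SKB_SIZE;
}
```

With these made-up numbers and size = 1536, the three functions give 1792,
2112 and 2304 bytes respectively: the cheap variant closes part of the gap
without touching the allocator, and only the ksize() variant sees the
rounding.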

If you think about it, when 128-bit arches arrive, truesize will grow anyway.
If some tuning is needed in our stack, we'll do it.

(The socket API SO_RCVBUF/SO_SNDBUF is the problem, because
 applications are not aware of packetization or kernel internals.)

SOCK_MIN_RCVBUF is way too small, since sizeof(struct sk_buff)
is already close to 256. With that minimum, I guess we cannot even
receive a single frame.
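
The gap between what an application asks for and what the kernel accounts
can be observed from user space. The sketch below sets SO_RCVBUF on a
throwaway UDP socket and reads the value back. The helper name
effective_rcvbuf() is hypothetical; the socket calls are the standard ones.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Request `requested` bytes of receive buffer on a temporary UDP socket
 * and return the value getsockopt() reports afterwards, or -1 on error. */
int effective_rcvbuf(int requested)
{
	int val = 0;
	socklen_t len = sizeof(val);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
		       &requested, sizeof(requested)) < 0 ||
	    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
		val = -1;
	close(fd);
	return val;
}
```

With the current code (val * 2 in sock_setsockopt()), Linux should report
twice the requested value, subject to the rmem_max cap and the
SOCK_MIN_RCVBUF floor. The extra factor is there to absorb per-packet
overhead that the application cannot know about.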

 include/net/sock.h |    2 +-
 net/core/skbuff.c  |    2 +-
 net/core/sock.c    |    8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)


diff --git a/include/net/sock.h b/include/net/sock.h
index 8ae97c4..348fc9e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1558,7 +1558,7 @@ static inline void sk_wake_async(struct sock *sk, int how, int band)
 }
 
 #define SOCK_MIN_SNDBUF 2048
-#define SOCK_MIN_RCVBUF 256
+#define SOCK_MIN_RCVBUF 1024
 
 static inline void sk_stream_moderate_sndbuf(struct sock *sk)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 752c197..5ab2e8e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -196,7 +196,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * the tail pointer in struct sk_buff!
 	 */
 	memset(skb, 0, offsetof(struct sk_buff, tail));
-	skb->truesize = size + sizeof(struct sk_buff);
+	skb->truesize = ksize(data) + sizeof(struct sk_buff);
 	atomic_set(&skb->users, 1);
 	skb->head = data;
 	skb->data = data;
diff --git a/net/core/sock.c b/net/core/sock.c
index f3a06c4..803e041 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -535,10 +535,10 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 			val = sysctl_wmem_max;
 set_sndbuf:
 		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
-		if ((val * 2) < SOCK_MIN_SNDBUF)
+		if ((val * 4) < SOCK_MIN_SNDBUF)
 			sk->sk_sndbuf = SOCK_MIN_SNDBUF;
 		else
-			sk->sk_sndbuf = val * 2;
+			sk->sk_sndbuf = val * 4;
 
 		/*
 		 *	Wake up sending tasks if we
@@ -579,10 +579,10 @@ set_rcvbuf:
 		 * returning the value we actually used in getsockopt
 		 * is the most desirable behavior.
 		 */
-		if ((val * 2) < SOCK_MIN_RCVBUF)
+		if ((val * 4) < SOCK_MIN_RCVBUF)
 			sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
 		else
-			sk->sk_rcvbuf = val * 2;
+			sk->sk_rcvbuf = val * 4;
 		break;
 
 	case SO_RCVBUFFORCE:

