Message-ID: <20080613220422.GC14338@fieldses.org>
Date: Fri, 13 Jun 2008 18:04:22 -0400
From: "J. Bruce Fields" <bfields@...ldses.org>
To: "Weathers, Norman R." <Norman.R.Weathers@...ocophillips.com>
Cc: Jeff Layton <jlayton@...chiereds.net>,
linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org,
Neil Brown <neilb@...e.de>
Subject: Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?
On Fri, Jun 13, 2008 at 04:53:31PM -0500, Weathers, Norman R. wrote:
>
>
> > > The big one seems to be the __alloc_skb.  (This is with 16 threads,
> > > and it says that we are using up somewhere between 12 and 14 GB of
> > > memory, about 2 to 3 gig of that is disk cache).  If I were to put
> > > any more threads out there, the server would become almost
> > > unresponsive (it was bad enough as it was).
> > >
> > > At the same time, I also noticed this:
> > >
> > > skbuff_fclone_cache: 1842524 __alloc_skb+0x50/0x170
> > >
> > > Don't know for sure if that is meaningful or not....
> >
> > OK, so, starting at net/core/skbuff.c, this means that this memory was
> > allocated by __alloc_skb() calls with something nonzero in the third
> > ("fclone") argument. The only such caller is alloc_skb_fclone().
> > Callers of alloc_skb_fclone() include:
> >
> > sk_stream_alloc_skb:
> > do_tcp_sendpages
> > tcp_sendmsg
> > tcp_fragment
> > tso_fragment
>
> Interesting you should mention the tso... We recently went through and
> turned on TSO on all of our systems, trying it out to see if it helped
> with performance... This could be something to do with that. I can try
> disabling the tso on all of the servers and see if that helps with the
> memory. Actually, I think I will, and I will monitor the situation. I
> think it might help some, but I still think there may be something else
> going on in a deep corner...
I'll plead total ignorance about TSO, and it sounds like a long
shot--but sure, it'd be worth trying, thanks.
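[Editor's note: a minimal userspace sketch of the allocation split described in the quoted analysis. In net/core/skbuff.c, __alloc_skb() takes an fclone flag; it is nonzero only via alloc_skb_fclone(), used on the TCP transmit path, and such skbs come from skbuff_fclone_cache, which holds a parent plus a pre-allocated clone companion. The struct names and sizes below are stand-ins for illustration, not the kernel layout.]

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a plain sk_buff; the real struct is larger. */
struct fake_skb { char data[256]; };

/* Models an object from skbuff_fclone_cache: a parent skb plus its
 * pre-allocated clone companion, so each allocation is roughly twice
 * the size of an ordinary skb.  That is why a leak on this path shows
 * up so quickly in memory usage. */
struct fake_fclone_pair {
        struct fake_skb skb1;
        struct fake_skb skb2;   /* companion clone, allocated up front */
        int fclone_ref;
};

/* fclone == 0: one skb; fclone != 0: a pair, as on the TCP send path. */
static void *fake_alloc_skb(int fclone)
{
        if (fclone)
                return calloc(1, sizeof(struct fake_fclone_pair));
        return calloc(1, sizeof(struct fake_skb));
}
```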
>
> > tcp_mtu_probe
> > tcp_send_fin
> > tcp_connect
> > buf_acquire:
> > lots of callers in tipc code (whatever that is).
> >
> > So unless you're using tipc, or you have something in userspace going
> > haywire (perhaps netstat would help rule that out?), then I suppose
> > there's something wrong with knfsd's tcp code. Which makes sense, I
> > guess.
> >
>
> Not for sure what tipc is either....
>
> > I'd think this sort of allocation would be limited by the number of
> > sockets times the size of the send and receive buffers.
> > svc_xprt.c:svc_check_conn_limits() claims to be limiting the number of
> > sockets to (nrthreads+3)*20. (You aren't hitting the "too many open
> > connections" printk there, are you?) The total buffer size should be
> > bounded by something like 4 megs.
> >
> > --b.
> >
>
> Yes, we are getting a continuous stream of the "too many open
> connections" messages scrolling across our logs.
That's interesting! So we should probably look more closely at the
svc_check_conn_limits() behavior. I wonder whether some pathological
behavior is triggered in the case where you're constantly over the limit
it's trying to enforce.
(Remind me how many active clients you have?)
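[Editor's note: a sketch of the connection cap under discussion, assuming the (nrthreads+3)*20 formula quoted earlier from svc_xprt.c:svc_check_conn_limits(); treat it as an illustration of the arithmetic, not the kernel source. When the count exceeds the cap, the server logs "too many open connections" and drops a connection, which is the message flooding Norman's logs.]

```c
#include <assert.h>

/* Cap on open sockets as described in the thread: with 16 nfsd
 * threads this works out to (16 + 3) * 20 = 380 connections. */
static int conn_limit(int nrthreads)
{
        return (nrthreads + 3) * 20;
}

/* Returns nonzero when a server with this many open connections is
 * over the cap, i.e. when the "too many open connections" printk
 * would fire and the oldest connection would be closed. */
static int over_limit(int open_conns, int nrthreads)
{
        return open_conns > conn_limit(nrthreads);
}
```

A client farm that keeps reconnecting past the cap would stay over the limit continuously, so the server would close and re-accept sockets in a steady churn, consistent with both the log flood and the skb allocations piling up.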
> No problems. I feel good if I exercised some deep corner of the code
> and found something that needed to be flushed out; that's what the
> experience is all about, isn't it?
Yep!
--b.