Message-ID: <1288039106.3296.4.camel@edumazet-laptop>
Date: Mon, 25 Oct 2010 22:38:26 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: nhorman@...driver.com
Cc: netdev@...r.kernel.org, davem@...emloft.net, jpirko@...hat.com
Subject: Re: [PATCH] Enhance AF_PACKET implementation to not require high
order contiguous memory allocation
On Monday, 25 October 2010 at 15:06 -0400, nhorman@...driver.com wrote:
> From: Neil Horman <nhorman@...driver.com>
>
> It was shown to me recently that systems under high load were driven very deep
> into swap when tcpdump was run. The reason this happened was because the
> AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
> application to specify how many entries an AF_PACKET socket will have and how
> large each entry will be. It seems the default setting for tcpdump is to set
> the ring buffer to 32 entries of 64 KB each, which implies 32 order 5
> allocations. That's difficult under good circumstances, and horrid under memory
> pressure.
>
> I thought it would be good to make that a bit more usable. I was going to do a
> simple conversion of the ring buffer from contiguous pages to iovecs, but
> unfortunately, the metadata which AF_PACKET places in these buffers can easily
> span a page boundary, and given that these buffers get mapped into user space,
> and the data layout doesn't easily allow for a change to padding between frames
> to avoid that, a simple iovec change is just going to break user space ABI
> consistency.
>
> So instead I've done this. This patch does the aforementioned change,
> allocating an array of pages instead of one contiguous chunk, and then vmaps the
> array into a contiguous memory space, so that it can still be accessed in the
> same way it was before. This allows for a consistent user and kernel space
> behavior for memory mapped AF_PACKET sockets, while at the same time relieving
> the memory pressure placed on a system when tcpdump defaults are used.
>
> Tested successfully by me.
>
> Signed-off-by: Neil Horman <nhorman@...driver.com>
> ---
Strange because last time I took a look at this stuff, libpcap was doing
several tries, reducing page orders until it got no allocation
failures...
(It tries to get high order pages, maybe to reduce TLB pressure...)
I remember adding __GFP_NOWARN to avoid a kernel message, while tcpdump
was actually working...
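
For readers who want the fallback idea spelled out, here is a minimal
user-space sketch of "try a high order, drop to a lower one on ENOMEM".
It is not libpcap's actual code. The names setup_ring_with_fallback,
ring_setup_fn and fake_setup are hypothetical. In a real program the
callback would presumably wrap a setsockopt(fd, SOL_PACKET, PACKET_RX_RING,
...) call with a struct tpacket_req sized for the given order; here it is
simulated so the logic can be run without an AF_PACKET socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback type: try to set up a ring whose blocks are
 * (page size << order) bytes. Returns 0 on success, or -1 with errno set.
 */
typedef int (*ring_setup_fn)(unsigned int order, void *ctx);

/*
 * Try max_order first, then step down one order at a time while the
 * failure is ENOMEM. Returns the order that worked, or -1 if order 0
 * failed too or a non-ENOMEM error occurred (errno is left as set by
 * the callback).
 */
int setup_ring_with_fallback(unsigned int max_order,
                             ring_setup_fn try_setup, void *ctx)
{
    assert(try_setup != NULL);

    for (unsigned int order = max_order; ; order--) {
        if (try_setup(order, ctx) == 0)
            return (int)order;
        if (errno != ENOMEM || order == 0)
            return -1;
    }
}

/*
 * Simulated allocator for illustration only: ctx points to the largest
 * order that "succeeds"; anything above it fails with ENOMEM.
 */
int fake_setup(unsigned int order, void *ctx)
{
    unsigned int largest_ok = *(const unsigned int *)ctx;

    if (order > largest_ok) {
        errno = ENOMEM;
        return -1;
    }
    return 0;
}
```

With largest_ok set to 2, setup_ring_with_fallback(5, fake_setup,
&largest_ok) walks down through orders 5, 4 and 3 before settling on 2.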