netdev - Re: [RFC] packet: handle too big packets for PACKET

Message-ID: <1408102630.8789.6.camel@localhost>
Date:	Fri, 15 Aug 2014 13:37:10 +0200
From:	Hannes Frederic Sowa <hannes@...essinduktion.org>
To:	Guy Harris <guy@...m.mit.edu>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Daniel Borkmann <dborkman@...hat.com>,
	Neil Horman <nhorman@...driver.com>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>
Subject: Re: [RFC] packet: handle too big packets for PACKET_V3

Hi,

On Thu, 2014-08-14 at 21:54 -0700, Guy Harris wrote:
> On Aug 14, 2014, at 6:04 PM, Hannes Frederic Sowa <hannes@...essinduktion.org> wrote:
> 
> > On Fri, Aug 15, 2014, at 02:54, Eric Dumazet wrote:
> >> On Fri, 2014-08-15 at 02:43 +0200, Hannes Frederic Sowa wrote:
> >> 
> >>> Someone could use GRO to create packet trains to hide from intrusion
> >>> detection systems, which maybe are the main user of TPACKET_V3. I don't
> >>> think this is a good idea.
> >> 
> >> Presumably these tools already use a large enough block_size, and not a
> >> 4KB one ;)
> >> 
> >> Even without GRO, a jumbo frame (9K) can trigger the bug.
> > 
> > Sure, but if I had written such a tool without knowledge of GRO I
> > would have queried at least the MTU. ;)
> 
> ...and then queried the maximum size of the headers that precede the link-layer payload.

But those are mostly constant, no?

> Except that you *can't* do that, and it can be variable-length without an obvious maximum (think 802.11 in monitor mode, where you have radiotap headers).  This causes much pain for libpcap when using TPACKET_V1 and TPACKET_V2, forcing it to allocate huge blocks when smaller ones might be sufficient.
>
> > If an IDS allocates block_sizes below the MTU there is not much we can
> > do.
> 
> If an IDS uses libpcap, it will get libpcap's behavior, which, for TPACKET_V3 is, roughly
> 
> 	req.tp_frame_size = MAXIMUM_SNAPLEN;
> 	req.tp_frame_nr = handle->opt.buffer_size/req.tp_frame_size;
> 
> 	/* compute the minimum block size that will handle this frame.
> 	 * The block has to be page size aligned. 
> 	 * The max block size allowed by the kernel is arch-dependent and 
> 	 * it's not explicitly checked here. */
> 	req.tp_block_size = getpagesize();
> 	while (req.tp_block_size < req.tp_frame_size) 
> 		req.tp_block_size <<= 1;
> 
> 	frames_per_block = req.tp_block_size/req.tp_frame_size;
> 
> 	req.tp_block_nr = req.tp_frame_nr / frames_per_block;
> 
> 	/* req.tp_frame_nr is requested to match frames_per_block*req.tp_block_nr */
> 	req.tp_frame_nr = req.tp_block_nr * frames_per_block;
> 
> (the last two C statements are actually part of a loop, where it'll reduce req.tp_frame_nr if it gets told "I don't have enough room for that big a ring").
> 
> MAXIMUM_SNAPLEN is 65535 in older versions of libpcap and 262144 in newer versions; it's the maximum frame size.
> 
> handle->opt.buffer_size is the buffer size requested by the application; it defaults to 2 MiB.
> 
> That calculation, with the default values for the latest version of libpcap, ends up with:
> 
> 	req.tp_frame_size = 262144;
> 	req.tp_frame_nr = /* 2097152/262144 */ 8;
> 
> 	/* compute the minimum block size that will handle this frame.
> 	 * The block has to be page size aligned. 
> 	 * The max block size allowed by the kernel is arch-dependent and 
> 	 * it's not explicitly checked here. */
> 	req.tp_block_size = 4096;	/* IA-32 and x86-64, and probably many others */
> 	while (req.tp_block_size < req.tp_frame_size) 
> 		req.tp_block_size <<= 1;
> 	/* ends up with req.tp_block_size = 262144 */
> 
> 	frames_per_block = /* 262144/262144 */ 1;
> 
> 	req.tp_block_nr = /* 8 / 1 */ 8;
> 
> 	/* req.tp_frame_nr is requested to match frames_per_block*req.tp_block_nr */
> 	req.tp_frame_nr = /* 8 * 1 */ 8;
> 
> which I think means "8 blocks of 256 KiB each".
> 
> > But up to the MTU we should let GRO behave transparently and here we
> > violate this. There are also interfaces with extremely large MTUs, but
> > at least they report the MTU size correctly to user space.
> 
> Reporting the MTU to user space is insufficient for libpcap; it needs to know the *maximum packet size*, which includes link-layer headers (which I guess it could compute based on the ARPHRD_ for the interface, although for 802.11 that may be subject to change, e.g. the maximum link-layer header length grew when the QoS stuff was added) *and* metadata headers (such as radiotap, for which there really is no generic maximum length other than the 65535 byte limit imposed by the header length field being 16 bits, but a given driver can presumably return its maximum).
> 
> It also doesn't help with segmentation/reassembly offloading, where the MTU is decoupled from the maximum packet size.
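
To make the arithmetic quoted above easier to check, here is a condensed
sketch of the same sizing steps as a standalone C function. The names
ring_geometry and compute_ring_geometry are made up for illustration and
are not libpcap's, and the retry loop that shrinks tp_frame_nr when the
kernel refuses the ring is left out.

```c
#include <assert.h>

/* Hypothetical container for the four tpacket_req fields discussed above. */
struct ring_geometry {
	unsigned int block_size;
	unsigned int block_nr;
	unsigned int frame_size;
	unsigned int frame_nr;
};

/*
 * Follow the quoted sizing steps: one frame is max_frame bytes, the block
 * size is the page size doubled until a frame fits, and the frame count is
 * rounded down to a whole number of blocks.
 * Assumption: max_frame is small enough that the doubling cannot overflow.
 */
struct ring_geometry
compute_ring_geometry(unsigned int max_frame, unsigned int buffer_size,
		      unsigned int page_size)
{
	struct ring_geometry g;
	unsigned int frames_per_block;

	assert(max_frame > 0 && page_size > 0);

	g.frame_size = max_frame;
	g.frame_nr = buffer_size / g.frame_size;

	/* Smallest page_size * 2^n that holds one maximum-size frame. */
	g.block_size = page_size;
	while (g.block_size < g.frame_size)
		g.block_size <<= 1;

	frames_per_block = g.block_size / g.frame_size;
	g.block_nr = g.frame_nr / frames_per_block;

	/* Keep frame_nr equal to frames_per_block * block_nr. */
	g.frame_nr = g.block_nr * frames_per_block;
	return g;
}
```

Calling compute_ring_geometry(262144, 2097152, 4096) reproduces the
8 blocks of 262144 bytes from the quoted walk-through; with the older
65535-byte limit the same call yields 32 blocks of 65536 bytes.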

Thanks, that answered the question above. Still, I think that the most
common use cases would be covered by querying the MTU. libpcap needs
to be much more general and so must pay more attention.
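
For the simple case, querying the MTU could look roughly like the sketch
below. It uses the standard SIOCGIFMTU ioctl; the function name query_mtu
is hypothetical. As discussed above, the result covers neither link-layer
nor metadata headers, nor GRO-merged packets, so it is only a lower bound
on what a capture buffer has to hold.

```c
#define _DEFAULT_SOURCE	/* assumption: glibc; exposes struct ifreq under strict -std= modes */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <unistd.h>

/* Return the MTU of the named interface in bytes, or -1 on any error. */
int query_mtu(const char *ifname)
{
	struct ifreq ifr;
	size_t len;
	int fd, mtu = -1;

	assert(ifname != NULL);

	/* The name, including its terminating NUL, must fit in ifr_name. */
	len = strlen(ifname);
	if (len >= IFNAMSIZ)
		return -1;

	/* Any socket works as a handle for interface ioctls. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	memcpy(ifr.ifr_name, ifname, len);

	if (ioctl(fd, SIOCGIFMTU, &ifr) == 0)
		mtu = ifr.ifr_mtu;

	close(fd);
	return mtu;
}
```

A tool would call this once per interface, e.g. query_mtu("eth0"), and
then still has to add its own allowance for headers.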

Also, IMHO GRO is something special and does not fit that model very
well. But I am also fine with clamping. If people complain, we can
revisit that topic.
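
To illustrate what I mean by clamping, here is a minimal userspace
sketch, not the kernel patch itself. The function name clamp_caplen and
the overhead parameter (standing in for whatever block and frame headers
precede the packet data) are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return how many bytes of a packet get stored: at most snaplen, and never
 * more than what is left in one block after `overhead` bytes of headers.
 * Returns 0 if the headers alone do not fit in the block.
 */
size_t clamp_caplen(size_t pkt_len, size_t snaplen,
		    size_t block_size, size_t overhead)
{
	size_t caplen, room;

	assert(block_size > 0);

	/* Usual snaplen truncation first. */
	caplen = pkt_len < snaplen ? pkt_len : snaplen;

	if (overhead >= block_size)
		return 0;

	/* Then cut down to what the block can actually hold. */
	room = block_size - overhead;
	return caplen > room ? room : caplen;
}
```

With a 4096-byte block and 96 bytes of overhead, a 9000-byte jumbo frame
is cut to 4000 bytes, while a 1500-byte packet is stored unchanged.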

Thanks,
Hannes

