[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20150909.214349.1837841017864345220.davem@davemloft.net>
Date: Wed, 09 Sep 2015 21:43:49 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: daniel@...earbox.net
Cc: chamaken@...il.com, tgraf@...g.ch, fw@...len.de, kaber@...sh.net,
netdev@...r.kernel.org
Subject: Re: [PATCH net] netlink, mmap: fix edge-case leakages in nf queue
zero-copy
From: Daniel Borkmann <daniel@...earbox.net>
Date: Thu, 10 Sep 2015 02:10:57 +0200
> When netlink mmap on receive side is the consumer of nf queue data,
> it can happen that in some edge cases, we write skb shared info into
> the user space mmap buffer:
>
> Assume a possible rx ring frame size of only 4096, and the network skb,
> which is being zero-copied into the netlink skb, contains page frags
> with an overall skb->len larger than the linear part of the netlink
> skb.
>
> skb_zerocopy(), which is generic and thus not aware of the fact that
> shared info cannot be accessed for such skbs then tries to write and
> fill frags, thus leaking kernel data/pointers and in some corner cases
> possibly writing out of bounds of the mmap area (when filling the
> last slot in the ring buffer this way).
>
> I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has
> an advertised length larger than 4096, where the linear part is visible
> at the slot beginning, and the leaked sizeof(struct skb_shared_info)
> has been written to the beginning of the next slot (also corrupting
> the struct nl_mmap_hdr slot header incl. status etc), since skb->end
> points to skb->data + ring->frame_size - NL_MMAP_HDRLEN.
>
> The fix adds and lets __netlink_alloc_skb() take the actual needed
> linear room for the network skb + meta data into account. It's completely
> irrelevant for non-mmaped netlink sockets, but in case mmap sockets
> are used, it can be decided whether the available skb_tailroom() is
> really large enough for the buffer, or whether it needs to internally
> fallback to a normal alloc_skb().
>
> From nf queue side, the information whether the destination port is
> an mmap RX ring is not really available without extra port-to-socket
> lookup, thus it can only be determined in lower layers i.e. when
> __netlink_alloc_skb() is called that checks internally for this. I
> chose to add the extra ldiff parameter as mmap will then still work:
> We have data_len and hlen in nfqnl_build_packet_message(), data_len
> is the full length (capped at queue->copy_range) for skb_zerocopy()
> and hlen some possible part of data_len that needs to be copied; the
> rem_len variable indicates the needed remaining linear mmap space.
>
> The only other workaround in nf queue internally would be after
> allocation time by f.e. cap'ing the data_len to the skb_tailroom()
> iff we deal with an mmap skb, but that would 1) expose the fact that
> we use a mmap skb to upper layers, and 2) trim the skb where we
> otherwise could just have moved the full skb into the normal receive
> queue.
>
> After the patch, in my test case the ring slot doesn't fit and therefore
> shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and
> thus needs to be picked up via recv().
>
> Fixes: 3ab1f683bf8b ("nfnetlink: add support for memory mapped netlink")
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists