netdev - Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3af74e26-8899-cf1e-6fd4-5ea0bd349fc3@mellanox.com>
Date:   Thu, 25 Jul 2019 09:27:04 +0000
From:   Maxim Mikityanskiy <maximmi@...lanox.com>
To:     Kevin Laatz <kevin.laatz@...el.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "bjorn.topel@...el.com" <bjorn.topel@...el.com>,
        "magnus.karlsson@...el.com" <magnus.karlsson@...el.com>,
        "jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
        "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        "stephen@...workplumber.org" <stephen@...workplumber.org>,
        "bruce.richardson@...el.com" <bruce.richardson@...el.com>,
        "ciara.loftus@...el.com" <ciara.loftus@...el.com>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>
Subject: Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned
 chunk placement

On 2019-07-24 08:10, Kevin Laatz wrote:
> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
> 
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@...el.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@...el.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@...el.com>
> 
> ---
> v2:
>    - Add checks for the flags coming from userspace
>    - Fix how we get chunk_size in xsk_diag.c
>    - Add defines for masking the new descriptor format
>    - Modified the rx functions to use new descriptor format
>    - Modified the tx functions to use new descriptor format
> 
> v3:
>    - Add helper function to do address/offset masking/addition
> ---
>   include/net/xdp_sock.h      | 17 ++++++++
>   include/uapi/linux/if_xdp.h |  9 ++++
>   net/xdp/xdp_umem.c          | 18 +++++---
>   net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>   net/xdp/xsk_diag.c          |  2 +-
>   net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>   6 files changed, 170 insertions(+), 30 deletions(-)
> 

<...>

> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & (PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);

Let addr = 0x12345, PAGE_SIZE = 0x1000, len = 0x1000. Your calculations 
lead to page_start = 0x345, first_len = 0x1000 - 0x12000, which is 
negative. I think page_start is calculated incorrectly (is ~ missing?).

> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +

<...>

> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
> +	if (addr >= q->size ||

Addresses like 0x00aaffffffffffff will pass the validation (0xaa + 
0xffffffffffff will overflow mod 2^48 and become a small number), 
whereas such addresses don't look valid for me.

> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {

If the region is not contiguous, we cant RX into it - that's clear. 
However, how can the userspace determine whether these two pages of UMEM 
are mapped contiguously in the DMA space? Are we going to silently drop 
descriptors to non-contiguous frames and leak them? Please explain how 
to use it correctly from the application side.

> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}

<...>