lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 6 Dec 2016 18:50:47 +0200
From:   Saeed Mahameed <saeedm@....mellanox.co.il>
To:     Martin KaFai Lau <kafai@...com>
Cc:     Linux Netdev List <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Brenden Blanco <bblanco@...mgrid.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Miller <davem@...emloft.net>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v2 net-next 3/4] mlx4: xdp: Reserve headroom for receiving
 packet when XDP prog is active

On Mon, Dec 5, 2016 at 9:55 PM, Martin KaFai Lau <kafai@...com> wrote:
> On Mon, Dec 05, 2016 at 02:54:06AM +0200, Saeed Mahameed wrote:
>> On Sun, Dec 4, 2016 at 5:17 AM, Martin KaFai Lau <kafai@...com> wrote:
>> > Reserve XDP_PACKET_HEADROOM and honor bpf_xdp_adjust_head()
>> > when XDP prog is active.  This patch only affects the code
>> > path when XDP is active.
>> >
>> > Signed-off-by: Martin KaFai Lau <kafai@...com>
>> > ---
>>
>> Hi Martin, Sorry for the late review, i have some comments below
>>
>> >  drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 17 +++++++++++++++--
>> >  drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 23 +++++++++++++++++------
>> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c     |  9 +++++----
>> >  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  3 ++-
>> >  4 files changed, 39 insertions(+), 13 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> > index 311c14153b8b..094a13b52cf6 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> > @@ -51,7 +51,8 @@
>> >  #include "mlx4_en.h"
>> >  #include "en_port.h"
>> >
>> > -#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN)))
>> > +#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
>> > +                                  XDP_PACKET_HEADROOM))
>> >
>> >  int mlx4_en_setup_tc(struct net_device *dev, u8 up)
>> >  {
>> > @@ -1551,6 +1552,7 @@ int mlx4_en_start_port(struct net_device *dev)
>> >         struct mlx4_en_tx_ring *tx_ring;
>> >         int rx_index = 0;
>> >         int err = 0;
>> > +       int mtu;
>> >         int i, t;
>> >         int j;
>> >         u8 mc_list[16] = {0};
>> > @@ -1684,8 +1686,12 @@ int mlx4_en_start_port(struct net_device *dev)
>> >         }
>> >
>> >         /* Configure port */
>> > +       mtu = priv->rx_skb_size + ETH_FCS_LEN;
>> > +       if (priv->tx_ring_num[TX_XDP])
>> > +               mtu += XDP_PACKET_HEADROOM;
>> > +
>>
>> Why would the physical MTU care for the headroom you preserve for XDP prog?
>> This is the wire MTU, it shouldn't be changed, please keep it as
>> before, any preservation you make in packets buffers are needed only
>> for FWD case or modify case (HW or wire should not care about them).
>
> Thanks for your feedback!

Just doing my job :))

>
> FWD:
> packet received from a port
> => process by a XDP prog
> => XDP_TX out to the same port.
>
> For example, if the received packet has 1500 payload and the XDP prog
> encapsulates it in an IPv6 header (+40 bytes).  After testing, it cannot
> be sent out due to the HW/wire MTU is 1500.
>
> Even the wire MTU info was passed to the XDP prog, there is not much a
> XDP prog could do here other than dropping it.
>
> Hence, this patch gives guarantee to the XDP prog such that
> it can always send out what it has received + XDP_PACKET_HEADROOM.
>

Still i am not convinced ! this is against common sense,
this means that the XDP prog can send packets larger than the  MTU
seen on netdev!

anyway if a packet with the size (MTU + XDP_PACKET_HEADROOM) was sent
from XDP ring and HW allowed it to exit somehow (with the code you
provided :)), most likely it will be dropped
at the other end.

I still think XDP prog should not be allowed to FW packets larger than
the MTU seen on the netdev and you shouldn't modify the wire MTU just
for this case.

>>
>> >         err = mlx4_SET_PORT_general(mdev->dev, priv->port,
>> > -                                   priv->rx_skb_size + ETH_FCS_LEN,
>> > +                                   mtu,
>> >                                     priv->prof->tx_pause,
>> >                                     priv->prof->tx_ppp,
>> >                                     priv->prof->rx_pause,
>> > @@ -2255,6 +2261,13 @@ static bool mlx4_en_check_xdp_mtu(struct net_device *dev, int mtu)
>> >  {
>> >         struct mlx4_en_priv *priv = netdev_priv(dev);
>> >
>> > +       if (mtu + XDP_PACKET_HEADROOM > priv->max_mtu) {
>> > +               en_err(priv,
>> > +                      "Device max mtu:%d does not allow %d bytes reserved headroom for XDP prog\n",
>> > +                      priv->max_mtu, XDP_PACKET_HEADROOM);
>> > +               return false;
>> > +       }
>> > +
>> >         if (mtu > MLX4_EN_MAX_XDP_MTU) {
>> >                 en_err(priv, "mtu:%d > max:%d when XDP prog is attached\n",
>> >                        mtu, MLX4_EN_MAX_XDP_MTU);
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > index 23e9d04d1ef4..324771ac929e 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > @@ -96,7 +96,6 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>> >         struct mlx4_en_rx_alloc page_alloc[MLX4_EN_MAX_RX_FRAGS];
>> >         const struct mlx4_en_frag_info *frag_info;
>> >         struct page *page;
>> > -       dma_addr_t dma;
>> >         int i;
>> >
>> >         for (i = 0; i < priv->num_frags; i++) {
>> > @@ -115,9 +114,10 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>> >
>> >         for (i = 0; i < priv->num_frags; i++) {
>> >                 frags[i] = ring_alloc[i];
>> > -               dma = ring_alloc[i].dma + ring_alloc[i].page_offset;
>> > +               frags[i].page_offset += priv->frag_info[i].rx_headroom;
>>
>> I don't see any need for headroom on frag_info other that frag0 (which
>> where the packet starts).
>> What is the meaning of a headroom of a frag in a middle of a packet ?
>>
>> if you agree with me then, you can use XDP_PACKET_HEADROOM as is where
>> needed (i.e frag0 page offset) and remove
>> "priv->frag_info[i].rx_headroom"
>>
>> ...
>>
>> After going through the code a little bit i see that this code is
>> shared between XDP and common path, and you didn't want to add boolean
>> conditions.
>>
>> Ok i see what you did here.
>>
>> Maybe we can pass headroom as a function parameter and split frag0
>> handling from the rest ?
>> If it is too much then i am ok with the code as it is,
> Right, this patch does the boolean check (XDP active or not) early on
> in mlx4_en_calc_rx_buf() (i.e. out of the fast path) and store
> the result in priv->frag_info[0].rx_headroom.
>
> Just want to ensure I understand your comment correctly.
> You prefer not to store the boolean test result in frag_info[0].rx_headroom
> since it is redundant to !!priv->tx_ring_num[TX_XDP] and rx_headroom is also
> confusing for frag[1-3].
>
> Instead, do the XDP [in]active test before calling mlx4_en_alloc_frags()
> and then only adjust frags[0].page_offset by +XDP_PACKET_HEADROOM if is needed.
> It could be done either by passing an extra argument to mlx4_en_alloc_frags()
> or completely separate mlx4_en_alloc_frags().  I am fine with this also.
>

Correct, but if this change will add extra checks to the data path
then I am ok with the current code.

>
>>
>> > +               rx_desc->data[i].addr = cpu_to_be64(frags[i].dma +
>> > +                                                   frags[i].page_offset);
>> >                 ring_alloc[i] = page_alloc[i];
>> > -               rx_desc->data[i].addr = cpu_to_be64(dma);
>> >         }
>> >
>> >         return 0;
>> > @@ -250,7 +250,8 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
>> >
>> >         if (ring->page_cache.index > 0) {
>> >                 frags[0] = ring->page_cache.buf[--ring->page_cache.index];
>> > -               rx_desc->data[0].addr = cpu_to_be64(frags[0].dma);
>> > +               rx_desc->data[0].addr = cpu_to_be64(frags[0].dma +
>> > +                                                   frags[0].page_offset);
>> >                 return 0;
>> >         }
>> >
>> > @@ -889,6 +890,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>> >                 if (xdp_prog) {
>> >                         struct xdp_buff xdp;
>> >                         dma_addr_t dma;
>> > +                       void *pg_addr, *orig_data;
>> >                         u32 act;
>> >
>> >                         dma = be64_to_cpu(rx_desc->data[0].addr);
>> > @@ -896,11 +898,18 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>> >                                                 priv->frag_info[0].frag_size,
>> >                                                 DMA_FROM_DEVICE);
>> >
>> > -                       xdp.data = page_address(frags[0].page) +
>> > -                                                       frags[0].page_offset;
>> > +                       pg_addr = page_address(frags[0].page);
>> > +                       orig_data = pg_addr + frags[0].page_offset;
>> > +                       xdp.data = orig_data;
>> >                         xdp.data_end = xdp.data + length;
>> >
>> >                         act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> > +
>> > +                       if (xdp.data != orig_data) {
>> > +                               length = xdp.data_end - xdp.data;
>> > +                               frags[0].page_offset = xdp.data - pg_addr;
>> > +                       }
>> > +
>> >
>>
>> is this needed only for XDP FWD case ?
> No. It is also for PASS.
>

I see.

>> is this the only way to detect that the user modified the packet
>> headers (comparing pointers, before and after) ?
> Yes
>
>>
>> if the answer is yes, it should be faster to unconditionally reset
>> packet offset and lenght on XDP_FWD :
>> case XDP_FWD:
>>    length = xdp.data_end - xdp.data;
>>    frags[0].page_offset = xdp.data - pg_addr;
>>
>>
>> >                         switch (act) {
>> >                         case XDP_PASS:
>> >                                 break;
>> > @@ -1180,6 +1189,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>> >                  */
>> >                 priv->frag_info[0].frag_stride = PAGE_SIZE;
>> >                 priv->frag_info[0].dma_dir = PCI_DMA_BIDIRECTIONAL;
>> > +               priv->frag_info[0].rx_headroom = XDP_PACKET_HEADROOM;
>> >                 i = 1;
>> >         } else {
>> >                 int buf_size = 0;
>> > @@ -1194,6 +1204,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>> >                                 ALIGN(priv->frag_info[i].frag_size,
>> >                                       SMP_CACHE_BYTES);
>> >                         priv->frag_info[i].dma_dir = PCI_DMA_FROMDEVICE;
>> > +                       priv->frag_info[i].rx_headroom = 0;
>>
>> IMHO, redundant. as you see here frag0 and other frags handling are
>> separated, maybe we can do the same in mlx4_en_alloc_frags.
>>
>> >                         buf_size += priv->frag_info[i].frag_size;
>> >                         i++;
>> >                 }
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > index 4b597dca5c52..9e5f38cefe5f 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > @@ -354,7 +354,7 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
>> >         struct mlx4_en_rx_alloc frame = {
>> >                 .page = tx_info->page,
>> >                 .dma = tx_info->map0_dma,
>> > -               .page_offset = 0,
>> > +               .page_offset = XDP_PACKET_HEADROOM,
>> >                 .page_size = PAGE_SIZE,
>> >         };
>> >
>> > @@ -1132,7 +1132,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>> >         tx_info->page = frame->page;
>> >         frame->page = NULL;
>> >         tx_info->map0_dma = dma;
>> > -       tx_info->map0_byte_count = length;
>> > +       tx_info->map0_byte_count = length + frame->page_offset;
>>
>> Didn't you already take care of lenght by the following code:
>>                        if (xdp.data != orig_data) {
>>                                length = xdp.data_end - xdp.data;
>>                                frags[0].page_offset = xdp.data - pg_addr;
>>                         }
>>
> Before this patch, length always assumes the data starts at the beginning
> of the page and dma is the start of the page.  Hence, adding
> framg->page_offset back to the length here.
>
> However, if I read the codes correctly, I think the map0_byte_count (before or
> after this patch) does not matter since it is only used in dma_unmap_page() and
> PAGE_SIZE is always used in dma_unmap_page() for this code patch.  Hence, I think
> we can just set map0_byte_count to PAGE_SIZE here.
>

Right, in mlx4_alloc_pages we always map with PAGE_SIZE <<  order
 dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE << order,
  frag_info->dma_dir);
for XDP order is always 0, so you can safely set it to PAGE_SIZE.

>> and here  frame->page_offset is not really page offset, it can only be
>> XDP_PACKET_HEADROOM.
> Note that the XDP prog can call bpf_xdp_adjust_head() to add a header.
> The XDP prog can extend up to XDP_PACKET_HEADROOM (256) bytes but it
> can also (and usually) only add 40 bytes IPv6 header and then XDP_TX it out.
>

I see.

>>
>> >         tx_info->nr_txbb = nr_txbb;
>> >         tx_info->nr_bytes = max_t(unsigned int, length, ETH_ZLEN);
>> >         tx_info->data_offset = (void *)data - (void *)tx_desc;
>> > @@ -1141,9 +1141,10 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>> >         tx_info->linear = 1;
>> >         tx_info->inl = 0;
>> >
>> > -       dma_sync_single_for_device(priv->ddev, dma, length, PCI_DMA_TODEVICE);
>> > +       dma_sync_single_range_for_device(priv->ddev, dma, frame->page_offset,
>> > +                                        length, PCI_DMA_TODEVICE);
>> >
>> > -       data->addr = cpu_to_be64(dma);
>> > +       data->addr = cpu_to_be64(dma + frame->page_offset);
>> >         data->lkey = ring->mr_key;
>> >         dma_wmb();
>> >         data->byte_count = cpu_to_be32(length);
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>> > index 20a936428f4a..ba1c6cd0cc79 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>> > @@ -475,7 +475,8 @@ struct mlx4_en_frag_info {
>> >         u16 frag_prefix_size;
>> >         u32 frag_stride;
>> >         enum dma_data_direction dma_dir;
>> > -       int order;
>> > +       u16 order;
>> > +       u16 rx_headroom;
>> >  };
>> >
>> >  #ifdef CONFIG_MLX4_EN_DCB
>> > --
>> > 2.5.1
>> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ