Date:   Wed, 11 Nov 2020 12:57:34 +0100
From:   Magnus Karlsson <magnus.karlsson@...il.com>
To:     kernel test robot <lkp@...el.com>
Cc:     "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Björn Töpel <bjorn.topel@...el.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Jakub Kicinski <kuba@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        kbuild-all@...ts.01.org, clang-built-linux@...glegroups.com,
        bpf <bpf@...r.kernel.org>, jeffrey.t.kirsher@...el.com
Subject: Re: [PATCH bpf-next v2 5/5] i40e: use batched xsk Tx interfaces to
 increase performance

On Wed, Nov 11, 2020 at 2:38 AM kernel test robot <lkp@...el.com> wrote:
>
> Hi Magnus,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on bpf-next/master]
>
> url:    https://github.com/0day-ci/linux/commits/Magnus-Karlsson/xsk-i40e-Tx-performance-improvements/20201110-190310
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> config: powerpc64-randconfig-r025-20201110 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 4d81c8adb6ed9840257f6cb6b93f60856d422a15)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install powerpc64 cross compiling tool for clang build
>         # apt-get install binutils-powerpc64-linux-gnu
>         # https://github.com/0day-ci/linux/commit/b016bbeac6692a93e61b28efa430d64645032b5e
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Magnus-Karlsson/xsk-i40e-Tx-performance-improvements/20201110-190310
>         git checkout b016bbeac6692a93e61b28efa430d64645032b5e
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@...el.com>
>
> All warnings (new ones prefixed by >>):
>
> >> drivers/net/ethernet/intel/i40e/i40e_xsk.c:417:13: warning: unknown pragma ignored [-Wunknown-pragmas]
>    #pragma GCC unroll 4
>                ^
>    1 warning generated.

I was hoping that unknown pragmas would be silently ignored, but that
is obviously not the case with -Wunknown-pragmas added. Unrolling this
inner loop, where the code spends most of its time, gives me nearly
1 Mpps of extra performance, which is substantial, so I would like to
get it unrolled in some way, but without the warning. I need some
advice, please. Here are some options that come to mind:

#1: Suppress unknown pragma warnings in this file only by adding
CFLAGS_i40e_xsk.o += -Wno-unknown-pragmas (or whatever that option
might be) in the Makefile

#2: Force the compiler to loop-unroll the loop with for example a
switch statement with four cases that all fall through. This will make
the code less readable.

#3: Manually loop-unroll the loop. This will make the code even less
readable than #2.
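
To make the readability trade-off between #2 and #3 concrete, here is a
minimal sketch on a simplified stand-in for the loop. It is not the
driver code: struct demo_desc, the sum_len_*() helpers and the value 4
for PKTS_PER_BATCH are assumptions made for illustration (the batch size
is only suggested by the "unroll 4" in the pragma), and the loop body is
reduced to the byte accounting.

```c
#include <assert.h>

/* Assumed batch size; the real value is not shown in this thread. */
#define PKTS_PER_BATCH 4

static_assert(PKTS_PER_BATCH == 4,
	      "the switch and manual variants are written for a batch of 4");

/* Hypothetical stand-in for struct xdp_desc, reduced to two fields. */
struct demo_desc {
	unsigned long long addr;
	unsigned int len;
};

/* Baseline: the plain loop that the pragma is meant to unroll. */
static unsigned int sum_len_loop(const struct demo_desc *desc)
{
	unsigned int total = 0;
	unsigned int i;

	for (i = 0; i < PKTS_PER_BATCH; i++)
		total += desc[i].len;

	return total;
}

/* Option #2: a switch with four cases that all fall through. */
static unsigned int sum_len_switch(const struct demo_desc *desc)
{
	unsigned int total = 0;
	unsigned int start = 0;

	switch (start) {
	case 0:
		total += desc[0].len;
		/* fall through */
	case 1:
		total += desc[1].len;
		/* fall through */
	case 2:
		total += desc[2].len;
		/* fall through */
	case 3:
		total += desc[3].len;
	}

	return total;
}

/* Option #3: the loop body written out four times by hand. */
static unsigned int sum_len_manual(const struct demo_desc *desc)
{
	unsigned int total = 0;

	total += desc[0].len;
	total += desc[1].len;
	total += desc[2].len;
	total += desc[3].len;

	return total;
}
```

All three helpers return the same total for a batch of four descriptors.
With the real loop body of about six statements per packet, both the
switch and the manual variant would repeat that body four times.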

I prefer #1, as I would like to keep the code readable, but you might
have better suggestions on how to tackle this.
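
One further possibility, not among the three options above and offered
only as a sketch, is to hide the unroll hint behind a macro that picks a
pragma the compiler in use understands. The macro name unrolled_for_4
and the helper sum_len_hinted() are hypothetical, and PKTS_PER_BATCH is
again assumed to be 4. The sketch relies on "#pragma GCC unroll" being
available from GCC 8 and on clang accepting
"#pragma clang loop unroll_count(N)"; other compilers fall back to a
plain for loop without a hint.

```c
#include <assert.h>

/* Assumed batch size; the real value is not shown in this thread. */
#define PKTS_PER_BATCH 4

static_assert(PKTS_PER_BATCH == 4,
	      "the hint below asks for an unroll count of 4");

/*
 * Hypothetical wrapper: expands to "for" preceded by an unroll hint that
 * the current compiler recognizes. clang is tested first because it also
 * defines __GNUC__.
 */
#if defined(__clang__)
#define unrolled_for_4 _Pragma("clang loop unroll_count(4)") for
#elif defined(__GNUC__) && __GNUC__ >= 8
#define unrolled_for_4 _Pragma("GCC unroll 4") for
#else
#define unrolled_for_4 for
#endif

/* Simplified stand-in for the Tx loop: only sums the packet lengths. */
static unsigned int sum_len_hinted(const unsigned int *len)
{
	unsigned int total = 0;
	unsigned int i;

	unrolled_for_4 (i = 0; i < PKTS_PER_BATCH; i++)
		total += len[i];

	return total;
}
```

The loop keeps its original shape, and no compiler is handed a pragma it
does not know, so -Wunknown-pragmas has nothing to report. Whether the
unrolled code matches the performance of the GCC pragma under clang
would still need to be measured.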

Thanks: Magnus

> vim +417 drivers/net/ethernet/intel/i40e/i40e_xsk.c
>
>    408
>    409  static void i40e_xmit_pkt_batch(struct i40e_ring *xdp_ring, struct xdp_desc *desc,
>    410                                  unsigned int *total_bytes)
>    411  {
>    412          u16 ntu = xdp_ring->next_to_use;
>    413          struct i40e_tx_desc *tx_desc;
>    414          dma_addr_t dma;
>    415          u32 i;
>    416
>  > 417  #pragma GCC unroll 4
>    418          for (i = 0; i < PKTS_PER_BATCH; i++) {
>    419                  dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc[i].addr);
>    420                  xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc[i].len);
>    421
>    422                  tx_desc = I40E_TX_DESC(xdp_ring, ntu++);
>    423                  tx_desc->buffer_addr = cpu_to_le64(dma);
>    424                  tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC |
>    425                                                            I40E_TX_DESC_CMD_EOP,
>    426                                                            0, desc[i].len, 0);
>    427
>    428                  *total_bytes += desc[i].len;
>    429          }
>    430
>    431          xdp_ring->next_to_use = ntu;
>    432  }
>    433
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
