Message-ID: <20180502225903.39180be8@redhat.com>
Date:   Wed, 2 May 2018 22:59:03 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Björn Töpel <bjorn.topel@...il.com>
Cc:     magnus.karlsson@...el.com, alexander.h.duyck@...el.com,
        alexander.duyck@...il.com, john.fastabend@...il.com, ast@...com,
        willemdebruijn.kernel@...il.com, daniel@...earbox.net,
        mst@...hat.com, netdev@...r.kernel.org,
        michael.lundkvist@...csson.com, jesse.brandeburg@...el.com,
        anjali.singhai@...el.com, qi.z.zhang@...el.com,
        Björn Töpel <bjorn.topel@...el.com>,
        brouer@...hat.com
Subject: Re: [PATCH bpf-next v3 15/15] samples/bpf: sample application and
 documentation for AF_XDP sockets


On Wed,  2 May 2018 13:01:36 +0200 Björn Töpel <bjorn.topel@...il.com> wrote:

> +static void rx_drop(struct xdpsock *xsk)
> +{
> +	struct xdp_desc descs[BATCH_SIZE];
> +	unsigned int rcvd, i;
> +
> +	rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
> +	if (!rcvd)
> +		return;
> +
> +	for (i = 0; i < rcvd; i++) {
> +		u32 idx = descs[i].idx;
> +
> +		lassert(idx < NUM_FRAMES);
> +#if DEBUG_HEXDUMP
> +		char *pkt;
> +		char buf[32];
> +
> +		pkt = xq_get_data(xsk, idx, descs[i].offset);
> +		sprintf(buf, "idx=%d", idx);
> +		hex_dump(pkt, descs[i].len, buf);
> +#endif
> +	}
> +
> +	xsk->rx_npkts += rcvd;
> +
> +	umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
> +}

I would really like to see an option that can enable reading the
data/memory in the packet.  Else the test is rather fake...

I hacked it manually to read the first u32 of the packet data.
 - Before: 10,771,083 pps
 - After:   9,430,741 pps
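
A minimal sketch of what such a read could look like (this is not the actual hack used for the numbers above). The helper name touch_first_u32 and the volatile sink are assumptions made for illustration; only xq_get_data() and the descs[] fields come from the quoted patch.

```c
#include <stdint.h>
#include <string.h>

/* Volatile sink so the compiler cannot drop the packet read as dead code. */
static volatile uint32_t pkt_sink;

/*
 * Hypothetical helper, not part of the patch: load the first 32 bits
 * of a received frame.  memcpy() sidesteps alignment and aliasing
 * issues and is normally compiled down to a single load.
 */
static inline uint32_t touch_first_u32(const void *pkt, uint32_t len)
{
	uint32_t val = 0;

	if (len >= sizeof(val))
		memcpy(&val, pkt, sizeof(val));
	pkt_sink = val;		/* forces the load to actually happen */
	return val;
}
```

Inside the rx_drop() loop this would presumably be called as touch_first_u32(xq_get_data(xsk, idx, descs[i].offset), descs[i].len), ideally behind a command-line option so the pure-drop numbers stay comparable.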

The slowdown is not as big as I expected, which is good :-)
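
To put the two rates in per-packet terms, here is a small sketch that converts pps into a nanosecond budget. The function names are made up for illustration, and it assumes both rates are steady-state averages.

```c
/* Average time spent per packet, in nanoseconds, at a given packet rate. */
static double ns_per_packet(double pps)
{
	return 1e9 / pps;
}

/* Relative throughput loss going from 'before' to 'after', in percent. */
static double slowdown_pct(double before, double after)
{
	return 100.0 * (before - after) / before;
}
```

With the numbers above this gives about 92.8 ns per packet before and about 106.0 ns after, i.e. roughly 13 ns extra per packet, or a drop of about 12.4%.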

With perf stat I can see more LLC-loads, but not misses.  Reading data
from the remote CPU is not getting registered as a cache-miss.

p.s. these tests are with mlx5 (which only has XDP_REDIRECT RX-side).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Before:

$ sudo ~/perf stat -C3 -e L1-icache-load-misses -e cycles -e  instructions -e cache-misses -e   cache-references  -e LLC-store-misses -e LLC-store -e LLC-load-misses -e  LLC-load -r 3 sleep 1

 Performance counter stats for 'CPU(s) 3' (3 runs):

           200,020      L1-icache-load-misses                                         ( +-  0.76% )  (33.31%)
     3,920,754,587      cycles                                                        ( +-  0.14% )  (44.50%)
     3,062,308,209      instructions              #    0.78  insn per cycle           ( +-  0.28% )  (55.65%)
               823      cache-misses              #    0.011 % of all cache refs      ( +- 70.81% )  (66.74%)
         7,587,132      cache-references                                              ( +-  0.48% )  (77.83%)
                 0      LLC-store-misses                                              (77.83%)
           384,401      LLC-store                                                     ( +-  2.97% )  (77.83%)
                15      LLC-load-misses           #    0.00% of all LL-cache hits     ( +-100.00% )  (22.17%)
         3,192,312      LLC-load                                                      ( +-  0.35% )  (22.17%)

       1.001199221 seconds time elapsed                                          ( +-  0.00% )


After:

$ sudo ~/perf stat -C3 -e L1-icache-load-misses -e cycles -e  instructions -e cache-misses -e   cache-references  -e LLC-store-misses -e LLC-store -e LLC-load-misses -e  LLC-load -r 3 sleep 1

 Performance counter stats for 'CPU(s) 3' (3 runs):

           154,921      L1-icache-load-misses                                         ( +-  3.88% )  (33.31%)
     3,924,791,213      cycles                                                        ( +-  0.10% )  (44.50%)
     2,930,116,185      instructions              #    0.75  insn per cycle           ( +-  0.33% )  (55.65%)
               342      cache-misses              #    0.002 % of all cache refs      ( +- 65.52% )  (66.74%)
        15,810,892      cache-references                                              ( +-  0.13% )  (77.83%)
                 0      LLC-store-misses                                              (77.83%)
           925,544      LLC-store                                                     ( +-  2.33% )  (77.83%)
               155      LLC-load-misses           #    0.00% of all LL-cache hits     ( +- 67.22% )  (22.17%)
        12,791,264      LLC-load                                                      ( +-  0.04% )  (22.17%)

       1.001206058 seconds time elapsed                                          ( +-  0.00% )
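
The LLC-load delta between the two runs can be related to the packet rate with a rough sketch like the one below. The function name is hypothetical. It assumes the pps figure reported earlier held during the perf window, and it ignores that these counters were multiplexed (LLC-load was only sampled 22.17% of the time and then scaled).

```c
#include <stdint.h>

/*
 * Extra counter events per packet: the difference between the 'after'
 * and 'before' counts divided by the packets handled in the window.
 */
static double extra_events_per_packet(uint64_t before, uint64_t after,
				      double pps, double seconds)
{
	return ((double)after - (double)before) / (pps * seconds);
}
```

Feeding in the LLC-load counts (3,192,312 and 12,791,264), 9,430,741 pps and the 1.001206058 s window gives roughly one extra LLC-load per packet, which fits a single u32 read per frame.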
