Message-ID: <20180502225903.39180be8@redhat.com>
Date: Wed, 2 May 2018 22:59:03 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Björn Töpel <bjorn.topel@...il.com>
Cc: magnus.karlsson@...el.com, alexander.h.duyck@...el.com,
alexander.duyck@...il.com, john.fastabend@...il.com, ast@...com,
willemdebruijn.kernel@...il.com, daniel@...earbox.net,
mst@...hat.com, netdev@...r.kernel.org,
michael.lundkvist@...csson.com, jesse.brandeburg@...el.com,
anjali.singhai@...el.com, qi.z.zhang@...el.com,
Björn Töpel <bjorn.topel@...el.com>,
brouer@...hat.com
Subject: Re: [PATCH bpf-next v3 15/15] samples/bpf: sample application and
documentation for AF_XDP sockets
On Wed, 2 May 2018 13:01:36 +0200 Björn Töpel <bjorn.topel@...il.com> wrote:
> +static void rx_drop(struct xdpsock *xsk)
> +{
> + struct xdp_desc descs[BATCH_SIZE];
> + unsigned int rcvd, i;
> +
> + rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
> + if (!rcvd)
> + return;
> +
> + for (i = 0; i < rcvd; i++) {
> + u32 idx = descs[i].idx;
> +
> + lassert(idx < NUM_FRAMES);
> +#if DEBUG_HEXDUMP
> + char *pkt;
> + char buf[32];
> +
> + pkt = xq_get_data(xsk, idx, descs[i].offset);
> + sprintf(buf, "idx=%d", idx);
> + hex_dump(pkt, descs[i].len, buf);
> +#endif
> + }
> +
> + xsk->rx_npkts += rcvd;
> +
> + umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
> +}
I would really like to see an option that can enable reading the
data/memory in the packet. Otherwise the test is rather artificial, as
the packet data is never actually touched.
I hacked it myself to read the first u32 of each packet:
- Before: 10,771,083 pps
- After: 9,430,741 pps
The slowdown is not as big as I expected, which is good :-)
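For reference, the hack was along these lines (rough, untested sketch;
the function name and the 'sink' variable are just for illustration,
the helpers are the ones already used in this sample):

	/* Sketch: variant of rx_drop() that touches the first u32 of
	 * every received frame, so packet data is actually pulled into
	 * the CPU cache.  Accumulating into a volatile keeps the
	 * compiler from optimizing the loads away.
	 */
	static void rx_drop_touch(struct xdpsock *xsk)
	{
		struct xdp_desc descs[BATCH_SIZE];
		unsigned int rcvd, i;
		volatile u32 sink = 0;

		rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
		if (!rcvd)
			return;

		for (i = 0; i < rcvd; i++) {
			u32 idx = descs[i].idx;
			char *pkt;

			lassert(idx < NUM_FRAMES);
			pkt = xq_get_data(xsk, idx, descs[i].offset);

			/* Read the first u32 of the packet data */
			sink += *(u32 *)pkt;
		}

		xsk->rx_npkts += rcvd;

		umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
	}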
With perf stat I can see more LLC-loads, but not more misses. Reading
the data on the remote CPU is not getting registered as a cache-miss.
p.s. These tests are with mlx5 (which only has XDP_REDIRECT on the RX side).
-- 
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Before:

$ sudo ~/perf stat -C3 -e L1-icache-load-misses -e cycles -e instructions -e cache-misses -e cache-references -e LLC-store-misses -e LLC-store -e LLC-load-misses -e LLC-load -r 3 sleep 1

 Performance counter stats for 'CPU(s) 3' (3 runs):

           200,020      L1-icache-load-misses                               ( +-  0.76% )  (33.31%)
     3,920,754,587      cycles                                              ( +-  0.14% )  (44.50%)
     3,062,308,209      instructions           # 0.78 insn per cycle        ( +-  0.28% )  (55.65%)
               823      cache-misses           # 0.011 % of all cache refs  ( +- 70.81% )  (66.74%)
         7,587,132      cache-references                                    ( +-  0.48% )  (77.83%)
                 0      LLC-store-misses                                    (77.83%)
           384,401      LLC-store                                           ( +-  2.97% )  (77.83%)
                15      LLC-load-misses        # 0.00% of all LL-cache hits ( +-100.00% )  (22.17%)
         3,192,312      LLC-load                                            ( +-  0.35% )  (22.17%)

       1.001199221 seconds time elapsed                                     ( +-  0.00% )
After:

$ sudo ~/perf stat -C3 -e L1-icache-load-misses -e cycles -e instructions -e cache-misses -e cache-references -e LLC-store-misses -e LLC-store -e LLC-load-misses -e LLC-load -r 3 sleep 1

 Performance counter stats for 'CPU(s) 3' (3 runs):

           154,921      L1-icache-load-misses                               ( +-  3.88% )  (33.31%)
     3,924,791,213      cycles                                              ( +-  0.10% )  (44.50%)
     2,930,116,185      instructions           # 0.75 insn per cycle        ( +-  0.33% )  (55.65%)
               342      cache-misses           # 0.002 % of all cache refs  ( +- 65.52% )  (66.74%)
        15,810,892      cache-references                                    ( +-  0.13% )  (77.83%)
                 0      LLC-store-misses                                    (77.83%)
           925,544      LLC-store                                           ( +-  2.33% )  (77.83%)
               155      LLC-load-misses        # 0.00% of all LL-cache hits ( +- 67.22% )  (22.17%)
        12,791,264      LLC-load                                            ( +-  0.04% )  (22.17%)

       1.001206058 seconds time elapsed                                     ( +-  0.00% )