Message-ID: <5546FFCB.50903@plumgrid.com>
Date: Sun, 03 May 2015 22:12:43 -0700
From: Alexei Starovoitov <ast@...mgrid.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
CC: "David S. Miller" <davem@...emloft.net>,
John Fastabend <john.r.fastabend@...el.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org
Subject: Re: [PATCH v2 net-next] net: sched: run ingress qdisc without locks
On 5/3/15 8:42 AM, Jesper Dangaard Brouer wrote:
>
> I was actually expecting to see a higher performance boost.
> improvement diff = -2.85 ns
...
> The patch is removing two atomic operations, spin_{un,}lock, which I
> have benchmarked[1] to cost approx 14ns on my system. Your system
> likely is faster, but not that much (p.s. benchmark your own system
> with [1])
>
> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
I have tried your tight loop spin_lock test on my box and it showed:
time_bench: Type:spin_lock_unlock Per elem: 40 cycles(tsc) 11.070 ns
and yet the total single cpu gain from removal of spin_lock/unlock
in ingress path is smaller than 11ns. I think this observation is
telling us that tight loop benchmarking is inherently flawed.
I'm guessing that the uops that cmpxchg is broken into can execute in
parallel with uops of other insns, so a tight loop of the same sequence
of uops has more alu dependencies, whereas in a more normal insn flow
these uops can mix and match better. Would be great if Intel microarch
experts could chime in.