Message-ID: <alpine.DEB.2.21.1912201536120.16819@www.lameter.com>
Date: Fri, 20 Dec 2019 15:36:51 +0000 (UTC)
From: Christopher Lameter <cl@...ux.com>
To: Tejun Heo <tj@...nel.org>
cc: Jesper Dangaard Brouer <brouer@...hat.com>,
Björn Töpel <bjorn.topel@...il.com>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Dennis Zhou <dennis@...nel.org>
Subject: Re: Percpu variables, benchmarking, and performance weirdness
On Fri, 20 Dec 2019, Tejun Heo wrote:
> On Fri, Dec 20, 2019 at 10:34:20AM +0100, Jesper Dangaard Brouer wrote:
> > > So, my question to the uarch/percpu folks out there: Why are percpu
> > > accesses (%gs segment register) more expensive than regular global
> > > variables in this scenario.
> >
> > I'm also VERY interested in knowing the answer to above question!?
> > (Adding LKML to reach more people)
>
> No idea. One difference is that percpu accesses are through vmap area
> which is mapped using 4k pages while global variable would be accessed
> through the default linear mapping. Maybe you're getting hit by tlb
> pressure?
There are also some accesses from remote processors to the per cpu areas
of other cpus. If those hit the same cacheline as the variable you are
measuring, the line bounces between cpus and causes additional latency.
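
To illustrate the cacheline point, here is a minimal userspace sketch, not
kernel code. The 64-byte line size and the names packed_slot, padded_slot
and same_cacheline() are assumptions made up for this example. It only
shows how padding puts each slot on its own line. In the kernel,
DEFINE_PER_CPU_SHARED_ALIGNED or ____cacheline_aligned_in_smp would be the
tools to look at, though whether they help in this benchmark is untested.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHELINE 64  /* assumption: 64-byte lines, typical on x86-64 */

/* Packed layout: neighbouring slots share one cacheline. */
struct packed_slot { uint64_t count; };

/* Padded layout: the aligned member forces each slot onto its own line. */
struct padded_slot { alignas(CACHELINE) uint64_t count; };

static_assert(sizeof(struct padded_slot) == CACHELINE,
              "padded slot should occupy exactly one cacheline");

/* Align the packed array so all four slots sit in a single line. */
static alignas(CACHELINE) struct packed_slot packed[4];
static struct padded_slot padded[4];

/* Return 1 if both addresses fall within the same cacheline. */
static int same_cacheline(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHELINE) == ((uintptr_t)b / CACHELINE);
}
```

With the packed layout, a remote write to packed[1] invalidates the line
that also holds packed[0]. With the padded layout the two slots never share
a line, at the cost of 56 bytes of padding per slot.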