lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 3 Nov 2006 18:01:39 +0100
From:	Andi Kleen <ak@...e.de>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [x86_64] Strange oprofile results on access to per_cpu data

On Friday 03 November 2006 08:26, Eric Dumazet wrote:
> Hi Andi
> 
> While doing some oprofile analysis, I got this result on ip_route_input() : 
> one particular instruction seems to spend a lot of cycles.
> 
> machine is a dual core 285, 2.6 GHz

Single socket? 


> ################ BEGIN
>     370  0.0017 :ffffffff803e98f1:       mov    %gs:0x8,%rax
> 222769  1.0454 :ffffffff803e98fa:       incl   0x38(%rcx,%rax,1)
> ################ END
>       6 2.8e-05 :ffffffff803e98fe:       mov    (%rdx),%rdx
>     833  0.0039 :ffffffff803e9901:       test   %rdx,%rdx
> 
> __raw_get_cpu_var(rt_cache_stat).field++ appears to be very expensive

First the profile events are not exact. It could be an earlier instruction.
The reordering Window is relatively large (~80 macro ops) 

But let's assume it is this one.

Weird. Maybe it's a cache miss. Can you check with DATA_CACHE_MISSES ? 
Or possible a TLB miss, although that's far less likely (L1_AND_L2_DTLB_MISSES)

> (about 18000 RT_CACHE_STAT_INC(in_hlist_search); are done per second, not an 
> impressive count in fact)
> 
> Are segment prefixes that expensive ?

No, they are supposed to be one cycle only.

> Or is it only the first access to %gs:8 that is doing extra checks ?
> (because other RT_CACHE_STAT_INC() done in the same function dont have this cost)
> is it the loading of %rcx (done in ffffffff803e98bd) that is stalling ?

My first guess would be a cache miss of some sort.

Maybe the rest of the code needs enough cache to push this line out. 
Or since you got dual core with separated caches they are bouncing
for some reason (they shouldn't, but maybe there is a bug) 

> so that RT_CACHE_STAT_INC(field) would map to
> 
>     addl $1,%gs:OFFSET  /* no register needed */

I doubt that will make much difference.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ