Date:	Fri, 3 Nov 2006 18:01:39 +0100
From:	Andi Kleen <ak@...e.de>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [x86_64] Strange oprofile results on access to per_cpu data

On Friday 03 November 2006 08:26, Eric Dumazet wrote:
> Hi Andi
> 
> While doing some oprofile analysis, I got this result on ip_route_input() : 
> one particular instruction seems to spend a lot of cycles.
> 
> machine is a dual core 285, 2.6 GHz

Single socket? 


> ################ BEGIN
>     370  0.0017 :ffffffff803e98f1:       mov    %gs:0x8,%rax
> 222769  1.0454 :ffffffff803e98fa:       incl   0x38(%rcx,%rax,1)
> ################ END
>       6 2.8e-05 :ffffffff803e98fe:       mov    (%rdx),%rdx
>     833  0.0039 :ffffffff803e9901:       test   %rdx,%rdx
> 
> __raw_get_cpu_var(rt_cache_stat).field++ appears to be very expensive

First, the profile events are not exact. The real culprit could be an earlier instruction.
The reordering window is relatively large (~80 macro ops).

But let's assume it is this one.

Weird. Maybe it's a cache miss. Can you check with DATA_CACHE_MISSES?
Or possibly a TLB miss (L1_AND_L2_DTLB_MISSES), although that's far less likely.

> (about 18000 RT_CACHE_STAT_INC(in_hlist_search); are done per second, not an 
> impressive count in fact)
> 
> Are segment prefixes that expensive ?

No, they are supposed to be one cycle only.

> Or is it only the first access to %gs:8 that is doing extra checks ?
> (because other RT_CACHE_STAT_INC() done in the same function dont have this cost)
> is it the loading of %rcx (done in ffffffff803e98bd) that is stalling ?

My first guess would be a cache miss of some sort.

Maybe the rest of the code needs enough cache to push this line out.
Or, since you have a dual core with separate caches, the line is bouncing
between them for some reason (it shouldn't, but maybe there is a bug).
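
As a rough user-space illustration of the bouncing scenario (not the kernel's actual per-CPU layout), the sketch below checks whether two CPUs' counter slots fall into the same cache line. The 64-byte line size, the struct names, and the helper functions are assumptions made for this example, and `__attribute__((aligned))` is a GCC/Clang extension. If two cores write to slots that share a line, that line has to move between their caches on every update.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */
#define NCPUS      2    /* dual core, as in the report */

/* Packed layout: consecutive slots are only 4 bytes apart. */
struct packed_stat {
    unsigned int in_hlist_search;
};

/* Padded layout: each slot is aligned (and therefore sized) to one line. */
struct padded_stat {
    unsigned int in_hlist_search;
} __attribute__((aligned(CACHE_LINE)));

_Static_assert(sizeof(struct padded_stat) == CACHE_LINE,
               "each padded slot should occupy exactly one cache line");

/* Align the packed array so slot 0 starts at the beginning of a line. */
static struct packed_stat packed[NCPUS] __attribute__((aligned(CACHE_LINE)));
static struct padded_stat padded[NCPUS];

/* Two addresses share a line if they have the same line index. */
static int same_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}

/* Returns 1: both CPUs' counters live in one line and can bounce. */
int packed_slots_share_line(void)
{
    return same_line(&packed[0], &packed[1]);
}

/* Returns 0: each CPU's counter has a line of its own. */
int padded_slots_share_line(void)
{
    return same_line(&padded[0], &padded[1]);
}
```

A check like this only says whether sharing is possible from the layout. Whether it actually happens on the machine in question is what the DATA_CACHE_MISSES profile would have to show.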

> so that RT_CACHE_STAT_INC(field) would map to
> 
>     addl $1,%gs:OFFSET  /* no register needed */

I doubt that will make much difference.
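
For comparison, a user-space analogue of the proposed single-instruction form can be sketched with a thread-local counter. The names `rt_stat_sketch`, `STAT_INC_SKETCH`, `hlist_search_hit`, and `hlist_search_count` are hypothetical and exist only for this example, and `__thread` is a GCC/Clang extension. On x86-64 Linux, a compiler will typically turn the increment into one add against a segment-relative address (user-space TLS uses %fs rather than the kernel's %gs), but that depends on the compiler, the optimization level, and the TLS model, so it is worth checking the generated assembly rather than assuming it.

```c
/* Hypothetical stand-in for rt_cache_stat: one private copy per thread. */
struct rt_stat_sketch {
    unsigned int in_hlist_search;
};

static __thread struct rt_stat_sketch rt_stat_sketch;

/* Analogue of RT_CACHE_STAT_INC(field): bump this thread's own copy. */
#define STAT_INC_SKETCH(field) (rt_stat_sketch.field++)

/* Record one hash-list search step for the calling thread. */
void hlist_search_hit(void)
{
    STAT_INC_SKETCH(in_hlist_search);
}

/* Read back the calling thread's counter. */
unsigned int hlist_search_count(void)
{
    return rt_stat_sketch.in_hlist_search;
}
```

Either form touches the same memory location, so if the cost comes from a cache miss on the counter's line, saving the separate offset load would not be expected to remove it.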

-Andi
