linux-kernel - Re: [v2.6.26] what's brewing in x86.git for v2.6.26


Message-ID: <48072B9D.2000900@firstfloor.org>
Date:	Thu, 17 Apr 2008 12:51:09 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Alexander van Heukelum <heukelum@...tmail.fm>
CC:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: [v2.6.26] what's brewing in x86.git for v2.6.26

> 
> The input for the first 'benchmark' was indeed completely unrealistic.
> They did show a very convincing speedup, though. This program was
> really written to verify the implementation and was later converted
> to a benchmark. Many benchmarks are unrealistic. I also wrote a
> benchmark for find_first_bit and find_next_bit:
>         http://heukelum.fastmail.fm/find_first_bit

I think a realistic benchmark would come from running a real kernel,
profiling the input values of the bitmap functions, and then
testing those cases.

I actually started on that when I complained last time, by writing
a systemtap script that generates a histogram. But for some reason
systemtap couldn't tap all bitmap functions in my kernel and missed
some completely, and I ran out of time tracking that down.
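
To illustrate the kind of histogram meant here, a minimal userspace
sketch follows. It is not the systemtap script, which is not shown in
this thread. The names size_hist, size_bucket and record_size are
hypothetical, and the assumption is that some tracing hook hands over
the size argument of each bitmap call.

```c
#include <stddef.h>

/* Bucket i counts sizes in [2^i, 2^(i+1)); bucket 0 also takes size 0,
 * and the last bucket is a catch-all for anything larger. */
#define HIST_BUCKETS 17

static unsigned long size_hist[HIST_BUCKETS];

/* Map a bitmap size in bits to a power-of-two bucket index. */
static unsigned int size_bucket(unsigned long nbits)
{
    unsigned int b = 0;

    while (nbits > 1 && b < HIST_BUCKETS - 1) {
        nbits >>= 1;
        b++;
    }
    return b;
}

/* Hypothetical hook: call once per traced bitmap-function invocation. */
static void record_size(unsigned long nbits)
{
    size_hist[size_bucket(nbits)]++;
}
```

Power-of-two buckets keep one-word masks, multi-word cpumasks and 4k
block bitmaps in clearly separate bins.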

My gut feeling is the only interesting cases are cpumask/nodemask sized
(which can be one word, two words, but now up to 64 words on an
NR_CPUS=4096 x86 kernel) and then 4k sized ext3/reiser/etc. block bitmaps.
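
The word counts for these sizes follow from rounding the bit count up
to whole longs, which is the arithmetic the kernel's BITS_TO_LONGS
macro performs. A small standalone sketch, where WORD_BITS and
bitmap_words are names made up for illustration:

```c
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long on the build target
 * (64 on x86_64, 32 on i386). */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed to hold nbits bits, rounded up. */
static size_t bitmap_words(size_t nbits)
{
    return (nbits + WORD_BITS - 1) / WORD_BITS;
}
```

For instance, bitmap_words(4096) is 4096 / WORD_BITS, and a 4k block
bitmap (4096 bytes, so 32768 bits) needs 32768 / WORD_BITS words.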

> My conclusion would be: the speed of the generic bitmap implementation
> is either better than or at least comparable to the current private
> implementations in i386/x86_64. 

Ok.

> The generic version is out-of-line,
> while the private implementation of i386 was inlined: this causes a
> regression for very small bitmaps. However, if the bitmap size is
> a constant and fits a long integer, the updated generic code should
> inline an optimized version, like x86_64 currently does it.

Yes, it probably should. cpumask walks are relatively common.
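
As a rough sketch of what such an inlined fast path could look like
(not the actual patch): when the size is a compile-time constant that
fits in one long, the search collapses to a mask and a count-trailing-
zeros. The function names are hypothetical, and the sketch assumes the
GCC/Clang builtins __builtin_constant_p and __builtin_ctzl.

```c
#include <limits.h>

#define WORD_BITS ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for the out-of-line generic search. */
static unsigned long find_first_bit_generic(const unsigned long *addr,
                                            unsigned long size)
{
    unsigned long i;

    for (i = 0; i < size; i++)
        if (addr[i / WORD_BITS] & (1UL << (i % WORD_BITS)))
            return i;
    return size;
}

static inline unsigned long find_first_bit_sketch(const unsigned long *addr,
                                                  unsigned long size)
{
    /* Fast path: taken only when the compiler can prove size is a
     * constant no larger than one word (typically after inlining
     * with optimization enabled). */
    if (__builtin_constant_p(size) && size <= WORD_BITS) {
        unsigned long word = addr[0];

        if (size < WORD_BITS)
            word &= (1UL << size) - 1;  /* ignore bits beyond size */
        return word ? (unsigned long)__builtin_ctzl(word) : size;
    }
    return find_first_bit_generic(addr, size);
}
```

Both paths return size when no bit is set, so callers see the same
result whichever one the compiler picks.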

I remember profiling mysql some time ago, which did bad overscheduling
due to dumb locking. The funny thing was that the mask walking in the
scheduler actually stood out. No, I don't claim extreme overscheduling
is an interesting case to optimize for, but there are more realistic
workloads that also do a lot of context switching.

BTW if you do generic work on this: one reason the generated code for
for_each_cpu etc. is so ugly is that the code has checks for
find_next_bit returning >= max size. If you can generalize the
code enough to make sure no arch does that anymore, these checks
could be eliminated.
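
A simplified userspace sketch of the difference, with hypothetical
names throughout (this is not the kernel's cpumask code): if the search
is guaranteed to return exactly the size when nothing is found, the
iterator step needs no clamp.

```c
#include <limits.h>

#define WORD_BITS ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical search with the guarantee under discussion: returns
 * exactly `size` when no bit is set at or after `offset`. */
static unsigned long next_bit_exact(const unsigned long *addr,
                                    unsigned long size, unsigned long offset)
{
    unsigned long i;

    for (i = offset; i < size; i++)
        if (addr[i / WORD_BITS] & (1UL << (i % WORD_BITS)))
            return i;
    return size;
}

/* Defensive step: clamps in case some implementation overshoots size. */
static unsigned long next_cpu_clamped(const unsigned long *mask,
                                      unsigned long nbits, unsigned long cpu)
{
    unsigned long n = next_bit_exact(mask, nbits, cpu + 1);

    return n < nbits ? n : nbits;  /* extra compare on every step */
}

/* With the exact-return guarantee the clamp is redundant. */
static unsigned long next_cpu_plain(const unsigned long *mask,
                                    unsigned long nbits, unsigned long cpu)
{
    return next_bit_exact(mask, nbits, cpu + 1);
}

/* Walk the mask the way a for_each_cpu style loop would. */
static unsigned int count_cpus(const unsigned long *mask, unsigned long nbits)
{
    unsigned int count = 0;
    unsigned long cpu;

    for (cpu = next_bit_exact(mask, nbits, 0); cpu < nbits;
         cpu = next_cpu_plain(mask, nbits, cpu))
        count++;
    return count;
}
```

The clamped and plain steps give identical results here; the clamp only
matters when a search implementation can return a value above the size.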

-Andi