linux-kernel - Re: [PATCH 0/4] amd64

Message-ID: <20090603182023.GA28083@aftab>
Date:	Wed, 3 Jun 2009 20:20:23 +0200
From:	Borislav Petkov <borislav.petkov@....com>
To:	"H. Peter Anvin" <hpa@...or.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>, greg@...ah.com,
	mingo@...e.hu, norsk5@...oo.com, tglx@...utronix.de,
	mchehab@...hat.com, aris@...hat.com, edt@....ca,
	linux-kernel@...r.kernel.org, randy.dunlap@...cle.com,
	Sam Ravnborg <sam@...nborg.org>
Subject: Re: [PATCH 0/4] amd64_edac: misc fixes

On Mon, Jun 01, 2009 at 11:57:18AM -0700, H. Peter Anvin wrote:
> Borislav Petkov wrote:
> > Actually, popcnt got added to gas in July 2006 so checking the gas
> > version should suffice, IMHO.
> 
> gas is part of binutils.
> 
> > Anyway, I proposed something similar before but Andrew suggested that we
> > should simply slap in the opcode so we don't need the Kbuild changes.
> > The advantage of the approach is that it works unconditionally on all
> > toolchains and introduces less code changes. Hmm...
> 
> That really sucks, though, in the long run.  I personally prefer to have
> the "right thing" -- which in this case is probably gcc intrinsics --
> and then a fallback that will gradually fall out of use.

Ok, here's a simple performance data measurement exercise:

I went and rerouted all the cpumask_weight calls in sched.c through a
noinline local definition:

static noinline unsigned int my_weight(const struct cpumask *mask)
{
       return cpumask_weight(mask);
}

so that I could dynamically ftrace the invocations. Compiling a kernel
(make -j8) on a quad-core Fam10h gave the following trace (excerpt):

          <idle>-0     [000]   313.120141: my_weight <-scheduler_tick
          <idle>-0     [000]   313.120145: my_weight <-select_nohz_load_balancer
          <idle>-0     [000]   313.124133: my_weight <-scheduler_tick
          <idle>-0     [000]   313.124138: my_weight <-select_nohz_load_balancer
          <idle>-0     [000]   313.128124: my_weight <-scheduler_tick
          <idle>-0     [000]   313.128127: my_weight <-select_nohz_load_balancer
          <idle>-0     [000]   313.132116: my_weight <-scheduler_tick
          <idle>-0     [000]   313.132120: my_weight <-select_nohz_load_balancer
          <idle>-0     [000]   313.136109: my_weight <-scheduler_tick
          <idle>-0     [000]   313.136114: my_weight <-select_nohz_load_balancer
           <...>-3986  [002]   313.138868: my_weight <-sched_balance_self
           <...>-3986  [002]   313.138870: my_weight <-sched_balance_self
           <...>-4064  [003]   313.138942: my_weight <-sched_balance_self
           <...>-4064  [003]   313.138945: my_weight <-sched_balance_self
           <...>-4064  [000]   313.142034: my_weight <-sched_balance_self
           <...>-4064  [000]   313.142037: my_weight <-sched_balance_self
           <...>-4065  [001]   313.143509: my_weight <-sched_balance_self
           <...>-4065  [001]   313.143511: my_weight <-sched_balance_self
            make-3777  [000]   313.146553: my_weight <-sched_balance_self
            make-3777  [000]   313.146554: my_weight <-sched_balance_self
           <...>-4066  [001]   313.146614: my_weight <-sched_balance_self
           <...>-4066  [001]   313.146614: my_weight <-sched_balance_self
           <...>-4066  [003]   313.149516: my_weight <-sched_balance_self


and the following stats:

compile time: ~309.373623 secs
my_weight calls on _all_ cores: 54005
	(cpu0: 14262, cpu1: 14417, cpu2: 11654, cpu3: 13672)

leading to approx. 174.56 calls per second on _ALL_ cores combined. If,
hypothetically speaking, this is a representative workload and we ignore
the ftrace overhead, it looks like there's no need to switch to the
hardware version of hweight, since this would bring a bunch of code changes
which simply wouldn't justify themselves wrt performance improvement.
It is just not worth the effort.
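
For reference, the arithmetic behind that estimate can be redone with a small userspace sketch. The call counts and the compile time are the ones quoted above. The 20 ns saving per call is purely an assumption for illustration (nothing was measured), and the helper names are made up.

```c
#include <stddef.h>

/* Figures taken from the stats quoted in this mail. */
static const double compile_secs = 309.373623;
static const unsigned long calls_per_cpu[] = { 14262, 14417, 11654, 13672 };

/* ASSUMPTION: hypothetical time saved per call by a hardware popcnt,
 * in nanoseconds. This is not a measured value. */
static const double assumed_saving_ns = 20.0;

/* Sum the per-CPU call counts. */
unsigned long total_calls(void)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < sizeof(calls_per_cpu) / sizeof(calls_per_cpu[0]); i++)
		sum += calls_per_cpu[i];
	return sum;
}

/* Average call rate over the whole build, all cores combined. */
double calls_per_second(void)
{
	return (double)total_calls() / compile_secs;
}

/* Fraction of the build time saved under the assumed per-call saving. */
double saved_fraction(void)
{
	double saved_secs = (double)total_calls() * assumed_saving_ns * 1e-9;

	return saved_secs / compile_secs;
}
```

Printing calls_per_second() with "%.2f" should reproduce the 174.56 figure, and under the assumed saving saved_fraction() stays well below one hundred-thousandth of the build time.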

Of course, I'm open to suggestions wrt a better workload, but from
looking at the code, the most frequent hweight call site seems to be
scheduler_tick, which runs at HZ frequency, and even this is several
orders of magnitude too infrequent for a measurable performance improvement.
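
To make the two alternatives concrete, here is a minimal userspace sketch of a software population count next to the compiler builtin. It is not the kernel's code: the function names are hypothetical, the software variant is the usual bit-parallel trick, and `__builtin_popcountll` is the GCC/Clang builtin, which is only expected to become a single popcnt instruction when the compiler is told the target supports it (e.g. with -mpopcnt).

```c
#include <stdint.h>

/* Software fallback: bit-parallel population count of a 64-bit word. */
unsigned int sw_hweight64(uint64_t w)
{
	/* Count bits in each 2-bit field. */
	w -= (w >> 1) & 0x5555555555555555ULL;
	/* Sum neighbouring 2-bit counts into 4-bit fields. */
	w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
	/* Sum neighbouring 4-bit counts into bytes. */
	w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
	/* Add up all bytes; the total ends up in the top byte. */
	return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}

/* Compiler builtin: may map to the popcnt instruction, otherwise the
 * compiler supplies its own software implementation. */
unsigned int hw_hweight64(uint64_t w)
{
	return (unsigned int)__builtin_popcountll(w);
}
```

Both functions return the same value for any input, so the only difference is the cost per call, which is exactly what the call rate above makes negligible.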

Hmm..?

-- 
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632
