Message-ID: <20081104163636.GA20534@elte.hu>
Date:	Tue, 4 Nov 2008 17:36:36 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Alexander van Heukelum <heukelum@...tmail.fm>
Cc:	Cyrill Gorcunov <gorcunov@...il.com>,
	Alexander van Heukelum <heukelum@...lshack.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, lguest@...abs.org,
	jeremy@...source.com, Steven Rostedt <srostedt@...hat.com>,
	Mike Travis <travis@....com>, Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes


* Alexander van Heukelum <heukelum@...tmail.fm> wrote:

> I wonder how the time needed for reading the GDT segments balances 
> against the time needed due to the extra redirection of running the 
> stubs. I'd be interested whether the difference can be measured with 
> the current implementation. (I really need to hijack a machine to 
> do some measurements; I hoped someone would do it before I got to it 
> ;) )
> 
> Even if some CPUs have some internal optimization for the case 
> where the gate segment is the same as the current one, I wonder if 
> it is really important... Interrupts that occur while the processor 
> is running userspace already cause segment changes. They are more 
> likely to be in cache, maybe.

There are three main factors:

- Same-value segment loads are optimized on most modern CPUs and can
  give an advantage of a few cycles (2-3). That might or might not 
  apply to the microcode that does IRQ entry processing. (A cache miss 
  will increase the cost much more, but that is true in general as 
  well.)

- A second effect is the changed data structure layout: a more
  compact GDT entry (6 bytes) versus a more spread-out (~7 bytes,
  not aligned) interrupt trampoline. Note that the first one lives in 
  the data cache, the second one in the instruction cache - the two 
  have different sizes, different implementations and different 
  hit/miss pressures. Generally the instruction cache is the more 
  precious resource, so we optimize for it first and for the data 
  cache second.

- A third effect is branch prediction: currently we are fanning 
  out all the vectors into ~240 branches just to recover a single 
  constant, in essence. That is quite wasteful of instruction cache 
  resources, because from the logic side it's a data constant, not a 
  control flow difference. (We demultiplex that number into an 
  interrupt handler later on, but the CPU has no knowledge of that 
  relationship.) A rough C sketch of this contrast follows the list.
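
As a rough user-space illustration of that third point (made-up names, 
not the kernel's actual entry_*.S code): today the vector is recovered 
via control flow through many near-identical stubs, whereas the 
proposed scheme recovers it as data through one shared entry point:

/*
 * Rough user-space illustration only -- not the kernel's entry code.
 * Today every vector gets its own tiny stub whose sole job is to
 * materialize the vector number in the instruction stream before
 * branching to the common path, i.e. the constant is recovered via
 * control flow and costs I-cache.
 */
#include <stdio.h>

static void common_interrupt(int vector)
{
	/* the real demultiplexing into the handler happens here */
	printf("dispatching vector %d\n", vector);
}

/* one stub per vector: ~240 almost identical snippets of code */
#define DEFINE_STUB(n) static void irq_stub_##n(void) { common_interrupt(n); }
DEFINE_STUB(32)
DEFINE_STUB(33)
/* ... and so on up to the last vector ... */

/*
 * The proposed scheme instead recovers the vector from data (e.g. from
 * the segment selector loaded via the GDT), so one shared entry point
 * is enough and the per-vector cost moves from I-cache to D-cache.
 */
static void single_entry(int vector_from_selector)
{
	common_interrupt(vector_from_selector);
}

int main(void)
{
	irq_stub_32();		/* control-flow encoding of the vector */
	irq_stub_33();
	single_entry(34);	/* data encoding of the vector */
	return 0;
}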

... all in all, the situation on the CPU architecture side is complex 
enough to really necessitate measurements in practice, and that's why 
I have asked you to do them: the numbers need to go hand in hand with 
the patch submission.
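
As a sketch of how one could get a first-order number (my illustration 
only, not something posted in this thread; it assumes x86 Linux, gcc, 
and that syscall nr 20 is getpid in the int $0x80 table): time an 
int $0x80 round trip from user space with RDTSC - it goes through the 
same IDT dispatch machinery - and compare the two kernel builds:

/*
 * Crude proxy measurement, sketch only: time the int $0x80 round trip
 * with RDTSC and compare the kernels before/after the patch.
 */
#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	enum { LOOPS = 1000000 };
	uint64_t t0, t1;
	long nr;
	int i;

	t0 = rdtsc();
	for (i = 0; i < LOOPS; i++) {
		nr = 20;	/* __NR_getpid in the int $0x80 table */
		asm volatile("int $0x80" : "+a" (nr) : : "memory");
	}
	t1 = rdtsc();

	printf("%llu cycles per trap round trip\n",
	       (unsigned long long)((t1 - t0) / LOOPS));
	return 0;
}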

My estimation is that if we do it right, your approach will behave 
better on modern CPUs (which is what matters most for such things), 
especially on real workloads where there's considerable 
instruction-cache pressure. But it should be measured in any case.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
