lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200611131710.13285.ak@suse.de>
Date:	Mon, 13 Nov 2006 17:10:13 +0100
From:	Andi Kleen <ak@...e.de>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	"Siddha, Suresh B" <suresh.b.siddha@...el.com>,
	Andrew Morton <akpm@...l.org>, linux-kernel@...r.kernel.org,
	ashok.raj@...el.com
Subject: Re: [patch] genapic: optimize & fix APIC mode setup


> Programming the IPI itself is cheap.

yep, that is why physical mode doesn't hurt much.
 
> > for_each_cpu_mask is essentially just BSF with some glue.
> 
> yes - it's a minimum of 2 BSF scans, and that isnt exactly the cheapest 
> of instructions. It should be tens of cycles or more, combined with the 
> nonzero cost of passing a cpumask (which is 32 bytes) into the function 
> by value ...

Sorry, but everytime IPIs and all the other overhead etc. are involved even 10 BSFs 
would be completely  in the noise.  The relative cost is so much smaller.
I think you're barking up the wrong tree here regarding that loop.

There are surely inefficiencies in the code, but I don't think they're
related to cpu loops.

[BTW the code for the cpu loop is much worse than it could be [1], but 
that's a different story. But even with the worse code it doesn't matter much]

[1] I had an optimization patch for find_next_bit for that case once but gcc
didn't like it and fixing it completely (= getting rid of the bogus 
additional test) would require auditing all architectures. I didn't push
it very hard because I didn't have much evidence it is really performance
critical -- compared to cache misses and locks it is all small fries.

> 
> and your argument again, is to hide a hotplug bug 

I don't think it's a bug -- it's inherent.

> indeed, you are right. Btw., this is another change to io_apic.c that i 
> would never have ACKed ;-) This whole thing is hidden via something that 
> looks like a macro value:

Yes agreed the magic macros are ugly. It has historical reasons from i386 (and I'm 
partly to blame). Essentially it comes out of the mess that i386 subarchs 
are and when moving genapic to x86-64 that particular ugliness wasn't cleaned up.

> physical delivery is bad on the IO-APIC anyway - today LowestPrio 
> delivery mode works again and the hardware can help us. 

Help us with what exactly?

> you are risking the introduction of the kind of regressions that i'm 
> seeing on my box: that physical delivery mystically doesnt work 
> occasionally. AFAIK Windows defaults to logical APIC delivery too on 
> small systems. (it in fact uses the TPR IIRC which can only work with 
> logical delivery)

Yes that's a real risk.

Probably need a solution for your chipset. But currently as proposed by you it 
would be robbing Peter to pay Paul.

> So your suggestion would put us up to use an uncommon (and more 
> expensive ...) mode of IPI delivery - instead of limiting that rare 
> delivery mode to large SMP systems only ...
> 
> and again, please remind me of what you are trying to argue: that we 
> should keep the slower and less common delivery mode just to not have to 
> fix the hotplug bug?

Ok assuming temporarily it's a bug, how would you want to fix it?

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ