lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 30 Apr 2008 12:54:21 -0300
From:	Glauber Costa <gcosta@...hat.com>
To:	Hugh Dickins <hugh@...itas.com>,
	linux-visws-devel@...ts.sourceforge.net, pazke@...pac.ru
CC:	"J.A. Magallón" <jamagallon@....com>,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: Problems with -git14

Hugh Dickins wrote:
> On Wed, 30 Apr 2008, J.A. Magallón wrote:
>> I have a couple problems with latest git (-14):
>>
>> - It only recognises 2 processors out of 4 (dual Xeon HT)
>> - It oopses on the swapper process just on boot...
>>
>> Difference in dmesg is below. If full correct dmesg or config is
>> needed, please ask for them. The kernel was built copying old
>> 2.6.25 config to .config && make oldconfig. I filled the missing
>> gaps like PAT and others...
>>
>> -Brought up 4 CPUs
>> +native_cpu_up: bad cpu 2
>> +native_cpu_up: bad cpu 3
>> +Brought up 2 CPUs
> 
> Yes, I've been getting this for some days too: only 2 processors on
> dual Xeon HT in 32-bit mode; whereas x86_64 finds all 4 just fine.
> Ran lots of testing on 2.6.25-mm1 before I noticed it there.
> 
> I bisected for a while, but it got confusing (arrived at a bisect
> point which gave only 1 processor: smpboot code getting rearranged),
> so I was forced (quel horreur!) to investigate properly.  I've just
> now had success with the patch below, please give it a try:
> but it'll need an Ack from Glauber before it can go in.
> 
>> +WARNING: at include/linux/blkdev.h:427 blk_queue_init_tags+0x110/0x11f()
> 
> I presume this warning and backtrace is what you report as an oops:
> I think you'll find Linus included a fix for this one overnight, and
> it should have gone away in 2.6.25-git15 (but I didn't see it myself).
> 
> Hugh
> 
> [PATCH] x86_32: fix HT cpu booting
> 
> Since recent smpboot 32/64-bit merge, my dual Xeon with HT has been
> booting only 2 of its 4 cpus (when running an i386 kernel; but x86_64
> is okay).  J.A. Magallón reports the same.
> 
> native_cpu_up: bad cpu 2
> native_cpu_up: bad cpu 3
> 
> The mach-default cpu_present_to_apicid() was just returning cpu number
> (2, 3) instead of apicid (6, 7): looks like we now need the x86_64 code
> even for the i386 case.
> 
> Comparing with other versions of cpu_present_to_apicid(), it seems a
> good idea to include an NR_CPUS test too, since cpu_present() doesn't
> include that; but that wasn't a problem here, and may no problem at all.
> 
> One point worth noting - is it a worry?  Prior to that smpboot merge,
> my Xeon booted the two HT siblings on one physical first, then the
> two siblings on the other physical after - when i386, but alternated
> them when x86_64.  Since the merge, the x86_64 sequence is unchanged,
> but the i386 sequence is now like x86_64.  I prefer this consistency,
> and I prefer the new sequence: booting with maxcpus=2 then uses the
> independent physicals without HT sharing; but surprises in store?
> 
> Signed-off-by: Hugh Dickins <hugh@...itas.com>
> 
> --- 2.6.25-git/include/asm-x86/mach-default/mach_apic.h	2008-04-23 07:24:16.000000000 +0100
> +++ linux/include/asm-x86/mach-default/mach_apic.h	2008-04-30 14:55:14.000000000 +0100
> @@ -109,13 +109,8 @@ static inline int cpu_to_logical_apicid(
>  
>  static inline int cpu_present_to_apicid(int mps_cpu)
>  {
> -#ifdef CONFIG_X86_64
> -	if (cpu_present(mps_cpu))
> +	if (mps_cpu < NR_CPUS && cpu_present(mps_cpu))
>  		return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu);
> -#else
> -	if (mps_cpu < get_physical_broadcast())
> -		return  mps_cpu;
> -#endif
>  	else
>  		return BAD_APICID;
>  }

Hugh, thanks for tracing this. The patch looks sane to me

However, since this problem was raised, I'm still concerned that visws 
may have the same problem, since it uses the same logic that i386 
mach_default used to. (and I did not touched, since x86_64 code went int 
o mach_default).

Could anyone with access to such hardware give it a try ?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ