Date:	Mon, 21 Sep 2009 14:00:30 +0530
From:	Sachin Sant <sachinp@...ibm.com>
To:	Tejun Heo <tj@...nel.org>
CC:	Pekka Enberg <penberg@...helsinki.fi>, Mel Gorman <mel@....ul.ie>,
	Nick Piggin <npiggin@...e.de>,
	Christoph Lameter <cl@...ux-foundation.org>,
	heiko.carstens@...ibm.com, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Subject: Re: [PATCH 1/3] slqb: Do not use DEFINE_PER_CPU for per-node data

Tejun Heo wrote:
> Pekka Enberg wrote:
>   
>> Tejun Heo wrote:
>>     
>>> Pekka Enberg wrote:
>>>       
>>>> On Fri, Sep 18, 2009 at 10:34 PM, Mel Gorman <mel@....ul.ie> wrote:
>>>>         
>>>>> SLQB used a seemingly nice hack to allocate per-node data for the
>>>>> statically
>>>>> initialised caches. Unfortunately, due to some unknown per-cpu
>>>>> optimisation, these regions are being reused by something else as the
>>>>> per-node data is getting randomly scrambled. This patch fixes the
>>>>> problem but it's not fully understood *why* it fixes the problem at the
>>>>> moment.
>>>>>           
>>>> Ouch, that sounds bad. I guess it's architecture specific bug as x86
>>>> works ok? Lets CC Tejun.
>>>>         
>>> Is the corruption being seen on ppc or s390?
>>>       
>> On ppc.
>>     
>
> Can you please post full dmesg showing the corruption?  Also, if you
> apply the attached patch, does the added BUG_ON() trigger?
>   
I applied the three patches from Mel and the one from Tejun.
With these patches applied, the machine boots past
the originally reported SLQB problem, but then hangs
just after printing these messages:

<6>ehea: eth0: Physical port up
<7>irq: irq 33539 on host null mapped to virtual irq 259
<6>ehea: External switch port is backup port
<7>irq: irq 33540 on host null mapped to virtual irq 260
<6>NET: Registered protocol family 10
^^^^^^ Hangs at this point.

Tejun, the above hang looks exactly the same as the one
I reported here:

http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html

This particular hang was bisected to the following patch:

powerpc64: convert to dynamic percpu allocator

This hang can be recreated without SLQB, so I think this is a different
problem.

I have attached the complete dmesg log here.

Thanks
-Sachin


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


View attachment "dmesg-log" of type "text/plain" (14599 bytes)
