linux-kernel - Re: [PATCH 1/3] slqb: Do not use DEFINE_PER


Message-ID: <4AB739A6.5060807@in.ibm.com>
Date:	Mon, 21 Sep 2009 14:00:30 +0530
From:	Sachin Sant <sachinp@...ibm.com>
To:	Tejun Heo <tj@...nel.org>
CC:	Pekka Enberg <penberg@...helsinki.fi>, Mel Gorman <mel@....ul.ie>,
	Nick Piggin <npiggin@...e.de>,
	Christoph Lameter <cl@...ux-foundation.org>,
	heiko.carstens@...ibm.com, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Subject: Re: [PATCH 1/3] slqb: Do not use DEFINE_PER_CPU for per-node data

Tejun Heo wrote:
> Pekka Enberg wrote:
>   
>> Tejun Heo wrote:
>>     
>>> Pekka Enberg wrote:
>>>       
>>>> On Fri, Sep 18, 2009 at 10:34 PM, Mel Gorman <mel@....ul.ie> wrote:
>>>>         
>>>>> SLQB used a seemingly nice hack to allocate per-node data for the
>>>>> statically
>>>>> initialised caches. Unfortunately, due to some unknown per-cpu
>>>>> optimisation, these regions are being reused by something else as the
>>>>> per-node data is getting randomly scrambled. This patch fixes the
>>>>> problem but it's not fully understood *why* it fixes the problem at the
>>>>> moment.
>>>>>           
>>>> Ouch, that sounds bad. I guess it's architecture specific bug as x86
>>>> works ok? Lets CC Tejun.
>>>>         
>>> Is the corruption being seen on ppc or s390?
>>>       
>> On ppc.
>>     
>
> Can you please post full dmesg showing the corruption?  Also, if you
> apply the attached patch, does the added BUG_ON() trigger?
>   
I applied the three patches from Mel and one from Tejun.
With these patches applied the machine boots past
the original reported SLQB problem, but then hangs
just after printing these messages.

<6>ehea: eth0: Physical port up
<7>irq: irq 33539 on host null mapped to virtual irq 259
<6>ehea: External switch port is backup port
<7>irq: irq 33540 on host null mapped to virtual irq 260
<6>NET: Registered protocol family 10
^^^^^^ Hangs at this point.

Tejun, the above hang looks exactly the same as the one
I reported here:

http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html

This particular hang was bisected to the following patch:

powerpc64: convert to dynamic percpu allocator

This hang can be recreated without SLQB, so I think this is a different
problem.

I have attached the complete dmesg log here.

Thanks
-Sachin


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


View attachment "dmesg-log" of type "text/plain" (14599 bytes)