[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54F438AE.7010301@arm.com>
Date: Mon, 02 Mar 2015 10:17:18 +0000
From: Sudeep Holla <sudeep.holla@....com>
To: Andre Przywara <andre.przywara@....com>,
Borislav Petkov <bp@...e.de>
CC: Sudeep Holla <sudeep.holla@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH RFT v2] x86: move cacheinfo sysfs to generic cacheinfo
infrastructure
Hi Andre,
On 01/03/15 22:39, Andre Przywara wrote:
> On Tue, 24 Feb 2015 18:57:49 +0100, Borislav Petkov wrote:
>> On Mon, Feb 23, 2015 at 06:14:25PM +0000, Sudeep Holla wrote:
>>> - Rebased on v4.0-rc1
>>> - Fixed lockdep warning reported by Borislav
>> You probably have fixed the lockdep splat but not the NULL pointer
>> dereference which was there in the first mail I sent you. I'd suggest
>> you find an AMD box with an L3 and do some testing yourself before
>> sending out another version.
>
> So I gave that a spin on my old PhenomII X6 box, and indeed there is a
> NULL pointer dereference. I tracked it down to
> __cache_amd_cpumap_setup(), where the sibling map for the L3 is
> populated. When the function is called for the first CPU, subsequent
> CPU's cacheinfo structures obviously haven't been initialized yet, so
> the struct cpu_cacheinfo we obtain for the other CPUs is empty
> (num_levels and num_leaves are 0). Dereferencing info_list + index is
> then a bad idea.
Thanks a lot for spending time on debugging this and providing the fix.
> I fixed it by simply skipping the sibling map setup in this case (see
> the patch below). Later we revisit this code path because the most
> outer loop iterates over all CPUs anyway, so the fields are valid then
> and the sibling map is build up properly. Not sure if that is a proper
> fix (should we check for info_list being NULL?, is num_levels the
> right value?), but I don't feel like looking into this code deeper
> than I already did (Sudeep should be decorated for doing that!).
>
> Boris, can you try this on a Bulldozer class machine?
>
> Sudeep, if Boris acknowledges this, can you merge those two lines (or
> something similar) in your patch and repost it? I can give it a try
> then again.
>
I am fine squashing this change into the patch if Boris is fine with
the change.
>>
>> Also, when testing, do a snapshot of the sysfs cache hierarchy by
>> doing:
>>
>> grep . -EriIn /sys/devices/system/cpu/cpu?/cache/*
>> before your changes and after and check for differences.
>>
>> We can't allow ourselves any differences because this is an API.
>
> But as Sudeep mentioned, this is an orthogonal issue not directly
> related to this patch.
> Indeed I see a regression between 3.19.0 and a clean 4.0.0-rc1 in this
> respect, the diff is like:
> -cpu5/cache/index3/shared_cpu_map:00000000,0000003f
> +cpu5/cache/index3/shared_cpu_map:3f
> (for each cpu and index)
>
> Can someone confirm (and investigate) this?
>
I saw this even on ARM64 platforms with 4.0-rc1 and did narrow down to
series of formatting functions changes introduced by Tejun Heo, mainly
commit 4a0792b0e7a6("bitmap: use %*pb[l] to print bitmaps including
cpumasks and nodemasks")
Regards,
Sudeep
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists