Message-ID: <CALcN6mjmbMnObqOHD-ZRHHN5-1eQe=evuASNKDRM7nk6OLZ+2A@mail.gmail.com>
Date: Wed, 17 May 2017 21:59:41 -0700
From: David Carrillo-Cisneros <davidcc@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Zefan Li <lizefan@...wei.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
Matt Fleming <matt.fleming@...el.com>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] perf/x86/intel/cqm: Make sure the head event of
cache_groups always has valid RMID
On Tue, May 16, 2017 at 7:38 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, May 04, 2017 at 10:31:43AM +0800, Zefan Li wrote:
>> It is assumed that the head of cache_groups always has valid RMID,
>> which isn't true.
>>
>> When we deallocate RMID from conflicting events currently we don't
>> move them to the tail, and one of those events can happen to be in
>> the head. Another case is we allocate RMIDs for all the events except
>> the head event in intel_cqm_sched_in_event().
>>
>> Besides there's another bug that we retry rotating without resetting
>> nr_needed and start in __intel_cqm_rmid_rotate().
>>
>> Those bugs combined together led to the following oops.
>>
>> WARNING: at arch/x86/kernel/cpu/perf_event_intel_cqm.c:186 __put_rmid+0x28/0x80()
>> ...
>> [<ffffffff8103a578>] __put_rmid+0x28/0x80
>> [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
>> [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
>> [<ffffffff8109e69b>] worker_thread+0x11b/0x400
>> ...
>> BUG: unable to handle kernel NULL pointer dereference at (null)
I ran into this bug a long time ago but never found an easy way to
reproduce it. Do you have one?
>> ...
>> [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
>> [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
>> [<ffffffff8109e69b>] worker_thread+0x11b/0x400
>
> I've managed to forget most if not all of that horror show. Vikas and
> David seem to be working on a replacement, but until such a time it
> would be good if this thing would not crash the kernel.
>
> Guys, could you have a look? To me it appears to mostly have the right
> shape, but like I said, I forgot most details...
The patch LGTM. I ran into these issues before and fixed them in a
similar but messier way, then the rewrite started ...
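
To illustrate the retry problem named in the quoted commit message
(retrying the rotation without resetting nr_needed), here is a minimal
sketch. It is not the patch and not the kernel code: struct group,
count_needed(), rotate() and the reset_on_retry flag are hypothetical
names made up for illustration, and the "reclaim one ID per retry" step
is an assumption that only exists to force a second pass.

```c
/* Hypothetical stand-in for a cache group: has_id != 0 means it holds an ID. */
struct group { int has_id; };

/* Count how many groups still need an ID in one pass. */
static int count_needed(const struct group *g, int n)
{
    int needed = 0;

    for (int i = 0; i < n; i++)
        if (!g[i].has_id)
            needed++;
    return needed;
}

/*
 * Sketch of a rotate-with-retry loop. Returns the nr_needed value seen
 * by the final pass. With reset_on_retry == 0 the counter keeps the
 * value accumulated by earlier passes, which is the bug pattern.
 */
static int rotate(const struct group *g, int n, int free_ids, int reset_on_retry)
{
    int nr_needed = 0;
    int retries = 0;

again:
    if (reset_on_retry)
        nr_needed = 0;          /* start each pass from a clean count */
    nr_needed += count_needed(g, n);

    if (nr_needed > free_ids && retries++ < 1) {
        free_ids++;             /* assumption: one ID was reclaimed */
        goto again;
    }
    return nr_needed;
}
```

With three groups of which two lack an ID and one free ID to start with,
the non-resetting variant ends its second pass believing four IDs are
needed, while the resetting variant correctly sees two.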
>
>>
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Zefan Li <lizefan@...wei.com>
Acked-by: David Carrillo-Cisneros <davidcc@...gle.com>