linux-kernel - Re: [PATCH] perf/x86/intel/cqm: Make sure the head event of cache

Message-ID: <CALcN6mjmbMnObqOHD-ZRHHN5-1eQe=evuASNKDRM7nk6OLZ+2A@mail.gmail.com>
Date:   Wed, 17 May 2017 21:59:41 -0700
From:   David Carrillo-Cisneros <davidcc@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Zefan Li <lizefan@...wei.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
        Matt Fleming <matt.fleming@...el.com>,
        Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] perf/x86/intel/cqm: Make sure the head event of
 cache_groups always has valid RMID

On Tue, May 16, 2017 at 7:38 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, May 04, 2017 at 10:31:43AM +0800, Zefan Li wrote:
>> It is assumed that the head of cache_groups always has valid RMID,
>> which isn't true.
>>
>> When we deallocate RMID from conflicting events currently we don't
>> move them to the tail, and one of those events can happen to be in
>> the head. Another case is we allocate RMIDs for all the events except
>> the head event in intel_cqm_sched_in_event().
>>
>> Besides there's another bug that we retry rotating without resetting
>> nr_needed and start in __intel_cqm_rmid_rotate().
>>
>> Those bugs combined together led to the following oops.
>>
>> WARNING: at arch/x86/kernel/cpu/perf_event_intel_cqm.c:186 __put_rmid+0x28/0x80()
>> ...
>>  [<ffffffff8103a578>] __put_rmid+0x28/0x80
>>  [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
>>  [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
>>  [<ffffffff8109e69b>] worker_thread+0x11b/0x400
>> ...
>> BUG: unable to handle kernel NULL pointer dereference at           (null)

I ran into this bug a long time ago but never found an easy way to
reproduce it. Do you have one?

>> ...
>>  [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
>>  [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
>>  [<ffffffff8109e69b>] worker_thread+0x11b/0x400
>
> I've managed to forget most if not all of that horror show. Vikas and
> David seem to be working on a replacement, but until such a time it
> would be good if this thing would not crash the kernel.
>
> Guys, could you have a look? To me it appears to mostly have the right
> shape, but like I said, I forgot most details...

The patch LGTM. I ran into these issues before and fixed them in a
similar but messier way, then the re-write started ...

>
>>
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Zefan Li <lizefan@...wei.com>
Acked-by: David Carrillo-Cisneros <davidcc@...gle.com>
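
For readers following the quoted patch description, here is a minimal
user-space sketch of the invariant it talks about: when an RMID is taken
away from a group, that group is moved to the tail so the head of the list
keeps a valid RMID. This is not the kernel code and not the patch itself.
The names toy_group, toy_list, toy_release_rmid and INVALID_RMID are
hypothetical stand-ins, and a fixed-size array replaces the kernel's
linked list purely to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_RMID (-1)   /* assumed marker for "no RMID assigned" */
#define MAX_GROUPS   8

/* Hypothetical stand-in for a group leader on the cache_groups list. */
struct toy_group {
    int id;
    int rmid;               /* INVALID_RMID when no RMID is assigned */
};

/* Ordered list of groups; slot[0] plays the role of the list head. */
struct toy_list {
    struct toy_group *slot[MAX_GROUPS];
    size_t len;
};

static void toy_list_add_tail(struct toy_list *l, struct toy_group *g)
{
    assert(l->len < MAX_GROUPS);
    l->slot[l->len++] = g;
}

/* Move the entry at index i to the tail, keeping the others in order. */
static void toy_list_move_tail(struct toy_list *l, size_t i)
{
    struct toy_group *g;
    size_t j;

    assert(i < l->len);
    g = l->slot[i];
    for (j = i; j + 1 < l->len; j++)
        l->slot[j] = l->slot[j + 1];
    l->slot[l->len - 1] = g;
}

/*
 * Take the RMID away from the group at index i and move that group to
 * the tail, so groups that still hold an RMID stay in front of it.
 * Returns the released RMID.
 */
static int toy_release_rmid(struct toy_list *l, size_t i)
{
    int rmid;

    assert(i < l->len);
    rmid = l->slot[i]->rmid;
    l->slot[i]->rmid = INVALID_RMID;
    toy_list_move_tail(l, i);
    return rmid;
}

/* Returns 1 if the list is empty or its head holds a valid RMID. */
static int toy_head_is_valid(const struct toy_list *l)
{
    if (l->len == 0)
        return 1;
    return l->slot[0]->rmid != INVALID_RMID;
}
```

In this toy model, releasing the RMID of the head without the move would
leave an invalid RMID at slot[0], which is the situation the description
says the rotation code did not expect. The head stays valid here only as
long as at least one group still holds an RMID.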