Date:   Thu, 22 Oct 2020 13:32:29 +0530
From:   Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
To:     Suzuki Poulose <suzuki.poulose@....com>
Cc:     Mathieu Poirier <mathieu.poirier@...aro.org>,
        mike.leach@...aro.org, coresight@...ts.linaro.org,
        swboyd@...omium.org, linux-arm-msm@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        denik@...gle.com, leo.yan@...aro.org, peterz@...radead.org
Subject: Re: [PATCH 1/2] coresight: tmc-etf: Fix NULL ptr dereference in
 tmc_enable_etf_sink_perf()

On 2020-10-21 15:38, Suzuki Poulose wrote:
> On 10/21/20 8:29 AM, Sai Prakash Ranjan wrote:
>> On 2020-10-20 21:40, Sai Prakash Ranjan wrote:
>>> On 2020-10-14 21:29, Sai Prakash Ranjan wrote:
>>>> On 2020-10-14 18:46, Suzuki K Poulose wrote:
>>>>> On 10/14/2020 10:36 AM, Sai Prakash Ranjan wrote:
>>>>>> On 2020-10-13 22:05, Suzuki K Poulose wrote:
>>>>>>> On 10/07/2020 02:00 PM, Sai Prakash Ranjan wrote:
>>>>>>>> There was a report of NULL pointer dereference in ETF enable
>>>>>>>> path for perf CS mode with PID monitoring. It is almost 100%
>>>>>>>> reproducible when the process to monitor is something very
>>>>>>>> active such as chrome and with ETF as the sink and not ETR.
>>>>>>>> Currently in a bid to find the pid, the owner is dereferenced
>>>>>>>> via task_pid_nr() call in tmc_enable_etf_sink_perf() and with
>>>>>>>> owner being NULL, we get a NULL pointer dereference.
>>>>>>>> 
>>>>>>>> Looking at the ETR and other places in the kernel, ETF and the
>>>>>>>> ETB are the only places trying to dereference the task(owner)
>>>>>>>> in tmc_enable_etf_sink_perf() which is also called from the
>>>>>>>> sched_in path as in the call trace. Owner(task) is NULL even
>>>>>>>> in the case of ETR in tmc_enable_etr_sink_perf(), but since we
>>>>>>>> cache the PID in alloc_buffer() callback and it is done as part
>>>>>>>> of etm_setup_aux() when allocating buffer for ETR sink, we never
>>>>>>>> dereference this NULL pointer and we are safe. So lets do the
>>>>>>> 
>>>>>>> The patch is necessary to fix some of the issues. But I feel it is
>>>>>>> not complete. Why is it safe earlier and not later? I believe we are
>>>>>>> simply reducing the chances of hitting the issue by doing this
>>>>>>> earlier rather than later. I would say we had better fix all
>>>>>>> instances to make sure that event->owner is valid. (E.g., I can
>>>>>>> see that for kernel events event->owner == -1?)
>>>>>>> 
>>>>>>> struct task_struct *tsk = READ_ONCE(event->owner);
>>>>>>> 
>>>>>>> if (!tsk || is_kernel_event(event))
>>>>>>>    /* skip ? */
>>>>>>> 
>>>>>> 
>>>>>> Looking at it some more, is_kernel_event() is not exposed
>>>>>> outside events core and probably for good reason. Why do
>>>>>> we need to check for this and not just tsk?
>>>>> 
>>>>> Because the event->owner could be :
>>>>> 
>>>>>  = NULL
>>>>>  = -1UL  // kernel event
>>>>>  = valid.
>>>>> 
>>>> 
>>>> Yes, I understood that part, but here we were trying to
>>>> fix the NULL pointer dereference, right? Hence the
>>>> question as to why we need to check for kernel events.
>>>> I am no expert in perf but I don't see anywhere in the
>>>> kernel checking for is_kernel_event(), so I am a bit
>>>> skeptical if exporting that is actually right or not.
>>>> 
>>> 
>>> I have stress tested with the original patch many times
>>> now, i.e., without a check for event->owner and is_kernel_event()
>>> and didn't observe any crash. Plus on ETR where this was already
>>> done, no crashes were reported till date and with ETF, the issue
>>> was quickly reproducible, so I am fairly confident that this
>>> doesn't just delay the original issue but actually fixes
>>> it. I will run an overnight test again to confirm this.
>>> 
>> 
>> I ran the overnight test, which collected around 4G of data (see below),
>> with the following small change to see whether the two cases
>> (event->owner=NULL and is_kernel_event()) are triggered
>> with the suggested changes, and neither triggered at all.
>> Do we still need those additional checks?
>> 
> 
> Yes. Please see perf_event_create_kernel_event(), which is
> an exported function allowing any kernel code (including modules)
> to use the PMU (just like the userspace perf tool would do).
> Just because your use case doesn't trigger this (because
> you don't run something that can trigger this) doesn't mean
> this can't be triggered.
> 

Thanks for that pointer, I will add them in the next version.
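
To make the three owner states from the discussion above concrete (NULL,
-1UL for a kernel event, or a valid task), here is a minimal user-space
sketch of the check. It is only an illustration, not the patch itself:
struct mock_task, struct mock_event, MOCK_KERNEL_OWNER, classify_owner()
and owner_pid() are hypothetical stand-ins, and the real driver would
operate on struct perf_event and struct task_struct instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct mock_task {
	int pid;
};

struct mock_event {
	struct mock_task *owner;
};

/* Assumed sentinel for kernel-owned events, modelled on the -1UL value
 * mentioned in the thread. */
#define MOCK_KERNEL_OWNER ((struct mock_task *)-1L)

enum owner_kind {
	OWNER_NONE,	/* owner == NULL */
	OWNER_KERNEL,	/* owner == sentinel, i.e. a kernel event */
	OWNER_TASK	/* owner points at a real task */
};

/* Read the owner once into a local and test only that copy. */
enum owner_kind classify_owner(const struct mock_event *event)
{
	struct mock_task *tsk = event->owner;

	if (!tsk)
		return OWNER_NONE;
	if (tsk == MOCK_KERNEL_OWNER)
		return OWNER_KERNEL;
	return OWNER_TASK;
}

/* Store the owner's pid in *pid and return true only when the owner is
 * a real task; otherwise leave *pid untouched and return false. */
bool owner_pid(const struct mock_event *event, int *pid)
{
	assert(pid != NULL);

	if (classify_owner(event) != OWNER_TASK)
		return false;

	*pid = event->owner->pid;
	return true;
}
```

A caller would skip the PID lookup whenever owner_pid() returns false,
which covers both the NULL owner and the kernel event case without
dereferencing either value.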

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
