lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 24 Jul 2019 14:36:10 -0600
From:   Lina Iyer <ilina@...eaurora.org>
To:     Stephen Boyd <swboyd@...omium.org>
Cc:     agross@...nel.org, bjorn.andersson@...aro.org,
        linux-arm-msm@...r.kernel.org, linux-soc@...r.kernel.org,
        rnayak@...eaurora.org, linux-kernel@...r.kernel.org,
        linux-pm@...r.kernel.org, dianders@...omium.org,
        mkshah@...eaurora.org
Subject: Re: [PATCH V2 2/4] drivers: qcom: rpmh-rsc: avoid locking in the
 interrupt handler

On Wed, Jul 24 2019 at 13:38 -0600, Stephen Boyd wrote:
>Quoting Lina Iyer (2019-07-24 07:52:51)
>> On Tue, Jul 23 2019 at 14:11 -0600, Stephen Boyd wrote:
>> >Quoting Lina Iyer (2019-07-22 14:53:38)
>> >> Avoid locking in the interrupt context to improve latency. Since we
>> >> don't lock in the interrupt context, it is possible that we now could
>> >> race with the DRV_CONTROL register that writes the enable register and
>> >> cleared by the interrupt handler. For fire-n-forget requests, the
>> >> interrupt may be raised as soon as the TCS is triggered and the IRQ
>> >> handler may clear the enable bit before the DRV_CONTROL is read back.
>> >>
>> >> Use the non-sync variant when enabling the TCS register to avoid reading
>> >> back a value that may been cleared because the interrupt handler ran
>> >> immediately after triggering the TCS.
>> >>
>> >> Signed-off-by: Lina Iyer <ilina@...eaurora.org>
>> >> ---
>> >
>> >I have to read this patch carefully. The commit text isn't convincing me
>> >that it is actually safe to make this change. It mostly talks about the
>> >performance improvements and how we need to fix __tcs_trigger(), which
>> >is good, but I was hoping to be convinced that not grabbing the lock
>> >here is safe.
>> >
>> >How do we ensure that drv->tcs_in_use is cleared before we call
>> >tcs_write() and try to look for a free bit? Isn't it possible that we'll
>> >get into a situation where the bitmap is all used up but the hardware
>> >has just received an interrupt and is going to clear out a bit and then
>> >an rpmh write fails with -EBUSY?
>> >
>> If we have a situation where there are no available free bits, we retry
>> and that is part of the function. Since we have only 2 TCSes avaialble
>> to write to the hardware and there could be multiple requests coming in,
>> it is a very common situation. We try and acquire the drv->lock and if
>> there are free TCS available and if available mark them busy and send
>> our requests. If there are none available, we keep retrying.
>>
>
>Ok. I wonder if we need some sort of barriers here too, like an
>smp_mb__after_atomic()? That way we can make sure that the write to
>clear the bit is seen by another CPU that could be spinning forever
>waiting for that bit to be cleared? Before this change the spinlock
>would be guaranteed to make these barriers for us, but now that doesn't
>seem to be the case. I really hope that this whole thing can be changed
>to be a mutex though, in which case we can use the bit_wait() API, etc.
>to put tasks to sleep while RPMh is processing things.
>
We have drivers that want to send requests in atomic contexts and
therefore mutex locks would not work.

--Lina

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ