linux-kernel - Re: [PATCH V2 2/4] drivers: qcom: rpmh-rsc: avoid locking in the interrupt handler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190729190139.GH18620@codeaurora.org>
Date:   Mon, 29 Jul 2019 13:01:39 -0600
From:   Lina Iyer <ilina@...eaurora.org>
To:     Doug Anderson <dianders@...omium.org>
Cc:     Stephen Boyd <swboyd@...omium.org>, Andy Gross <agross@...nel.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        "open list:ARM/QUALCOMM SUPPORT" <linux-soc@...r.kernel.org>,
        Rajendra Nayak <rnayak@...eaurora.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>, mkshah@...eaurora.org
Subject: Re: [PATCH V2 2/4] drivers: qcom: rpmh-rsc: avoid locking in the
 interrupt handler

On Thu, Jul 25 2019 at 09:44 -0600, Doug Anderson wrote:
>Hi,
>
>On Thu, Jul 25, 2019 at 8:18 AM Lina Iyer <ilina@...eaurora.org> wrote:
>>
>> On Wed, Jul 24 2019 at 17:28 -0600, Doug Anderson wrote:
>> >Hi,
>> >
>> >On Wed, Jul 24, 2019 at 1:36 PM Lina Iyer <ilina@...eaurora.org> wrote:
>> >>
>> >> On Wed, Jul 24 2019 at 13:38 -0600, Stephen Boyd wrote:
>> >> >Quoting Lina Iyer (2019-07-24 07:52:51)
>> >> >> On Tue, Jul 23 2019 at 14:11 -0600, Stephen Boyd wrote:
>> >> >> >Quoting Lina Iyer (2019-07-22 14:53:38)
>> >> >> >> Avoid locking in the interrupt context to improve latency. Since we
>> >> >> >> don't lock in the interrupt context, it is possible that we now could
>> >> >> >> race with the DRV_CONTROL register that writes the enable register and
>> >> >> >> cleared by the interrupt handler. For fire-n-forget requests, the
>> >> >> >> interrupt may be raised as soon as the TCS is triggered and the IRQ
>> >> >> >> handler may clear the enable bit before the DRV_CONTROL is read back.
>> >> >> >>
>> >> >> >> Use the non-sync variant when enabling the TCS register to avoid reading
>> >> >> >> back a value that may been cleared because the interrupt handler ran
>> >> >> >> immediately after triggering the TCS.
>> >> >> >>
>> >> >> >> Signed-off-by: Lina Iyer <ilina@...eaurora.org>
>> >> >> >> ---
>> >> >> >
>> >> >> >I have to read this patch carefully. The commit text isn't convincing me
>> >> >> >that it is actually safe to make this change. It mostly talks about the
>> >> >> >performance improvements and how we need to fix __tcs_trigger(), which
>> >> >> >is good, but I was hoping to be convinced that not grabbing the lock
>> >> >> >here is safe.
>> >> >> >
>> >> >> >How do we ensure that drv->tcs_in_use is cleared before we call
>> >> >> >tcs_write() and try to look for a free bit? Isn't it possible that we'll
>> >> >> >get into a situation where the bitmap is all used up but the hardware
>> >> >> >has just received an interrupt and is going to clear out a bit and then
>> >> >> >an rpmh write fails with -EBUSY?
>> >> >> >
>> >> >> If we have a situation where there are no available free bits, we retry
>> >> >> and that is part of the function. Since we have only 2 TCSes avaialble
>> >> >> to write to the hardware and there could be multiple requests coming in,
>> >> >> it is a very common situation. We try and acquire the drv->lock and if
>> >> >> there are free TCS available and if available mark them busy and send
>> >> >> our requests. If there are none available, we keep retrying.
>> >> >>
>> >> >
>> >> >Ok. I wonder if we need some sort of barriers here too, like an
>> >> >smp_mb__after_atomic()? That way we can make sure that the write to
>> >> >clear the bit is seen by another CPU that could be spinning forever
>> >> >waiting for that bit to be cleared? Before this change the spinlock
>> >> >would be guaranteed to make these barriers for us, but now that doesn't
>> >> >seem to be the case. I really hope that this whole thing can be changed
>> >> >to be a mutex though, in which case we can use the bit_wait() API, etc.
>> >> >to put tasks to sleep while RPMh is processing things.
>> >> >
>> >> We have drivers that want to send requests in atomic contexts and
>> >> therefore mutex locks would not work.
>> >
>> >Jumping in without reading all the context, but I saw this fly by and
>> >it seemed odd.  If I'm way off base then please ignore...
>> >
>> >Can you give more details?  Why are these drivers in atomic contexts?
>> >If they are in atomic contexts because they are running in the context
>> >of an interrupt then your next patch in the series isn't so correct.
>> >
>> >Also: when people submit requests in atomic context are they always
>> >submitting an asynchronous request?  In that case we could
>> >(presumably) just use a spinlock to protect the queue of async
>> >requests and a mutex for everything else?
>> Yes, drivers only make async requests in interrupt contexts.
>
>So correct me if I'm off base, but you're saying that drivers make
>requests in interrupt contexts even after your whole series and that's
>why you're using spinlocks instead of mutexes.  ...but then in patch
>#3 in your series you say:
>
>> Switch over from using _irqsave/_irqrestore variants since we no longer
>> race with a lock from the interrupt handler.
>
>Those seem like contradictions.  What happens if someone is holding
>the lock, then an interrupt fires, then the interrupt routine wants to
>do an async request.  Boom, right?
>
The interrupt routine is handled by the driver and only completes the
waiting object (for sync requests). No other requests can be made from
our interrupt handler.

>> They cannot
>> use the sync variants. The async and sync variants are streamlined into
>> the same code path. Hence the use of spinlocks instead of mutexes
>> through the critical path.
>
>I will perhaps defer to Stephen who was the one thinking that a mutex
>would be a big win here.  ...but if a mutex truly is a big win then it
>doesn't seem like it'd be that hard to have a linked list (protected
>by a spinlock) and then some type of async worker that:
>
>1. Grab the spinlock, pops one element off the linked list, release the spinlock
>2. Grab the mutex, send the one element, release the mutex
This would be a problem when the request is made from an irq handler. We
want to keep things simple and quick.

>3. Go back to step #1.
>
>This will keep the spinlock held for as little time as possible.