lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220329150846.GA3767550@p14s>
Date:   Tue, 29 Mar 2022 09:08:46 -0600
From:   Mathieu Poirier <mathieu.poirier@...aro.org>
To:     AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>
Cc:     bjorn.andersson@...aro.org, matthias.bgg@...il.com,
        pihsun@...omium.org, linux-remoteproc@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org, linux-kernel@...r.kernel.org,
        kernel@...labora.com
Subject: Re: [PATCH] rpmsg: mtk_rpmsg: Fix circular locking dependency

On Tue, Mar 29, 2022 at 12:55:33PM +0200, AngeloGioacchino Del Regno wrote:
> Il 18/02/22 19:01, Mathieu Poirier ha scritto:
> > On Fri, Feb 18, 2022 at 10:16:51AM +0100, AngeloGioacchino Del Regno wrote:
> > > Il 17/02/22 20:03, Mathieu Poirier ha scritto:
> > > > Hi Angelo,
> > > > 
> > > > On Fri, Jan 14, 2022 at 03:47:37PM +0100, AngeloGioacchino Del Regno wrote:
> > > > > During execution of the worker that's used to register rpmsg devices
> > > > > we are safely locking the channels mutex but, when creating a new
> > > > > endpoint for such devices, we are registering a IPI on the SCP, which
> > > > > then makes the SCP to trigger an interrupt, lock its own mutex and in
> > > > > turn register more subdevices.
> > > > > This creates a circular locking dependency situation, as the mtk_rpmsg
> > > > > channels_lock will then depend on the SCP IPI lock.
> > > > > 
> > > > > [   18.014514]  Possible unsafe locking scenario:
> > > > > [   18.014515]        CPU0                    CPU1
> > > > > [   18.014517]        ----                    ----
> > > > > [   18.045467]   lock(&mtk_subdev->channels_lock);
> > > > > [   18.045474]                                lock(&scp->ipi_desc[i].lock);
> > > > 
> > > > I spent well over an hour tracing through the meanders of the code to end up in
> > > > scp_ipi_register() which, I think, leads to the above.  But from there I don't
> > > > see how an IPI can come in and that tells me my assumption is wrong.
> > > > 
> > > > Can you give more details on the events that lead to the above?  I'm not saying
> > > > there is no problem, I just need to understand it.
> > > > 
> > > 
> > > Hi Mathieu,
> > > 
> > > I understand that following this flow without the assistance of the actual
> > > hardware may be a little confusing, so, no worries.
> > > 
> > > drivers/remoteproc/mtk_scp.c - this driver manages the SCP (obviously, a
> > > remote processor)
> > > drivers/remoteproc/mtk_scp_ipi.c - public functions for kernel SCP IPC
> > > 
> > > Flow:
> > > - MediaTek SCP gets probed
> > > - RPMSG starts, we start probing "something", like google,cros-ec-rpmsg
> > > - mtk_rpmsg: creates endpoint; IPI handler is registered here.
> > > 
> > >           ( more flow )
> > > 
> > > - mtk_rpmsg: mtk_rpmsg_ns_cb() -> mtk_rpmsg_create_device(), channel is
> > >               added to the channels list, worker gets scheduled
> > 
> > To me the above is out of order.  The name space endpoint is registered as part
> > of the remote processor start sequence.  From there an IPI with ns_ipi_id comes in
> > and then cros_ec_rpmsg_probe() is called.  The above seems to imply the
> > opposite.
> > 
> > > 
> > > 
> > > Now for the part that produces the real issue:
> > > 
> > > label_a:
> > > 
> > > *** RPMSG MUTEX LOCK ***
> > 
> > By this I take you mean the subdev->channels_lock mutex.
> > 
> > > - mtk_rpmsg: ## Go through multiple channels ##, call mtk_rpmsg_register_device()
> > > 
> > > - Registered device tries to communicate through RPMSG
> > > - .send() or .trysend() (depending on the device) is called: send_ipi()
> > >      *** SCP MUTEX LOCK ***
> > 
> > And this one is either scp->send_lock or scp->ipi_desc[i].lock.
> > 
> > >     - mtk_scp_ipi: Data written, ACK? ok -> return 0
> > >      *** SCP MUTEX UNLOCK ***
> > > 
> > > - mtk_scp_ipi: **** INTERRUPT!!! **** New RPMSG NS available? -> create channel
> > >            goto label_a;
> > > 
> > > *** RPMSG MUTEX UNLOCK ***
> > > 
> > > 
> > > Pardon me for keeping some things in this flow implicit, but that was done to
> > > simplify it as much as possible as to try to make you understand the situation.
> > 
> > I certainly appreciate the effort but the above does not provide me with a clear
> > path that causes the lock to happen.  As I said in my last reply I don't doubt
> > there is a lock contention but the provided information doesn't allow to
> > understand how it happens.
> > 
> > All I am looking for is one scenario with all mutexes and functions calls
> > involved.
> > 
> 
> Hello Mathieu,
> I'm sorry for leaving this unresolved for a long time, had to work on other
> things in the meanwhile.
>

I also had to move on and as such details related to this patch have mostly
vanished from my memory.

> I'm not sure what you need, can you please help me to give you the
> issue background that you require?
> 
> In the meanwhile, here's full debugging info coming from the kmsg:
> 
> https://paste.debian.net/1235967/

I will review this pastebin along with the explanation you had provided earlier
in this thread - maybe I'll get it this time... But that will take some time as
I have 5 other patchsets to review before looking at this again.

> 
> Thanks,
> Angelo
> 
> > Thanks,
> > Mathieu
> > 
> > > 
> > > Cheers,
> > > Angelo
> > > 
> > > > Thanks,
> > > > Mathieu
> > > > 
> > > > > [   18.228399]                                lock(&mtk_subdev->channels_lock);
> > > > > [   18.228405]   lock(&scp->ipi_desc[i].lock);
> > > > > [   18.264405]
> > > > > 
> > > > > To solve this, simply unlock the channels_lock mutex before calling
> > > > > mtk_rpmsg_register_device() and relock it right after, as safety is
> > > > > still ensured by the locking mechanism that happens right after
> > > > > through SCP.
> > > > > Notably, mtk_rpmsg_register_device() does not even require locking.
> > > > > 
> > > > > Fixes: 7017996951fd ("rpmsg: add rpmsg support for mt8183 SCP.")
> > > > > Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
> > > > > ---
> > > > >    drivers/rpmsg/mtk_rpmsg.c | 2 ++
> > > > >    1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/rpmsg/mtk_rpmsg.c b/drivers/rpmsg/mtk_rpmsg.c
> > > > > index 5b4404b8be4c..d1213c33da20 100644
> > > > > --- a/drivers/rpmsg/mtk_rpmsg.c
> > > > > +++ b/drivers/rpmsg/mtk_rpmsg.c
> > > > > @@ -234,7 +234,9 @@ static void mtk_register_device_work_function(struct work_struct *register_work)
> > > > >    		if (info->registered)
> > > > >    			continue;
> > > > > +		mutex_unlock(&subdev->channels_lock);
> > > > >    		ret = mtk_rpmsg_register_device(subdev, &info->info);
> > > > > +		mutex_lock(&subdev->channels_lock);
> > > > >    		if (ret) {
> > > > >    			dev_err(&pdev->dev, "Can't create rpmsg_device\n");
> > > > >    			continue;
> > > > > -- 
> > > > > 2.33.1
> > > > > 
> > > 
> > > 
> > > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ