lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <747b383eb63797da52bb604ed98d2ab768eb2754.camel@mediatek.com>
Date: Wed, 1 May 2024 07:09:55 +0000
From: Jason-JH Lin (林睿祥) <Jason-JH.Lin@...iatek.com>
To: "jassisinghbrar@...il.com" <jassisinghbrar@...il.com>,
	"matthias.bgg@...il.com" <matthias.bgg@...il.com>,
	"angelogioacchino.delregno@...labora.com"
	<angelogioacchino.delregno@...labora.com>, "chunkuang.hu@...nel.org"
	<chunkuang.hu@...nel.org>
CC: "linux-mediatek@...ts.infradead.org" <linux-mediatek@...ts.infradead.org>,
	Singo Chang (張興國) <Singo.Chang@...iatek.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Jason-ch Chen (陳建豪)
	<Jason-ch.Chen@...iatek.com>, Shawn Sung (宋孝謙)
	<Shawn.Sung@...iatek.com>, Nancy Lin (林欣螢)
	<Nancy.Lin@...iatek.com>, Project_Global_Chrome_Upstream_Group
	<Project_Global_Chrome_Upstream_Group@...iatek.com>,
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH] mailbox: mtk-cmdq: Fix sleeping function called from
 invalid context

Hi Angelo,

Thanks for the reviews.

On Tue, 2024-04-30 at 11:39 +0200, AngeloGioacchino Del Regno wrote:
> Il 30/04/24 10:39, Jason-JH.Lin ha scritto:
> > When we run kernel with lockdebug option, we will get the BUG
> > below:
> > [  106.692124] BUG: sleeping function called from invalid context
> > at drivers/base/power/runtime.c:1164
> > [  106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0,
> > pid: 3616, name: kworker/u17:3
> > [  106.692226] preempt_count: 1, expected: 0
> > [  106.692254] RCU nest depth: 0, expected: 0
> > [  106.692282] INFO: lockdep is turned off.
> > [  106.692306] irq event stamp: 0
> > [  106.692331] hardirqs last  enabled at (0): [<0000000000000000>]
> > 0x0
> > [  106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>]
> > copy_process+0xc90/0x2ac0
> > [  106.692429] softirqs last  enabled at (0): [<ffffffee15d37fc4>]
> > copy_process+0xcb4/0x2ac0
> > [  106.692473] softirqs last disabled at (0): [<0000000000000000>]
> > 0x0
> > [  106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted
> > 6.1.87-lockdep-14133-g26e933aca785 #1
> > 6839942e1cf34914b0a366137843dd2366f52aa9
> > [  106.692556] Hardware name: Google Ciri sku0/unprovisioned board
> > (DT)
> > [  106.692586] Workqueue: imgsys_runner imgsys_runner_func
> > [  106.692638] Call trace:
> > [  106.692662]  dump_backtrace+0x100/0x120
> > [  106.692702]  show_stack+0x20/0x2c
> > [  106.692737]  dump_stack_lvl+0x84/0xb4
> > [  106.692775]  dump_stack+0x18/0x48
> > [  106.692809]  __might_resched+0x354/0x4c0
> > [  106.692847]  __might_sleep+0x98/0xe4
> > [  106.692883]  __pm_runtime_resume+0x70/0x124
> > [  106.692921]  cmdq_mbox_send_data+0xe4/0xb1c
> > [  106.692964]  msg_submit+0x194/0x2dc
> > [  106.693003]  mbox_send_message+0x190/0x330
> > [  106.693043]  imgsys_cmdq_sendtask+0x1618/0x2224
> > [  106.693082]  imgsys_runner_func+0xac/0x11c
> > [  106.693118]  process_one_work+0x638/0xf84
> > [  106.693158]  worker_thread+0x808/0xcd0
> > [  106.693196]  kthread+0x24c/0x324
> > [  106.693231]  ret_from_fork+0x10/0x20
> > 
> > We found that there is a spin_lock_irqsave protection in
> > msg_submit()
> > of mailbox.c.
> > So when cmdq driver calls pm_runtime_get_sync() in
> > cmdq_mbox_send_data(),
> > it will get this BUG report.
> > 
> > Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in
> > atomic
> > context with interrupts disabled.
> > 
> 
> I see. The problem with this is that pm_runtime_irq_safe() will raise
> the refcount
> of the parent device "forever", which isn't the best and partially
> defeats what we
> are trying to do here (keeping stuff off unless really needed).
> 
> At this point I wonder if it'd be just better to really fix this
> instead of working
> around the problem.
> 
> I'd say that one of the options would be to perform a change to
> msg_submit() so
> that it will take into account that a .send_data() callback might be
> using RPM
> functions.
> 
> Ideas? :-)
> 
> Cheers,
> Angelo
> 

As Jassi mentioned, .send_data() can not use sleep, so we can not use
pm_runtime_get_sync() in cmdq_mbox_send_data().

Because we have to make sure the clocks is enabled before setting GCE
registers inside the cmdq_mbox_send_data(). I think we can't use the
ASYNC pm_runtime_get() instead of pm_runtime_get_sync() either.

And you're right, I didn't notice that raising the irq_safe flag will
make the parent device can not be suspended.

How about this:
1. Use clk_bluk_enable() + pm_runtime_get() instead of 
pm_runtime_get_sync() everywhere in mtk-cmdq-mailbox.c.

2. Use refcount++ instead of clk_bluk_enable() in
cmdq_runtime_resume().

3. Call clk_bluk_disable() when refcount > 0 in cmdq_runtime_suspend().

Is this an option?

Regards,
Jason-JH.Lin

> > Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime
> > PM with autosuspend")
> > Signed-off-by: Jason-JH.Lin <jason-jh.lin@...iatek.com>
> > ---
> >   drivers/mailbox/mtk-cmdq-mailbox.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c
> > b/drivers/mailbox/mtk-cmdq-mailbox.c
> > index ead2200f39ba..3b4f19633c83 100644
> > --- a/drivers/mailbox/mtk-cmdq-mailbox.c
> > +++ b/drivers/mailbox/mtk-cmdq-mailbox.c
> > @@ -694,6 +694,7 @@ static int cmdq_probe(struct platform_device
> > *pdev)
> >   
> >   	pm_runtime_set_autosuspend_delay(dev,
> > CMDQ_MBOX_AUTOSUSPEND_DELAY_MS);
> >   	pm_runtime_use_autosuspend(dev);
> > +	pm_runtime_irq_safe(dev);
> >   
> >   	return 0;
> >   }
> 
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ