lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <895af04e45d286c38d01f48b29ad41598b7dadb7.camel@mediatek.com>
Date: Tue, 25 Jun 2024 03:40:59 +0000
From: Jason-JH Lin (林睿祥) <Jason-JH.Lin@...iatek.com>
To: "jassisinghbrar@...il.com" <jassisinghbrar@...il.com>,
	"angelogioacchino.delregno@...labora.com"
	<angelogioacchino.delregno@...labora.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mediatek@...ts.infradead.org" <linux-mediatek@...ts.infradead.org>,
	Singo Chang (張興國) <Singo.Chang@...iatek.com>,
	"chunkuang.hu@...nel.org" <chunkuang.hu@...nel.org>,
	Nancy Lin (林欣螢) <Nancy.Lin@...iatek.com>,
	Project_Global_Chrome_Upstream_Group
	<Project_Global_Chrome_Upstream_Group@...iatek.com>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "matthias.bgg@...il.com"
	<matthias.bgg@...il.com>
Subject: Re: [PATCH 2/2] mailbox: mtk-cmdq: Move pm_runimte_get and put to
 mbox_chan_ops API

On Mon, 2024-06-24 at 12:45 -0500, Jassi Brar wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Mon, Jun 24, 2024 at 6:29 AM AngeloGioacchino Del Regno
> <angelogioacchino.delregno@...labora.com> wrote:
> >
> > Il 20/06/24 16:39, Jassi Brar ha scritto:
> > > On Thu, Jun 20, 2024 at 1:33 AM Jason-JH Lin (林睿祥)
> > > <Jason-JH.Lin@...iatek.com> wrote:
> > >>
> > >> On Wed, 2024-06-19 at 10:38 -0500, Jassi Brar wrote:
> > >>>
> > >>> External email : Please do not click links or open attachments
> until
> > >>> you have verified the sender or the content.
> > >>>   On Wed, Jun 19, 2024 at 3:18 AM AngeloGioacchino Del Regno
> > >>> <angelogioacchino.delregno@...labora.com> wrote:
> > >>>> Il 18/06/24 17:59, Jassi Brar ha scritto:
> > >>> .....
> > >>>
> > >>>> For example, when static content is displayed on screen, the
> CMDQ
> > >>> mailbox never
> > >>>> gets shut down, but no communication happens for a relatively
> long
> > >>> time; the
> > >>>> overhead of actually shutting down the mailbox and setting it
> back
> > >>> up would be
> > >>>> increasing latency in an unacceptable manner.
> > >>>>
> > >>> Hmm...  in your driver,  startup() is _empty_   and  shutdown()
> is
> > >>> all
> > >>> under a spin-lock with irqs disabled, so that too shouldn't be
> > >>> expensive. Right?
> > >>> Then what causes unacceptable latencies?
> > >>>
> > >>
> > >> I found that the BUG report only occurred when opening the
> camera APP.
> > >> Maybe something in imgsys_cmdq_sendtask() is too expensive or
> somewhere
> > >> else in imgsys driver.
> > >>
> > > If you move anything from submit() into startup(), which is once
> per
> > > lifetime of a channel, it will only make imgsys_cmdq_sendtask()
> > > cheaper.
> > > Btw, the imgsys code is not public, I don't know how it looks.
> > >

It'll be upstreamed in the future, so I can only test it locally to
find the information we want.

> > >
> > >> I'm wondering why this BUG report is not occurred in display use
> case
> > >> which is more frequent than imgsys use case.
> > >> Does this mean sleep() is not always called in
> pm_runtime_get_sync()
> > >> and if we can guarantee this sleep() won't be called, then using
> > >> pm_runtime_get_sync() in atomic context is OK?
> > >>
> > > Instead of hacking around, maybe try using startup() and
> shutdown()
> > > which is for such uses? Maybe request/release channel as part of
> RPM
> > > in your client driver if you are worried about the noise?
> > >
> > >
> > >>>> This is why I opted for autosuspend - it's only bringing down
> > >>> certain clocks for
> > >>>> the CMDQ HW, adding up a bit of power saving to the mix which,
> for
> > >>> some use cases
> > >>>> such as mobile devices with relatively small batteries, is
> > >>> definitely important.
> > >>>>
> > >>>> I'll also briefly (and only briefly) mention that 120Hz
> displays
> > >>> are already a
> > >>>> common thing and in this case the gap between TX and ACK is
> ~8.33ms
> > >>> instead, let
> > >>>> alone that displays with a framerate of more than 120Hz also
> do
> > >>> exist even though
> > >>>> they're less common.
> > >>>>
> > >>> I don't know how even busier channels help your point.
> > >>>
> > >>>> All of the above describes a few of the reasons why
> autosuspend is
> > >>> a good choice
> > >>>> here, instead of a shutdown->startup flow.
> > >>>> And again - I can place some bets that PM would also be
> applicable
> > >>> to SoCs from
> > >>>> other vendors as well, with most probably different benefits
> (but
> > >>> still with some
> > >>>> power related benefits!) compared to MediaTek.
> > >>
> > >> I agree with Angelo's point!
> > >>
> > > Ok, but you or Angelo still don't explain "unacceptable
> latencies"
> > > when your startup() and shutdown() are empty. You just want api
> > > changed, which we can but at least do you part and tell me where
> the
> > > bottleneck (unexpected latency) comes from.
> > >
> >
> > "I want numbers" is a sensible request, honestly I would do the
> same so I totally
> > understand that.
> >
> > Jason, can you please perform latency measurements on 60Hz and
> *especially* ISP/cam
> > cases while "continuously" calling startup() and shutdown() for
> every power saving
> > operation?

You mean moving clk_builk_enable() and clk_bulk_disable() to startup()
and shutdown(), and then calling startup() in cmdq_runtime_resume() and
calling shutdown() in cmdq_runtime_suspend()?
If we do that, I think GCE clocks will never be closed because 
mbox_request_channel() and mbox_free_channel() will call startup() and
shutdown() already.

I found that display thread:0 will call mbox_send_message() every 16ms
when I moving UI continuously:
[   61.503604] thread_id:0, send_diff:16 ms
[   61.520266] thread_id:0, send_diff:16 ms
[   61.537143] thread_id:0, send_diff:16 ms
[   61.553609] thread_id:0, send_diff:16 ms
[   61.570811] thread_id:0, send_diff:17 ms
[   61.587075] thread_id:0, send_diff:16 ms
[   61.603527] thread_id:0, send_diff:16 ms
[   61.620082] thread_id:0, send_diff:16 ms
[   61.637496] thread_id:0, send_diff:17 ms
[   61.653876] thread_id:0, send_diff:16 ms
[   61.670162] thread_id:0, send_diff:16 ms
[   61.687318] thread_id:0, send_diff:17 ms
[   61.703747] thread_id:0, send_diff:16 ms
[   61.720656] thread_id:0, send_diff:16 ms
If I didn't do anything to UI, I won't see any mbox_send_message()
logs.

It means mbox_send_message() will only be called when the display is
updated:
[  782.777205] thread_id:0, send_diff:20072 ms
[  820.296738] thread_id:0, send_diff:37519 ms
[  863.034270] thread_id:0, send_diff:42737 ms
[  880.297759] thread_id:0, send_diff:17263 ms
[  917.795432] thread_id:0, send_diff:37497 ms


When opening camera APP, it seems to become 30fps:
[  120.551778] BUG: sleeping function called from invalid context at
drivers/base/power/runtime.c:1164
[  120.560844] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid:
201, name: kworker/u17:1
[  120.569363] preempt_count: 1, expected: 0
[  120.573367] RCU nest depth: 0, expected: 0
[  120.577457] INFO: lockdep is turned off.
[  120.581372] irq event stamp: 0
[  120.584420] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  120.590687] hardirqs last disabled at (0): [<ffffffd3624a86e0>]
copy_process+0x578/0xf10
[  120.598780] softirqs last  enabled at (0): [<ffffffd3624a86ec>]
copy_process+0x584/0xf10
[  120.606865] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  120.613128] CPU: 0 PID: 201 Comm: kworker/u17:1 Not tainted 6.1.92-
lockdep-14419-g8c5f20e56598-dirty #1
e971112e83e796239a41e46e8747794c737fb18c
[  120.626073] Hardware name: Google Ciri sku0/unprovisioned board (DT)
[  120.632420] Workqueue: imgsys_runner imgsys_runner_func
[  120.637647] Call trace:
[  120.640086]  dump_backtrace+0x100/0x120
[  120.643919]  show_stack+0x20/0x2c
[  120.647230]  dump_stack_lvl+0x84/0xb4
[  120.650891]  dump_stack+0x18/0x44
[  120.654203]  __might_resched+0x204/0x214
[  120.658123]  __might_sleep+0x50/0x80
[  120.661692]  __pm_runtime_resume+0x48/0xbc
[  120.665786]  cmdq_mbox_send_data+0x8c/0x434
[  120.669966]  msg_submit+0x84/0x104
[  120.673362]  mbox_send_message+0xb8/0x124
[  120.677365]  imgsys_cmdq_sendtask+0x688/0x928
[  120.681717]  imgsys_runner_func+0x44/0x78
[  120.685722]  process_one_work+0x274/0x574
[  120.689731]  worker_thread+0x240/0x3f0
[  120.693475]  kthread+0xfc/0x11c
[  120.696612]  ret_from_fork+0x10/0x20
[  120.710570] mtk-cam raw 16030000.cam-raw-a: [SOF] frame done reading
lost frames. req(7),ts(120697780083)
[  120.720352] mtk-cam raw 16030000.cam-raw-a: [SOF] SCQ_DELAY state:4
ts:120697780083
[  120.990091] thread_id:0, send_diff:1470 ms
[  121.007210] thread_id:0, send_diff:17 ms
[  121.016484] thread_id:0, send_diff:9 ms
[  121.032652] thread_id:0, send_diff:16 ms
[  121.049366] thread_id:0, send_diff:16 ms
[  121.066880] thread_id:0, send_diff:17 ms
[  121.087023] thread_id:0, send_diff:20 ms
[  121.099325] thread_id:0, send_diff:12 ms
[  121.135286] thread_id:0, send_diff:35 ms
[  121.149349] thread_id:0, send_diff:14 ms
[  121.179259] thread_id:0, send_diff:29 ms
[  121.203183] thread_id:0, send_diff:23 ms
[  121.242234] thread_id:0, send_diff:39 ms
[  121.280645] thread_id:0, send_diff:38 ms
[  121.318651] thread_id:0, send_diff:38 ms
[  121.356366] thread_id:0, send_diff:37 ms
[  121.394484] thread_id:0, send_diff:38 ms

The cmdq_runtime_suspend() will only be called when I disabled camera
APP:
[  468.137530] pm_on ts:465259, pm_off ts:4681375, diff:2877 ms
and the diff is the duration from opening the camera APP to closing the
camera APP.

From booting to seeing the UI and any operation of UI, I didn't see
any cmdq_runtime_suspend() logs from display mbox channel.
So cmdq_runtime_resume() and cmdq_runtime_suspend() are never called by
display mbox channel.
This can tell why only opening the camera APP can see this BUG report.

> >
> To be clear, do with the mailbox channel that you do with the clocks
> now, because your startup() is literally and shutdown() is
> practically
> empty.
> Call shutdown() when no request has come in for a while, so you know
> the client has quiesced likely. So try putting request/release in
> higher level RPM with autosuspend.

I'm not quit sure how to modify this from current usage.
Do you mean:
1. Move pm_runtime_get_sync() from cmdq_mbox_semd_data() to startup()
and keep pm_runtime_put_autosuspend() in cmdq_mbox_semd_data()?

2. Call shutdown() in cmdq_runtime_suspend()?

Regards,
Jason-JH.Lin

> 
> thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ