linux-kernel - Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc

Message-ID: <20250402014355.GA22575@nxa18884-linux>
Date: Wed, 2 Apr 2025 09:43:55 +0800
From: Peng Fan <peng.fan@....nxp.com>
To: Mathieu Poirier <mathieu.poirier@...aro.org>
Cc: Bjorn Andersson <andersson@...nel.org>,
	Arnaud Pouliquen <arnaud.pouliquen@...com>,
	"open list:REMOTE PROCESSOR (REMOTEPROC) SUBSYSTEM" <linux-remoteproc@...r.kernel.org>,
	open list <linux-kernel@...r.kernel.org>,
	Peng Fan <peng.fan@....com>
Subject: Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown

On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
>On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
>> On Mon, Mar 31, 2025 at 09:40:41AM -0600, Mathieu Poirier wrote:
>> >On Sat, Mar 29, 2025 at 08:56:29PM +0800, Peng Fan wrote:
>> >> On Fri, Mar 28, 2025 at 08:14:41AM -0600, Mathieu Poirier wrote:
>> >> >On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
>> >> >> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
>> >> >> >Hi,
>> >> >> >
>> >> >> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
>> >> >> >> From: Peng Fan <peng.fan@....com>
>> >> >> >> 
>> >> >> >> The following case can trigger a kernel dump:
>> >> >> >> Use U-Boot to start the remote processor (rproc) with a resource table
>> >> >> >> published to a fixed address by the rproc. After the kernel boots up,
>> >> >> >> stop the rproc, load a new firmware which doesn't have a resource table,
>> >> >> >> and start the rproc.
>> >> >> >>
>> >> >> >
>> >> >> >If a firmware image doesn't have a resource table, rproc_elf_load_rsc_table()
>> >> >> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
>> >> >> >remote processor won't be started.  What am I missing?
>> >> >> 
>> >> >> STM32 and i.MX use their own parse_fw implementation which allows no resource
>> >> >> table:
>> >> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
>> >> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598
>> >> >
>> >> >OK, that settles rproc_fw_boot(), but there is also rproc_find_loaded_rsc_table(),
>> >> >which will return NULL if a resource table is not found, preventing the
>> >> >memcpy() in rproc_start() from happening:
>> >> >
>> >> >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288
>> >> 
>> >> 
>> >> Sorry, I forgot to mention the code below:
>> >> loaded_table is a valid pointer for i.MX, see
>> >> https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/imx_rproc.c#L666
>> >
>> >(SIGH)
>> >
>> >The changelog for this patch says "... load a new firmware which doesn't have a
>> >resource table..." and now you are telling me that @loaded_table is valid.  As
>> >such I have to _guess_ that @priv->rsc_table is not null.  So which is it -
>> >valid or not valid?
>> 
>> As written in the commit log, the bootloader kicks the m7 and the m7 publishes
>> a valid resource table to a fixed address.
>> 
>> When Linux boots up, first stop the m7, then load a new firmware which does not
>> have a resource table, then start the m7.
>> 
>> Even if the new firmware does not have a resource table, the imx_rproc driver
>> still returns a valid resource table address, which is obtained from the device
>> tree (rsrc_table) in the i.MX DTS when the driver probes.
>> 
>> @priv->rsc_table is always valid even if the firmware does not have a valid
>
>And that is where the problem is - why can't that situation be fixed instead of
>pushing it to the subsystem core?  Why can't you have code in
>imx_rproc_elf_find_loaded_rsc_table() that checks if there is a valid resource
>table at the address held by @priv->rsc_table and return NULL if there isn't?

It is OK to address the issue in imx_rproc.c without touching core code.

priv->rsc_table contains a valid resource table, which the m7 publishes to
that address when it is kicked by the bootloader.

It still contains valid content when Linux stops the m7.

To make it invalid when Linux starts the m7 with a firmware whose ELF image
has no resource table, we need to clear the content of priv->rsc_table or
write some magic number when stopping the m7 that was started by the bootloader.

Then it is possible to check priv->rsc_table in
imx_rproc_elf_find_loaded_rsc_table().
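
A minimal userspace sketch of this first approach, not the actual imx_rproc
code. All mock_* names are hypothetical. It assumes the region behind
priv->rsc_table is writable from Linux, and it borrows the header checks
(ver == 1, reserved fields zero) that rproc_elf_load_rsc_table() applies to
tables found in an ELF image:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock of the resource table header (fields of struct resource_table, minus offset[]). */
struct mock_rsc_table {
	uint32_t ver;
	uint32_t num;
	uint32_t reserved[2];
};

/* Hypothetical stand-in for the driver private data. */
struct mock_priv {
	struct mock_rsc_table *rsc_table; /* fixed address taken from the device tree */
};

/* On stop, wipe the header so a stale table is no longer recognized. */
static void mock_stop(struct mock_priv *priv)
{
	if (priv->rsc_table)
		memset(priv->rsc_table, 0, sizeof(*priv->rsc_table));
}

/* Return the table only if its header looks valid, NULL otherwise. */
static struct mock_rsc_table *mock_find_loaded_rsc_table(struct mock_priv *priv)
{
	struct mock_rsc_table *table = priv->rsc_table;

	if (!table || table->ver != 1 || table->reserved[0] || table->reserved[1])
		return NULL;
	return table;
}

/* Table published by the bootloader-started core is valid until the stop. */
static void mock_demo_invalidate(void)
{
	struct mock_rsc_table tbl = { .ver = 1, .num = 0 };
	struct mock_priv priv = { .rsc_table = &tbl };

	assert(mock_find_loaded_rsc_table(&priv) == &tbl);
	mock_stop(&priv);
	assert(mock_find_loaded_rsc_table(&priv) == NULL);
}
```

With a NULL return here, the existing loaded_table check in rproc_start()
would skip the memcpy().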

The 2nd approach is to clear rproc->table_sz and rproc->table_ptr in
imx_rproc_parse_fw() before calling rproc_elf_load_rsc_table().
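
A rough userspace sketch of the second approach, again with hypothetical
mock_* names rather than the real driver code. It models only the three
fields involved and treats the stale table_sz with a NULL cached_table as an
error return instead of the NULL dereference seen in the dump:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mock of the three struct rproc fields discussed in this thread. */
struct mock_rproc {
	void *cached_table;
	void *table_ptr;
	size_t table_sz;
};

/* Stand-in for loading the table from the ELF image; fails if there is none. */
static int mock_load_rsc_table(struct mock_rproc *rproc, const void *fw_table, size_t len)
{
	if (!fw_table || !len)
		return -1;
	rproc->cached_table = malloc(len);
	if (!rproc->cached_table)
		return -1;
	memcpy(rproc->cached_table, fw_table, len);
	rproc->table_ptr = rproc->cached_table;
	rproc->table_sz = len;
	return 0;
}

/* parse_fw that tolerates a missing table but resets stale state first. */
static int mock_parse_fw(struct mock_rproc *rproc, const void *fw_table, size_t len)
{
	rproc->table_ptr = NULL;
	rproc->table_sz = 0;
	(void)mock_load_rsc_table(rproc, fw_table, len); /* no table is not an error */
	return 0;
}

/* Models the copy in rproc_start(); -1 marks the case that would crash. */
static int mock_start(struct mock_rproc *rproc, void *loaded_table)
{
	if (loaded_table) {
		if (rproc->table_sz && !rproc->cached_table)
			return -1;
		if (rproc->table_sz)
			memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
		rproc->table_ptr = loaded_table;
	}
	return 0;
}

/* Stale size after a stop, then a firmware without a resource table. */
static void mock_demo_parse_fw(void)
{
	char loaded[16] = { 0 };
	struct mock_rproc rproc = { .cached_table = NULL, .table_ptr = NULL, .table_sz = 16 };

	assert(mock_start(&rproc, loaded) == -1);
	mock_parse_fw(&rproc, NULL, 0);
	assert(rproc.table_sz == 0);
	assert(mock_start(&rproc, loaded) == 0);
}
```

The reset in mock_parse_fw() is the only change; everything else models the
existing flow.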


>
>The core is already checking if @loaded_table is valid in rproc_start(), why
>can't that be used instead of adding yet another check?

Ah. I was thinking that clearing table_sz in rproc_shutdown() is an easy approach
and could benefit others in case other platforms meet a similar issue in future.

If you think the current patch is not proper, I could do a v3 with the 2nd
approach above.

>
>> resource table. The TCM area is not writeable by Linux, so the firmware will
>> copy the resource table from TCM to DDR if the firmware has a resource table.
>> 
>> Hope this is clear.
>
>What is clear is that I spent 4 sessions on a 3-line patch, valuable time I
>could have spent reviewing other people's patches.

Sorry. I did not intend to waste your time.

Thanks,
Peng
>
>> 
>> >
>> >If my assumption above is valid then fix that instead of hacking the remoteproc
>> >core.
>> 
>> I just found V1 was picked up by Bjorn.
>
>I am currently working with Bjorn on that.
>
>> It is not hack, clearing table_sz in core code does not hurt, I think.
>
>It is a hack for as long as you haven't provided a valid explanation for the
>changes you are proposing.  
>
>> 
>> >If my assumption is not valid, the changelog and your justification for
>> >this patch are wrong.  Either way I have spent way too much time on this patch
>> >already and am dropping it.  The same goes for your other patch [1] - resend it
>> >when you have properly addressed the work herein.
>> 
>
>And yet you just sent a V2.
>
>> sure.
>> 
>> Thanks,
>> Peng
>> 
>> >
>> >[1]. [PATCH] remoteproc: imx_rproc: Add mutex protection for workqueue
>> >
>> >> 
>> >> So loaded_table is valid; it is the memcpy that triggers the kernel panic,
>> >> because table_sz is not zero while cached_table is NULL.
>> >> 	loaded_table = rproc_find_loaded_rsc_table(rproc, fw);
>> >> 	if (loaded_table) {
>> >> 		memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
>> >> 		rproc->table_ptr = loaded_table;
>> >> 	}
>> >> 
>> >> Thanks,
>> >> Peng
>> >> 
>> >> >
>> >> >> 
>> >> >> Thanks,
>> >> >> Peng
>> >> >> 
>> >> >> >
>> >> >> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
>> >> >> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
>> >> >> >
>> >> >> >> When starting rproc with a firmware that does not have a resource table,
>> >> >> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
>> >> >> >> trigger the dump, because rproc->cached_table is set to NULL during the
>> >> >> >> last stop operation, but rproc->table_sz is still valid.
>> >> >> >> 
>> >> >> >> This issue is found on i.MX8MP and i.MX9.
>> >> >> >> 
>> >> >> >> Dump as below:
>> >> >> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>> >> >> >> Mem abort info:
>> >> >> >>   ESR = 0x0000000096000004
>> >> >> >>   EC = 0x25: DABT (current EL), IL = 32 bits
>> >> >> >>   SET = 0, FnV = 0
>> >> >> >>   EA = 0, S1PTW = 0
>> >> >> >>   FSC = 0x04: level 0 translation fault
>> >> >> >> Data abort info:
>> >> >> >>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> >> >> >>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> >> >> >>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> >> >> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
>> >> >> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>> >> >> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> >> >> >> Modules linked in:
>> >> >> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
>> >> >> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
>> >> >> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> >> >> >> pc : __pi_memcpy_generic+0x110/0x22c
>> >> >> >> lr : rproc_start+0x88/0x1e0
>> >> >> >> Call trace:
>> >> >> >>  __pi_memcpy_generic+0x110/0x22c (P)
>> >> >> >>  rproc_boot+0x198/0x57c
>> >> >> >>  state_store+0x40/0x104
>> >> >> >>  dev_attr_store+0x18/0x2c
>> >> >> >>  sysfs_kf_write+0x7c/0x94
>> >> >> >>  kernfs_fop_write_iter+0x120/0x1cc
>> >> >> >>  vfs_write+0x240/0x378
>> >> >> >>  ksys_write+0x70/0x108
>> >> >> >>  __arm64_sys_write+0x1c/0x28
>> >> >> >>  invoke_syscall+0x48/0x10c
>> >> >> >>  el0_svc_common.constprop.0+0xc0/0xe0
>> >> >> >>  do_el0_svc+0x1c/0x28
>> >> >> >>  el0_svc+0x30/0xcc
>> >> >> >>  el0t_64_sync_handler+0x10c/0x138
>> >> >> >>  el0t_64_sync+0x198/0x19c
>> >> >> >> 
>> >> >> >> Clear rproc->table_sz to address the issue.
>> >> >> >> 
>> >> >> >> While at it, also clear rproc->table_sz in rproc_fw_boot and
>> >> >> >> rproc_detach.
>> >> >> >> 
>> >> >> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
>> >> >> >> Signed-off-by: Peng Fan <peng.fan@....com>
>> >> >> >> ---
>> >> >> >> 
>> >> >> >> V2:
>> >> >> >>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
>> >> >> >> 
>> >> >> >>  drivers/remoteproc/remoteproc_core.c | 3 +++
>> >> >> >>  1 file changed, 3 insertions(+)
>> >> >> >> 
>> >> >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> >> >> >> index c2cf0d277729..1efa53d4e0c3 100644
>> >> >> >> --- a/drivers/remoteproc/remoteproc_core.c
>> >> >> >> +++ b/drivers/remoteproc/remoteproc_core.c
>> >> >> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  unprepare_rproc:
>> >> >> >>  	/* release HW resources if needed */
>> >> >> >>  	rproc_unprepare_device(rproc);
>> >> >> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  out:
>> >> >> >>  	mutex_unlock(&rproc->lock);
>> >> >> >>  	return ret;
>> >> >> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  out:
>> >> >> >>  	mutex_unlock(&rproc->lock);
>> >> >> >>  	return ret;
>> >> >> >> -- 
>> >> >> >> 2.37.1
>> >> >> >> 
>> >> >> >
>> >> >
>> >
>