[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <17d4fac0db55a8f9835b53d55463ed9c4331950d.camel@mediatek.com>
Date: Fri, 6 Dec 2024 08:54:33 +0000
From: CK Hu (胡俊光) <ck.hu@...iatek.com>
To: "sashal@...nel.org" <sashal@...nel.org>, AngeloGioacchino Del Regno
	<angelogioacchino.delregno@...labora.com>, "javier.carrasco.cruz@...il.com"
	<javier.carrasco.cruz@...il.com>, "airlied@...il.com" <airlied@...il.com>,
	"wenst@...omium.org" <wenst@...omium.org>
CC: "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
	"sima@...ll.ch" <sima@...ll.ch>, "dri-devel@...ts.freedesktop.org"
	<dri-devel@...ts.freedesktop.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: Re: [git pull] drm for 6.13-rc1
Hi, Sasha:
On Mon, 2024-11-25 at 01:35 +0100, Javier Carrasco wrote:
> External email : Please do not click links or open attachments until you have verified the sender or the content.
> 
> 
> On 24/11/2024 23:58, Dave Airlie wrote:
> > On Mon, 25 Nov 2024 at 02:41, Sasha Levin <sashal@...nel.org> wrote:
> > > 
> > > On Thu, Nov 21, 2024 at 10:25:45AM +1000, Dave Airlie wrote:
> > > > Hi Linus,
> > > > 
> > > > This is the main drm pull request for 6.13.
> > > > 
> > > > I've done a test merge into your tree, there were two conflicts both
> > > > of which seem easy enough to resolve for you.
> > > > 
> > > > There's a lot of rework, the panic helper support is being added to
> > > > more drivers, v3d gets support for HW superpages, scheduler
> > > > documentation, drm client and video aperture reworks, some new
> > > > MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
> > > > has some Pantherlake enablement and xe is getting some SRIOV bits, but
> > > > just lots of stuff everywhere.
> > > > 
> > > > Let me know if there are any issues,
> > > 
> > > Hey Dave,
> > > 
> > > After the PR was merged, I've started seeing boot failures reported by
> > > KernelCI:
> > 
> > I'll add the mediatek names I see who touched anything in the area recently.
> > 
> > Dave.
> > > 
> > > [    4.395400] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
> > > [    4.396155] mediatek-drm mediatek-drm.5.auto: bound 1c000000.ovl (ops 0xffffd35fd12977b8)
> > > [    4.411951] mediatek-drm mediatek-drm.5.auto: bound 1c002000.rdma (ops 0xffffd35fd12989c0)
> > > [    4.536837] mediatek-drm mediatek-drm.5.auto: bound 1c004000.ccorr (ops 0xffffd35fd1296cf0)
> > > [    4.545181] mediatek-drm mediatek-drm.5.auto: bound 1c005000.aal (ops 0xffffd35fd1296a80)
> > > [    4.553344] mediatek-drm mediatek-drm.5.auto: bound 1c006000.gamma (ops 0xffffd35fd12972b0)
> > > [    4.561680] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
> > > [    4.570025] ------------[ cut here ]------------
> > > [    4.574630] refcount_t: underflow; use-after-free.
> > > [    4.579416] WARNING: CPU: 6 PID: 81 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
> > > [    4.587670] Modules linked in:
> > > [    4.590714] CPU: 6 UID: 0 PID: 81 Comm: kworker/u32:3 Tainted: G        W          6.12.0 #1 cab58e2e59020ebd4be8ada89a65f465a316c742
> > > [    4.602695] Tainted: [W]=WARN
> > > [    4.605649] Hardware name: Acer Tomato (rev2) board (DT)
> > > [    4.610947] Workqueue: events_unbound deferred_probe_work_func
> > > [    4.616768] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [    4.623715] pc : refcount_warn_saturate+0xf4/0x148
> > > [    4.628493] lr : refcount_warn_saturate+0xf4/0x148
> > > [    4.633270] sp : ffff8000807639c0
> > > [    4.636571] x29: ffff8000807639c0 x28: ffff34ff4116c640 x27: ffff34ff4368e080
> > > [    4.643693] x26: ffffd35fd1299ac8 x25: ffff34ff46c8c410 x24: 0000000000000000
> > > [    4.650814] x23: ffff34ff4368e080 x22: 00000000fffffdfb x21: 0000000000000002
> > > [    4.657934] x20: ffff34ff470c6000 x19: ffff34ff410c7c10 x18: 0000000000000006
> > > [    4.665055] x17: 666678302073706f x16: 2820656772656d2e x15: ffff800080763440
> > > [    4.672176] x14: 0000000000000000 x13: 2e656572662d7265 x12: ffffd35fd2ed14f0
> > > [    4.679297] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffd35fd0342150
> > > [    4.686418] x8 : c0000000ffffdfff x7 : ffffd35fd2e21450 x6 : 00000000000affa8
> > > [    4.693539] x5 : ffffd35fd2ed1498 x4 : 0000000000000000 x3 : 0000000000000000
> > > [    4.700660] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff34ff40932580
> > > [    4.707781] Call trace:
> > > [    4.710216]  refcount_warn_saturate+0xf4/0x148 (P)
> > > [    4.714993]  refcount_warn_saturate+0xf4/0x148 (L)
> > > [    4.719772]  kobject_put+0x110/0x118
> > > [    4.723335]  put_device+0x1c/0x38
> > > [    4.726638]  mtk_drm_bind+0x294/0x5c0
> > > [    4.730289]  try_to_bring_up_aggregate_device+0x16c/0x1e0
> > > [    4.735673]  __component_add+0xbc/0x1c0
> > > [    4.739495]  component_add+0x1c/0x30
> > > [    4.743058]  mtk_disp_rdma_probe+0x140/0x210
> > > [    4.747314]  platform_probe+0x70/0xd0
> > > [    4.750964]  really_probe+0xc4/0x2a8
> > > [    4.754527]  __driver_probe_device+0x80/0x140
> > > [    4.758870]  driver_probe_device+0x44/0x120
> > > [    4.763040]  __device_attach_driver+0xc0/0x108
> > > [    4.767470]  bus_for_each_drv+0x8c/0xf0
> > > [    4.771294]  __device_attach+0xa4/0x198
> > > [    4.775117]  device_initial_probe+0x1c/0x30
> > > [    4.779286]  bus_probe_device+0xb4/0xc0
> > > [    4.783109]  deferred_probe_work_func+0xb0/0x100
> > > [    4.787714]  process_one_work+0x18c/0x420
> > > [    4.791712]  worker_thread+0x30c/0x418
> > > [    4.795449]  kthread+0x128/0x138
> > > [    4.798665]  ret_from_fork+0x10/0x20
> > > [    4.802229] ---[ end trace 0000000000000000 ]---
> > > 
> > > I don't think that I'll be able to bisect further as I don't have the
> > > relevant hardware available.
> > > 
> > > --
> > > Thanks,
> > > Sasha
> 
> 
> Hello, I am one of those who touched something in the area.
> 
> To check if my changes are the cause of the boot failures, please apply
> this patch:
> 
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> index 9a8ef8558da9..85be035a209a 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> @@ -373,11 +373,12 @@ static bool mtk_drm_get_all_drm_priv(struct device
> *dev)
>         struct mtk_drm_private *temp_drm_priv;
>         struct device_node *phandle = dev->parent->of_node;
>         const struct of_device_id *of_id;
> +       struct device_node *node;
>         struct device *drm_dev;
>         unsigned int cnt = 0;
>         int i, j;
> 
> -       for_each_child_of_node_scoped(phandle->parent, node) {
> +       for_each_child_of_node(phandle->parent, node) {
>                 struct platform_device *pdev;
> 
>                 of_id = of_match_node(mtk_drm_of_ids, node);
> 
Does Javier's patch fix the problem?
Regards,
CK
> 
> ---
> 
> 
> This chunk can be found in mtk_drm_get_all_drm_priv(), which is not
> listed in the trace, but it is called from mtk_drm_bind().
> 
> The loop did not release the child_node if cnt == MAX_CRTC (by means of
> a break), which goes against how for_each_child_of_node() should be
> handled. If the child_node is indeed required afterwards (it is not
> referenced anywhere after the loop), it should be acquired via
> of_node_get() and stored somewhere to be able to put it later.
> 
> Then another issue would lie underneath as the reference to the
> child_node is not stored in any way. But if this patch fixes the issue,
> then I suppose it should be applied immediately, and the rest should be
> discussed later on.
> 
> By the way, are there any logs with debug/error messages to analyze
> further is the issue is something different?
> 
> Thanks and best regards,
> Javier Carrasco
Powered by blists - more mailing lists
 
