Message-ID: <Z1hbyNXUubokloda@linux.intel.com>
Date: Tue, 10 Dec 2024 16:18:32 +0100
From: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Jani Nikula <jani.nikula@...ux.intel.com>,
Genes Lists <lists@...ience.com>,
Sakari Ailus <sakari.ailus@...ux.intel.com>,
linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
torvalds@...ux-foundation.org, stable@...r.kernel.org,
linux-media@...r.kernel.org, bingbu.cao@...el.com,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Tvrtko Ursulin <tursulin@...ulin.net>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
intel-gfx@...ts.freedesktop.org, intel-xe@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org
Subject: Re: Linux 6.12.4 - crash dma_alloc_attrs+0x12b via ipu6
On Tue, Dec 10, 2024 at 01:37:11PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Dec 10, 2024 at 02:24:56PM +0200, Jani Nikula wrote:
> > On Tue, 10 Dec 2024, Genes Lists <lists@...ience.com> wrote:
> > > On Tue, 2024-12-10 at 10:58 +0200, Jani Nikula wrote:
> > >> On Tue, 10 Dec 2024, Sakari Ailus <sakari.ailus@...ux.intel.com>
> > >> wrote:
> > >> > Hi,
> > >> >
> > >> > > ...
> > >> > > FYI 6.12.4 got a crash shortly after booting in dma_alloc_attrs -
> > >> > > maybe
> > >> > > triggered in ipu6_probe. Crash only happened on laptop with ipu6.
> > >> > > All
> > >> > > other machines are running fine.
> > >> >
> > >> > Have you read the dmesg further than the IPU6 related warning? The
> > >> > IPU6
> > >> > driver won't work (maybe not even probe?) but if the system
> > >> > crashes, it
> > >> > appears unlikely the IPU6 drivers would have something to do with
> > >> > that.
> > >> > Look for warnings on linked list corruption later, they seem to be
> > >> > coming
> > >> > from the i915 driver.
> > >>
> > >> And the list corruption is actually happening in
> > >> cpu_latency_qos_update_request(). I don't see any i915 changes in
> > >> 6.12.4
> > >> that could cause it.
> > >>
> > >> I guess the question is, when did it work? Did 6.12.3 work?
> > >>
> > >>
> > >> BR,
> > >> Jani.
> > >
> > >
> > > - 6.12.1 worked
> > >
> > > - mainline - works (but only with i915 patch set [1] otherwise there
> > > are no graphics at all)
> > >
> > > [1] https://patchwork.freedesktop.org/series/141911/
> > >
> > > - 6.12.3 - crashed (i see i915 not ipu6) and again it has
> > > cpu_latency_qos_update_request+0x61/0xc0
> >
> > Thanks for testing.
> >
> > There are no changes to either i915 or kernel/power between 6.12.1 and
> > 6.12.4.
> >
> > There are some changes to drm core, but none that could explain this.
> >
> > Maybe try the same kernels a few more times to see if it's really
> > deterministic? Not that I have obvious ideas where to go from there, but
> > it's a clue nonetheless.
>
> 'git bisect' would be nice to run if possible...

I've reproduced the issue. It's caused by 6.12.y commit:

commit 6ac269abab9ca5ae910deb2d3ca54351c3467e99
Author: Bingbu Cao <bingbu.cao@...el.com>
Date:   Wed Oct 16 15:53:01 2024 +0800

    media: ipu6: not override the dma_ops of device in driver

    [ Upstream commit daabc5c64703432c4a8798421a3588c2c142c51b ]

It makes alloc_fw_msg_bufs() fail in isys_probe():

	cpu_latency_qos_add_request(&isys->pm_qos, PM_QOS_DEFAULT_VALUE);

	ret = alloc_fw_msg_bufs(isys, 20);
	if (ret < 0)
		goto out_remove_pkg_dir_shared_buffer;

On that error path we do not call cpu_latency_qos_remove_request(),
which causes pm_qos_request list corruption (it is a use-after-free
bug).

The problem will disappear after applying:

https://lore.kernel.org/stable/20241209175416.59433-1-stanislaw.gruszka@linux.intel.com/

since the allocation will no longer fail.

But we also need to handle the failure case correctly by adding
cpu_latency_qos_remove_request() on the error path. This requires a
mainline fix; I'll post it.

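To illustrate the error-path handling described above, here is a minimal
userspace sketch. It is not the driver code and not the mainline patch: the
list, struct isys_model, the *_model() functions and the out_remove_qos label
are all hypothetical stand-ins. It only shows the general pattern that a
request added to a shared list must be removed again before the memory
holding it is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PM QoS request linked on a global list. */
struct qos_request {
	struct qos_request *prev, *next;
};

/* Circular list head; an empty list points at itself. */
static struct qos_request qos_list = { &qos_list, &qos_list };

static void qos_add_request(struct qos_request *req)
{
	req->next = &qos_list;
	req->prev = qos_list.prev;
	qos_list.prev->next = req;
	qos_list.prev = req;
}

static void qos_remove_request(struct qos_request *req)
{
	assert(req->prev && req->next);	/* must currently be linked */
	req->prev->next = req->next;
	req->next->prev = req->prev;
	req->prev = req->next = NULL;
}

static size_t qos_list_len(void)
{
	size_t n = 0;

	for (struct qos_request *p = qos_list.next; p != &qos_list; p = p->next)
		n++;
	return n;
}

/* Hypothetical model of the device state that embeds the request. */
struct isys_model {
	struct qos_request pm_qos;
	void *msg_bufs;
};

static int alloc_msg_bufs_model(struct isys_model *isys, bool simulate_failure)
{
	if (simulate_failure)
		return -1;
	isys->msg_bufs = malloc(64);
	return isys->msg_bufs ? 0 : -1;
}

static struct isys_model *isys_probe_model(bool simulate_failure)
{
	struct isys_model *isys = calloc(1, sizeof(*isys));

	if (!isys)
		return NULL;

	qos_add_request(&isys->pm_qos);

	if (alloc_msg_bufs_model(isys, simulate_failure) < 0)
		goto out_remove_qos;

	return isys;

out_remove_qos:
	/* Unlink before freeing, otherwise the list keeps a dangling node. */
	qos_remove_request(&isys->pm_qos);
	free(isys);
	return NULL;
}

static void isys_remove_model(struct isys_model *isys)
{
	qos_remove_request(&isys->pm_qos);
	free(isys->msg_bufs);
	free(isys);
}
```

In this model, calling isys_probe_model(true) returns NULL and leaves the
list empty. Without the qos_remove_request() call under out_remove_qos, the
freed object would stay linked, and the next list update would touch freed
memory.
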
Regards
Stanislaw