Message-ID: <aQs9yFQloF9aFCbA@rli9-mobl>
Date: Wed, 5 Nov 2025 20:06:32 +0800
From: Philip Li <philip.li@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: "Chen, Yu C" <yu.c.chen@...el.com>, kernel test robot
<oliver.sang@...el.com>, Fernand Sieber <sieberf@...zon.com>,
<oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<x86@...nel.org>, <aubrey.li@...ux.intel.com>
Subject: Re: [tip:sched/core] [sched/fair] 79104becf4:
BUG:kernel_NULL_pointer_dereference,address
On Wed, Nov 05, 2025 at 12:00:26PM +0100, Peter Zijlstra wrote:
> On Tue, Oct 28, 2025 at 10:30:08AM +0800, Chen, Yu C wrote:
> > On 10/27/2025 10:09 PM, Peter Zijlstra wrote:
> > > On Mon, Oct 27, 2025 at 03:07:18PM +0100, Peter Zijlstra wrote:
> > > > On Mon, Oct 27, 2025 at 02:55:16PM +0100, Peter Zijlstra wrote:
> > > >
> > > > > > May I know if you are using the kernel config 0day attached?
> > > > > > I found that the config 0day attached
> > > > > > (https://download.01.org/0day-ci/archive/20251021/202510211205.1e0f5223-lkp@intel.com/config-6.18.0-rc1-00001-g79104becf42b)
> > > > > > has
> > > > > > CONFIG_IA32_EMULATION=y
> > > > > > CONFIG_IA32_EMULATION_DEFAULT_DISABLED=y
> > > >
> > > > Yep, deleting that entry makes it all work.
> > >
> > > 'work' might be overstating it; it boots and starts trinity, which then
> > > promptly (as in a handful of seconds) triggers OOM and dies. Not
> > > actually reproducing the NULL deref I was looking for.
> >
> > Change the following line in job-script
> > export memory='16G'
> > to
> > export memory='64G'
> > ?
>
> Yes, that seems to help.
>
> > I had a try and can reproduce the NULL except at first run:
>
> Took me two runs, but yes, I can see it now.
>
> Anyway, this is two bugs in the robot, can we please fix all this to not
> happen again?
Got it. I will dig into the details to understand the difference between
the local reproduction and the internal cluster run. The image, kconfig,
and memory are exactly the same for the actual robot run and the provided
reproduction instructions, since the attachment is generated from the job
execution. I did not find the cause on a quick look; I will get back to
this ASAP and provide an update.
>
> - .config has 32bit disabled while robot provides 32bit images. Clearly
> the actual robot runs 64bit images and the reproduction should
> provide those too.
>
> - job description is inaccurate in the amount of memory required.
>
> The reproduction steps must exactly match what the real robot runs, not
> something else.
>
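The two mismatches listed above (a .config that disables 32-bit emulation
by default while the reproducer ships a 32-bit image, and a job script that
requests less memory than the run needs) could be caught by a small
pre-flight check before a reproducer is published. The following is only a
sketch under stated assumptions, not part of the robot: the function names,
the `image_bits` argument, and the `required_gib` threshold are all
hypothetical, and the job-script line format is assumed to be the
`export memory='16G'` form quoted earlier in this thread.

```python
import re


def parse_kconfig(text):
    """Return a dict of CONFIG_* options that are set in a kernel .config."""
    opts = {}
    for line in text.splitlines():
        # Lines such as "# CONFIG_FOO is not set" do not match and are skipped.
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def parse_job_memory(text):
    """Return the GiB value from an `export memory='16G'` line, or None."""
    m = re.search(r"^\s*export\s+memory='?(\d+)G'?\s*$", text, re.MULTILINE)
    return int(m.group(1)) if m else None


def check_reproducer(config_text, job_text, image_bits, required_gib):
    """Return a list of mismatches between the config, job script and image.

    image_bits and required_gib are hypothetical inputs that the caller
    must supply; nothing here queries the real robot.
    """
    problems = []
    opts = parse_kconfig(config_text)

    # A 32-bit userspace image cannot run if 32-bit emulation is off by default.
    if image_bits == 32 and opts.get("CONFIG_IA32_EMULATION_DEFAULT_DISABLED") == "y":
        problems.append("32-bit image, but the config disables IA32 emulation by default")

    mem = parse_job_memory(job_text)
    if mem is None:
        problems.append("job script does not set memory")
    elif mem < required_gib:
        problems.append(f"job script requests {mem}G, run needs {required_gib}G")

    return problems
```

Calling `check_reproducer()` with the published config, the job script, and
the values the real run used returns an empty list when the reproducer is
consistent, and one message per mismatch otherwise.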