linux-kernel - Re: [linus:master] [sched/eevdf] 2227a957e1: BUG:kernel_NULL_pointer

Message-ID: <ae7ce03d-0938-44b9-a2b5-74842016f32b@antgroup.com>
Date: Wed, 31 Jan 2024 21:14:45 +0800
From: "Tiwei Bie" <tiwei.btw@...group.com>
To: Abel Wu <wuyun.abel@...edance.com>,
 kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
 Peter Zijlstra <peterz@...radead.org>, aubrey.li@...ux.intel.com,
 yu.c.chen@...el.com
Subject: Re: [linus:master] [sched/eevdf] 2227a957e1:
 BUG:kernel_NULL_pointer_dereference,address

On 1/31/24 8:28 PM, Abel Wu wrote:
> On 1/31/24 8:10 PM, Tiwei Bie Wrote:
>> On 1/30/24 6:13 PM, Abel Wu wrote:
>>> On 1/30/24 3:24 PM, kernel test robot Wrote:
>>>>
>>>> [  512.079810][ T8305] BUG: kernel NULL pointer dereference, address: 0000002c
>>>> [  512.080897][ T8305] #PF: supervisor read access in kernel mode
>>>> [  512.081636][ T8305] #PF: error_code(0x0000) - not-present page
>>>> [  512.082337][ T8305] *pde = 00000000
>>>> [  512.082829][ T8305] Oops: 0000 [#1] PREEMPT SMP
>>>> [  512.083407][ T8305] CPU: 1 PID: 8305 Comm: watchdog Tainted: G        W        N 6.7.0-rc1-00006-g2227a957e1d5 #1 819e6d1a8b887f5f97adb4aed77d98b15504c836
>>>> [  512.084986][ T8305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>>>> [  512.086203][ T8305] EIP: set_next_entity (fair.c:?)
>>>
>>> There was actually a NULL-test in pick_eevdf() before this commit,
>>> but I removed it by intent as I found it impossible to be NULL after
>>> examining 'all' the cases.
>>>
>>> Also cc Tiwei who once proposed to add this check back.
>>> https://lore.kernel.org/all/20231208112100.18141-1-tiwei.btw@antgroup.com/
>>
>> Thanks for cc'ing me. That's the case I worried about and why I thought
>> it might be worthwhile to add the sanity check back. I just sent out a
>> new version of the above patch with updated commit log and error message.
> 
> I assume the real problem is why it *can* be NULL in the first place.
> IMHO the NULL check with a fallback selection doesn't solve this, but
> it indeed avoids kernel panic which is absolutely important.

I totally agree. The scheduling failure is unexpected and should be
addressed. And the sanity check is just to log the failures and avoid
unnecessary crashes in such situations. 
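
To make the idea concrete, here is a minimal userspace sketch of a
"sanity check plus fallback" pick. It is not the kernel code and not the
patch itself: struct toy_entity, toy_pick_strict() and toy_pick_checked()
are hypothetical names, the eligibility rule is simplified to
"vruntime <= average vruntime", and falling back to the earliest deadline
is only an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a scheduling entity. */
struct toy_entity {
	long vruntime;	/* virtual runtime consumed so far */
	long deadline;	/* virtual deadline */
};

/*
 * Strict pick: among entities considered eligible (vruntime <= avg),
 * return the one with the earliest deadline. Returns NULL when no
 * entity is eligible, which is the "unexpected" case in this sketch.
 */
static struct toy_entity *toy_pick_strict(struct toy_entity *q, size_t n,
					  long avg)
{
	struct toy_entity *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (q[i].vruntime > avg)
			continue;	/* not eligible */
		if (!best || q[i].deadline < best->deadline)
			best = &q[i];
	}
	return best;
}

/*
 * Guarded pick: if the strict pick fails, log the failure and fall back
 * to the earliest-deadline entity regardless of eligibility (assumed
 * fallback policy), so the caller never sees NULL for a non-empty queue.
 */
static struct toy_entity *toy_pick_checked(struct toy_entity *q, size_t n,
					   long avg)
{
	struct toy_entity *se;
	size_t i;

	assert(q != NULL && n > 0);

	se = toy_pick_strict(q, n, avg);
	if (se)
		return se;

	fprintf(stderr, "toy_pick: no eligible entity, falling back\n");

	se = &q[0];
	for (i = 1; i < n; i++)
		if (q[i].deadline < se->deadline)
			se = &q[i];
	return se;
}
```

The fallback only hides the symptom: the log line is there so the
underlying failure still gets noticed and can be investigated.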

Regards,
Tiwei