Message-ID: <242025e9a8c84f6b96ba3f180ea01be9@hihonor.com>
Date:   Fri, 24 Nov 2023 03:15:46 +0000
From:   gaoxu <gaoxu2@...onor.com>
To:     Michal Hocko <mhocko@...e.com>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        yipengxiang <yipengxiang@...onor.com>
Subject: Re: [PATCH] mm,oom_reaper: avoid run queue_oom_reaper if task is not oom

On Thu, 24 Nov 2023 08:51  Michal Hocko <mhocko@...e.com> wrote:
> On Wed 22-11-23 12:46:44, gaoxu wrote:
>> The function queue_oom_reaper tests and sets tsk->signal->oom_mm->flags.
>> However, it is necessary to check if 'tsk' is an OOM victim before 
>> executing 'queue_oom_reaper' because the variable may be NULL.
>> 
>> We encountered such an issue, and the log is as follows:
>> [3701:11_see]Out of memory: Killed process 3154 (system_server) 
>> total-vm:23662044kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB,
>> UID:1000 pgtables:4056kB oom_score_adj:-900
>
>> [3701:11_see][RB/E]rb_sreason_str_set: sreason_str set null_pointer
>> [3701:11_see][RB/E]rb_sreason_str_set: sreason_str set unknown_addr
>
> What are these?
These are log messages that we added ourselves.

>> [3701:11_see]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000328
>> [3701:11_see]user pgtable: 4k pages, 39-bit VAs, pgdp=00000000821de000
>> [3701:11_see][0000000000000328] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
>> [3701:11_see]tracing off
>> [3701:11_see]Internal error: Oops: 96000005 [#1] PREEMPT SMP 
>> [3701:11_see]Call trace:
>> [3701:11_see] queue_oom_reaper+0x30/0x170
>
> Could you resolve this offset into the code line please?
Because we added extra code for logging, the line numbers may not correspond to the upstream Linux source. The NULL pointer dereference occurred at the test_and_set_bit() call:

static void queue_oom_reaper(struct task_struct *tsk)
{
	/* mm is already queued? */
	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) /* NULL pointer dereference occurred here */
		return;
...
}
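
The fault address 0x328 is small but nonzero, which is the usual signature of a member access through a NULL struct pointer: the reported address is the member's offset from a NULL base. The user-space sketch below illustrates only that arithmetic. struct fake_mm, its padding, and member_addr_from_base() are hypothetical and chosen so that flags sits at 0x328; they do not describe the real mm_struct or signal_struct layout, so this alone does not tell whether tsk->signal or tsk->signal->oom_mm was the NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a kernel structure. The padding is invented
 * so that 'flags' lands at offset 0x328; the real layout depends on the
 * kernel version and configuration.
 */
struct fake_mm {
	unsigned char pad[0x328];
	unsigned long flags;
};

/* Compile-time check that the illustrative layout is what we intend. */
static_assert(offsetof(struct fake_mm, flags) == 0x328,
	      "flags must sit at the illustrative offset");

/*
 * Address that '&mm->flags' would evaluate to for a given base address.
 * Computed arithmetically, so nothing is dereferenced.
 */
uintptr_t member_addr_from_base(uintptr_t base)
{
	return base + offsetof(struct fake_mm, flags);
}
```

With a base of 0 (a NULL pointer), member_addr_from_base() yields 0x328 under this assumed layout, matching the shape of the address in the Oops.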
>> [3701:11_see] __oom_kill_process+0x590/0x860
>> [3701:11_see] oom_kill_process+0x140/0x274
>> [3701:11_see] out_of_memory+0x2f4/0x54c
>> [3701:11_see] __alloc_pages_slowpath+0x5d8/0xaac
>> [3701:11_see] __alloc_pages+0x774/0x800
>> [3701:11_see] wp_page_copy+0xc4/0x116c
>> [3701:11_see] do_wp_page+0x4bc/0x6fc
>> [3701:11_see] handle_pte_fault+0x98/0x2a8
>> [3701:11_see] __handle_mm_fault+0x368/0x700
>> [3701:11_see] do_handle_mm_fault+0x160/0x2cc
>> [3701:11_see] do_page_fault+0x3e0/0x818
>> [3701:11_see] do_mem_abort+0x68/0x17c
>> [3701:11_see] el0_da+0x3c/0xa0
>> [3701:11_see] el0t_64_sync_handler+0xc4/0xec
>> [3701:11_see] el0t_64_sync+0x1b4/0x1b8
>> [3701:11_see]tracing off
>> 
>> Signed-off-by: Gao Xu <gaoxu2@...onor.com>
>> ---
>>  mm/oom_kill.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 9e6071fde..3754ab4b6 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -984,7 +984,7 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
>>  	}
>>  	rcu_read_unlock();
>>  
>> -	if (can_oom_reap)
>> +	if (can_oom_reap && tsk_is_oom_victim(victim))
>>  		queue_oom_reaper(victim);
>
> I do not understand. We always do send SIGKILL and call mark_oom_victim(victim); on victim task when reaching out here. How can tsk_is_oom_victim can ever be false?
This is a low-probability issue: it occurred only once during monkey testing.
I have not been able to find the root cause either.
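
To make the effect of the proposed check concrete, here is a minimal user-space sketch of the guard. All mock_* types and functions, MOCK_REAP_QUEUED_BIT, and queue_oom_reaper_checked() are hypothetical stand-ins, not kernel code. mock_tsk_is_oom_victim() is modeled on the kernel helper, which tests signal->oom_mm, and mock_test_and_set() is a non-atomic substitute for test_and_set_bit(). The sketch only shows that the guard avoids the dereference; it does not explain how oom_mm could be NULL at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_REAP_QUEUED_BIT 0	/* hypothetical bit number for this mock */

/* Simplified stand-ins for mm_struct, signal_struct and task_struct. */
struct mock_mm { unsigned long flags; };
struct mock_signal { struct mock_mm *oom_mm; };
struct mock_task { struct mock_signal *signal; };

/* Models tsk_is_oom_victim(): true once oom_mm has been set. */
bool mock_tsk_is_oom_victim(const struct mock_task *tsk)
{
	return tsk->signal->oom_mm != NULL;
}

/* Non-atomic stand-in for test_and_set_bit(): sets the bit, returns old value. */
bool mock_test_and_set(unsigned long *flags, unsigned int bit)
{
	bool was_set = (*flags >> bit) & 1UL;

	*flags |= 1UL << bit;
	return was_set;
}

/* Returns true only if the task was newly queued for reaping. */
bool queue_oom_reaper_checked(struct mock_task *tsk)
{
	assert(tsk != NULL && tsk->signal != NULL);

	/* Guard from the patch: skip tasks that are not marked as OOM victims. */
	if (!mock_tsk_is_oom_victim(tsk))
		return false;

	/* mm is already queued? */
	if (mock_test_and_set(&tsk->signal->oom_mm->flags, MOCK_REAP_QUEUED_BIT))
		return false;

	return true;
}
```

With oom_mm left NULL the function returns false without touching flags; once oom_mm is set, the first call queues the task and a second call sees the bit already set.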

>>  
>>  	mmdrop(mm);
>> --
>> 2.17.1
>> 
>> 
>
>--
> Michal Hocko
> SUSE Labs
