linux-kernel - Re: Re: [PATCH] pid: add handling of too many zombie processes

Message-ID: <Y+o6HKUY/OzThJe/@dhcp22.suse.cz>
Date:   Mon, 13 Feb 2023 14:24:44 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     huyd12@...natelecom.cn
Cc:     liuq131@...natelecom.cn, akpm@...ux-foundation.org,
        agruenba@...hat.com, 'Christian Brauner' <christian@...uner.io>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: Re: [PATCH] pid: add handling
 of too many zombie processes

On Thu 09-02-23 15:14:57, huyd12@...natelecom.cn wrote:
> 
> Any comments will be appreciated.
> 
> 
> 
> -----Original Message-----
> From: liuq131@...natelecom.cn <liuq131@...natelecom.cn>
> Sent: February 8, 2023 17:49
> To: akpm@...ux-foundation.org
> Cc: agruenba@...hat.com; linux-mm@...ck.org; linux-kernel@...r.kernel.org;
> huyd12@...natelecom.cn; liuq <liuq131@...natelecom.cn>
> Subject: [PATCH] pid: add handling of too many zombie processes
> 
> There is a common situation that a parent process forks many child processes
> to execute tasks, but the parent process does not execute wait/waitpid when
> the child process exits, resulting in a large number of child processes
> becoming zombie processes.
> 
> At that point, if the number of processes in the system reaches
> kernel.pid_max, any new fork syscall fails, and the system cannot
> execute any command (unless an old process exits).
> 
> eg:
> [root@...workstation ~]# ls
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: Resource temporarily unavailable
> [root@...workstation ~]# reboot
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: Resource temporarily unavailable
> 
> I handled this situation in the alloc_pid function: I look for the process
> with the most zombie subprocesses and, if it has more than 10 (or another
> reasonable value?) of them, try to kill that process to release the pid
> resources.
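
For reference, the situation described above goes away when the parent
reaps its children. Below is a minimal userspace sketch in C, not part of
the patch; the helper names spawn_children() and reap_all_children() are
hypothetical and exist only for illustration. Each child stays a zombie,
and keeps its pid, until the parent collects it with waitpid().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork `n` children that exit immediately.
 * Returns the number of children actually started. */
static int spawn_children(int n)
{
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;          /* e.g. EAGAIN when no pid can be allocated */
        if (pid == 0)
            _exit(0);       /* child: finish at once */
        started++;          /* parent: child is a zombie until reaped */
    }
    return started;
}

/* Hypothetical helper: wait for every child of the caller.
 * Returns how many children were reaped. */
static int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        pid_t pid = waitpid(-1, NULL, 0);

        if (pid > 0) {
            reaped++;       /* one zombie collected, its pid is free again */
            continue;
        }
        if (pid < 0 && errno == EINTR)
            continue;       /* interrupted by a signal, retry */
        break;              /* ECHILD: no children left to wait for */
    }
    return reaped;
}
```

A parent that calls reap_all_children() after spawn_children(), or that
reaps from a SIGCHLD handler, does not accumulate zombies. The case in
the patch description is a parent that never does either.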

Abusing oom_kill_process is not the right approach. Also, any hard-coded
limit for the number of zombies can turn out to be really tricky, and it
can cause regressions.

Is there any reason you cannot contain those misbehaving workloads in a
pid controller?
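
As a rough illustration of the pid controller suggestion, the sketch
below writes a limit to a cgroup's pids.max file. It assumes cgroup v2
with the pids controller enabled for the target cgroup. The helper name
set_pids_max() and the directory in the usage note are hypothetical.
Once the limit is reached, fork() in that cgroup fails with EAGAIN, so a
workload that leaks zombies would be expected to hit its own limit
rather than exhaust kernel.pid_max for the whole system.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write `limit` to <cgroup_dir>/pids.max.
 * Returns 0 on success, -1 on any error. */
static int set_pids_max(const char *cgroup_dir, long limit)
{
    char path[4096];
    FILE *f;
    int ok;
    int n = snprintf(path, sizeof(path), "%s/pids.max", cgroup_dir);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path would be truncated */

    f = fopen(path, "w");
    if (f == NULL)
        return -1;                      /* no such cgroup, or no permission */

    ok = fprintf(f, "%ld\n", limit) > 0;
    if (fclose(f) != 0)                 /* the write may only fail at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, set_pids_max("/sys/fs/cgroup/batch", 512) would cap a
hypothetical "batch" cgroup at 512 tasks. The workload's processes still
have to be moved into that cgroup, for instance by writing their pids to
its cgroup.procs file.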
-- 
Michal Hocko
SUSE Labs