linux-kernel - Re: [PATCH v2 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <202407130840.67879B31@keescook>
Date: Sat, 13 Jul 2024 09:31:43 -0700
From: Kees Cook <kees@...nel.org>
To: Roman Kisel <romank@...ux.microsoft.com>
Cc: akpm@...ux-foundation.org, apais@...ux.microsoft.com, ardb@...nel.org,
	bigeasy@...utronix.de, brauner@...nel.org, ebiederm@...ssion.com,
	jack@...e.cz, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	nagvijay@...rosoft.com, oleg@...hat.com, tandersen@...flix.com,
	vincent.whitchurch@...s.com, viro@...iv.linux.org.uk,
	apais@...rosoft.com, benhill@...rosoft.com, ssengar@...rosoft.com,
	sunilmut@...rosoft.com, vdso@...bites.dev
Subject: Re: [PATCH v2 1/1] binfmt_elf, coredump: Log the reason of the
 failed core dumps

On Fri, Jul 12, 2024 at 02:50:56PM -0700, Roman Kisel wrote:
> Missing, failed, or corrupted core dumps might impede crash
> investigations. To improve reliability of that process and consequently
> the programs themselves, one needs to trace the path from producing
> a core dumpfile to analyzing it. That path starts from the core dump file
> written to the disk by the kernel or to the standard input of a user
> mode helper program to which the kernel streams the coredump contents.
> There are cases where the kernel will interrupt writing the core out or
> produce a truncated/not-well-formed core dump without leaving a note.
> 
> Add logging for the core dump collection failure paths to be able to reason
> what has gone wrong when the core dump is malformed or missing.
> 
> Signed-off-by: Roman Kisel <romank@...ux.microsoft.com>
> ---
>  fs/binfmt_elf.c          |  60 ++++++++++++++++-----
>  fs/coredump.c            | 109 ++++++++++++++++++++++++++++++++-------
>  include/linux/coredump.h |   8 ++-
>  kernel/signal.c          |  22 +++++++-
>  4 files changed, 165 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index a43897b03ce9..cfe84b9436af 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1994,8 +1994,11 @@ static int elf_core_dump(struct coredump_params *cprm)
>  	 * Collect all the non-memory information about the process for the
>  	 * notes.  This also sets up the file header.
>  	 */
> -	if (!fill_note_info(&elf, e_phnum, &info, cprm))
> +	if (!fill_note_info(&elf, e_phnum, &info, cprm)) {
> +		pr_err_ratelimited("Error collecting note info, core dump of %s(PID %d) failed\n",
> +			current->comm, current->pid);

A couple things come to mind for me as I scanned through this:

- Do we want to report pid or tgid?
- Do we want to report the global value or the current pid namespace
  mapping?

Because I notice that the existing code:

>  			printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
>  			       task_tgid_vnr(current), current->comm);

Is reporting tgid for current's pid namespace. We should be consistent.

I think all of this might need cleaning up first before adding new
reports. We should consolidate the reporting into a single function so
this is easier to extend in the future. Right now the proposed patch is
hand-building the report, and putting pid/comm in different places (at
the end, at the beginning, with/without "of", etc), which is really just
boilerplate repetition.

How about creating:

static void coredump_report_failure(const char *msg)
{
	char comm[TASK_COMM_LEN];

	task_get_comm(current, comm);

	pr_warn_ratelimited("coredump: %d(%*pE): %s\n",
			    task_tgid_vnr(current), strlen(comm), comm, msg);
}

Then in a new first patch, convert all the existing stuff:

	printk(KERN_WARNING ...)
	pr_info(...)
	etc

Since even the existing warnings are inconsistent and don't escape
newlines, etc. :)

Then in patch 2 use this to add the new warnings?

-- 
Kees Cook