linux-kernel - Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c4644f2c-fad3-4d98-8301-acdc0ff2f3a6@linux.microsoft.com>
Date: Tue, 18 Jun 2024 09:30:22 -0700
From: Roman Kisel <romank@...ux.microsoft.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: akpm@...ux-foundation.org, apais@...ux.microsoft.com, ardb@...nel.org,
 brauner@...nel.org, ebiederm@...ssion.com, jack@...e.cz,
 keescook@...omium.org, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org, nagvijay@...rosoft.com,
 oleg@...hat.com, tandersen@...flix.com, vincent.whitchurch@...s.com,
 viro@...iv.linux.org.uk, apais@...rosoft.com, ssengar@...rosoft.com,
 sunilmut@...rosoft.com, vdso@...bites.dev
Subject: Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed
 core dumps



On 6/17/2024 11:18 PM, Sebastian Andrzej Siewior wrote:
> On 2024-06-17 16:41:30 [-0700], Roman Kisel wrote:
>> Missing, failed, or corrupted core dumps might impede crash
>> investigations. To improve reliability of that process and consequently
>> the programs themselves, one needs to trace the path from producing
>> a core dumpfile to analyzing it. That path starts from the core dump file
>> written to the disk by the kernel or to the standard input of a user
>> mode helper program to which the kernel streams the coredump contents.
>> There are cases where the kernel will interrupt writing the core out or
>> produce a truncated/not-well-formed core dump.
> 
> How much of this happened and how much of this is just "let me handle
> everything that could go wrong".
Some of that must be happening as there are truncated dump files. 
Haven't run the logging code at large scale yet with the systems being 
stressed a lot by the customer workloads to hit all edge cases. Sent the 
changes to the kernel mail list out of abundance of caution first, and 
being ecstatic about that: on the other thread Kees noticed I didn't use 
the ratelimited logging. That has absolutely made me day and whole week, 
just glowing :) Might've been a close call due to something in a crash loop.

I think it'd be fair to say that I am asking to please "let me handle 
(log) everything that could go wrong", ratelimited, as these error cases 
are present in the code, and logging can give a clue why the core dump 
collection didn't succeed and what one would need to explore to increase 
reliability of the system.

> The cases where it was interrupted without a hint probably deserve a
> note rather then leaving a half of coredump back.
Wholeheartedly agree!

> 
>> Signed-off-by: Roman Kisel <romank@...ux.microsoft.com>
>> diff --git a/fs/coredump.c b/fs/coredump.c
>> index a57a06b80f57..a7200c9024c6 100644
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -777,9 +807,18 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>>   		}
>>   		file_end_write(cprm.file);
>>   		free_vma_snapshot(&cprm);
>> +	} else {
>> +		pr_err("Core dump to |%s has been interrupted\n", cn.corename);
>> +		retval = -EAGAIN;
>> +		goto fail;
>>   	}
>> +	pr_info("Core dump to |%s: vma_count %d, vma_data_size %lu, written %lld bytes, pos %lld\n",
>> +		cn.corename, cprm.vma_count, cprm.vma_data_size, cprm.written, cprm.pos);
> 
> Probably too noisy in the default case. The offsets probably don't
> matter unless you debug.
Will make less noisy, thanks!

> 
>>   	if (ispipe && core_pipe_limit)
>>   		wait_for_dump_helpers(cprm.file);
>> +
>> +	retval = 0;
>> +
>>   close_fail:
>>   	if (cprm.file)
>>   		filp_close(cprm.file, NULL);
>> diff --git a/include/linux/coredump.h b/include/linux/coredump.h
>> index 0904ba010341..8b29be758a87 100644
>> --- a/include/linux/coredump.h
>> +++ b/include/linux/coredump.h
>> @@ -42,9 +42,9 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
>>   extern int dump_align(struct coredump_params *cprm, int align);
>>   int dump_user_range(struct coredump_params *cprm, unsigned long start,
>>   		    unsigned long len);
>> -extern void do_coredump(const kernel_siginfo_t *siginfo);
>> +extern int do_coredump(const kernel_siginfo_t *siginfo);
>>   #else
>> -static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
>> +static inline int do_coredump(const kernel_siginfo_t *siginfo) {}
> 
> This probably does not compile.
D'oh! It does compile, and somehow the warning didn't show up for me. 
Fortunately, you and the kernel robot noticed that one silly piece I 
wrote here. Thank you very much!

For the inclined reader, both C99 and C11 require just these two things 
of the "return" statement (6.8.6.4 The return statement/Constraints):

"A return statement with an expression shall not appear in a function 
whose return type is void. A return statement without an expression 
shall only appear in a function whose return type is void".

One can omit the "return" statement in which case a warning is emitted 
(by gcc), and instead of "ret", gcc emits "ud2" or "brk #0x1000" or 
"trap", etc. to cause a fault.

> 
>>   #endif
>>   
>>   #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 1f9dd41c04be..f2ecf29a994d 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2675,6 +2675,7 @@ bool get_signal(struct ksignal *ksig)
>>   	struct sighand_struct *sighand = current->sighand;
>>   	struct signal_struct *signal = current->signal;
>>   	int signr;
>> +	int ret;
>>   
>>   	clear_notify_signal();
>>   	if (unlikely(task_work_pending(current)))
>> @@ -2891,7 +2892,9 @@ bool get_signal(struct ksignal *ksig)
>>   			 * first and our do_group_exit call below will use
>>   			 * that value and ignore the one we pass it.
>>   			 */
>> -			do_coredump(&ksig->info);
>> +			ret = do_coredump(&ksig->info);
>> +			if (ret)
>> +				pr_err("coredump has not been created, error %d\n", ret);
> 
> So you preserve the error code just for one additional note.
Couldn't see how not to do that and report the error code... Might move 
the declaration closer to the point of use, into the innermost 
enclosing/basic block. The C standard used by the kernel permits mixed 
declaration and code, yet not much of that seems to be actually used and 
I hesitated to do

		if (sig_kernel_coredump(signr)) {
			if (print_fatal_signals)
				print_fatal_signal(signr);
			proc_coredump_connector(current);
-			do_coredump(&ksig->info);
+			int ret = do_coredump(&ksig->info);
+			if (ret)
+				pr_err("coredump has not been created, error %d\n", ret);

Feel like moving the declaration inside that "if" statement if that 
looks better.

> 
>>   		}
>>   
>>   		/*
> 
> Sebastian

-- 
Thank you,
Roman