Message-ID: <d85b210a-6388-41a3-9c97-35eee0603c99@linux.microsoft.com>
Date: Sun, 14 Jul 2024 07:33:53 -0700
From: Roman Kisel <romank@...ux.microsoft.com>
To: Kees Cook <kees@...nel.org>
Cc: akpm@...ux-foundation.org, apais@...ux.microsoft.com, ardb@...nel.org,
bigeasy@...utronix.de, brauner@...nel.org, ebiederm@...ssion.com,
jack@...e.cz, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, nagvijay@...rosoft.com, oleg@...hat.com,
tandersen@...flix.com, vincent.whitchurch@...s.com, viro@...iv.linux.org.uk,
apais@...rosoft.com, benhill@...rosoft.com, ssengar@...rosoft.com,
sunilmut@...rosoft.com, vdso@...bites.dev
Subject: Re: [PATCH v2 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps

On 7/13/2024 9:31 AM, Kees Cook wrote:
> On Fri, Jul 12, 2024 at 02:50:56PM -0700, Roman Kisel wrote:
>> Missing, failed, or corrupted core dumps might impede crash
>> investigations. To improve reliability of that process and consequently
>> the programs themselves, one needs to trace the path from producing
>> a core dumpfile to analyzing it. That path starts from the core dump file
>> written to the disk by the kernel or to the standard input of a user
>> mode helper program to which the kernel streams the coredump contents.
>> There are cases where the kernel will interrupt writing the core out or
>> produce a truncated/not-well-formed core dump without leaving a note.
>>
>> Add logging for the core dump collection failure paths to be able to reason
>> what has gone wrong when the core dump is malformed or missing.
>>
>> Signed-off-by: Roman Kisel <romank@...ux.microsoft.com>
>> ---
>>  fs/binfmt_elf.c          |  60 ++++++++++++++++-----
>>  fs/coredump.c            | 109 ++++++++++++++++++++++++++++++++-------
>>  include/linux/coredump.h |   8 ++-
>>  kernel/signal.c          |  22 +++++++-
>>  4 files changed, 165 insertions(+), 34 deletions(-)
>>
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index a43897b03ce9..cfe84b9436af 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -1994,8 +1994,11 @@ static int elf_core_dump(struct coredump_params *cprm)
>>  	 * Collect all the non-memory information about the process for the
>>  	 * notes. This also sets up the file header.
>>  	 */
>> -	if (!fill_note_info(&elf, e_phnum, &info, cprm))
>> +	if (!fill_note_info(&elf, e_phnum, &info, cprm)) {
>> +		pr_err_ratelimited("Error collecting note info, core dump of %s(PID %d) failed\n",
>> +				   current->comm, current->pid);
>
> A couple things come to mind for me as I scanned through this:
>
> - Do we want to report pid or tgid?
> - Do we want to report the global value or the current pid namespace
> mapping?
>
> Because I notice that the existing code:
>
>> 	printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
>> 	       task_tgid_vnr(current), current->comm);
>
> Is reporting tgid for current's pid namespace. We should be consistent.
>
Thanks, I will update the code to report the tgid in the current pid
namespace, consistent with the existing logging.

> I think all of this might need cleaning up first before adding new
> reports. We should consolidate the reporting into a single function so
> this is easier to extend in the future. Right now the proposed patch is
> hand-building the report, and putting pid/comm in different places (at
> the end, at the beginning, with/without "of", etc), which is really just
> boilerplate repetition.
100% agreed.
>
> How about creating:
>
> static void coredump_report_failure(const char *msg)
> {
> 	char comm[TASK_COMM_LEN];
>
> 	task_get_comm(current, comm);
>
> 	pr_warn_ratelimited("coredump: %d(%*pE): %s\n",
> 		task_tgid_vnr(current), strlen(comm), comm, msg);
> }
>
> Then in a new first patch, convert all the existing stuff:
>
> printk(KERN_WARNING ...)
> pr_info(...)
> etc
>
> Since even the existing warnings are inconsistent and don't escape
> newlines, etc. :)
>
> Then in patch 2 use this to add the new warnings?
Absolutely love that! Couldn't possibly appreciate your help more :)
>
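To make the proposed single-function format concrete, here is a rough
userspace sketch of it. This is not kernel code and not part of the patch.
The names escape_comm() and format_coredump_failure() are hypothetical, the
tgid is passed in instead of coming from task_tgid_vnr(current), and the
\xNN escaping is only a simplified stand-in for what %pE does in the kernel.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* same value the kernel uses for task names */

/*
 * Hypothetical userspace stand-in for the kernel's %pE escaping:
 * printable ASCII (except the backslash) is copied through, every
 * other byte becomes \xNN. Returns the number of bytes written, not
 * counting the terminating NUL.
 */
static size_t escape_comm(const char *comm, char *out, size_t outsz)
{
	size_t n = 0;

	if (outsz == 0)
		return 0;

	for (; *comm; comm++) {
		unsigned char c = (unsigned char)*comm;

		if (c >= 0x20 && c < 0x7f && c != '\\') {
			if (n + 1 >= outsz)
				break;	/* no room for the byte plus NUL */
			out[n++] = (char)c;
		} else {
			if (n + 4 >= outsz)
				break;	/* no room for "\xNN" plus NUL */
			n += (size_t)snprintf(out + n, outsz - n, "\\x%02x", c);
		}
	}
	out[n] = '\0';
	return n;
}

/*
 * Hypothetical counterpart of coredump_report_failure(): builds the
 * "coredump: <tgid>(<comm>): <msg>" line into buf instead of logging it,
 * so every call site gets the same layout. Returns what snprintf() returns.
 */
static int format_coredump_failure(char *buf, size_t size, int tgid,
				   const char *comm, const char *msg)
{
	char escaped[4 * TASK_COMM_LEN + 1];

	escape_comm(comm, escaped, sizeof(escaped));
	return snprintf(buf, size, "coredump: %d(%s): %s\n", tgid, escaped, msg);
}
```

With this layout a task name that contains a newline cannot split the
report across two log lines, and each call site only supplies the message
text rather than hand-building the pid/comm part.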
--
Thank you,
Roman