linux-kernel - Re: [RFC PATCH] piped/ptraced coredump (was: Dump smaller VMAs first in ELF cores)

Message-ID: <E3873B59-D80F-42E7-B571-DBE3A63A0C77@juniper.net>
Date: Mon, 5 Aug 2024 17:56:11 +0000
From: Brian Mak <makb@...iper.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Oleg Nesterov <oleg@...hat.com>,
        "Eric W. Biederman"
	<ebiederm@...ssion.com>,
        Kees Cook <kees@...nel.org>, Alexander Viro
	<viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>, Jan Kara
	<jack@...e.cz>,
        "linux-fsdevel@...r.kernel.org"
	<linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] piped/ptraced coredump (was: Dump smaller VMAs first
 in ELF cores)

On Aug 4, 2024, at 10:47 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Sun, 4 Aug 2024 at 08:23, Oleg Nesterov <oleg@...hat.com> wrote:
>> 
>> What do you think?
> 
> Eww. I really don't like giving the dumper ptrace rights.
> 
> I think the real limitations of the "dump to pipe" is that it's just
> being very stupid. Which is fine in the sense that core dumps aren't
> likely to be a huge priority. But if or when they _are_ a priority,
> it's not a great model.
> 
> So I prefer the original patch because it's also small, but it's
> conceptually much smaller.
> 
> That said, even that simplified v2 looks a bit excessive to me.
> 
> Does it really help so much to create a new array of core_vma_metadata
> pointers - could we not just sort those things in place?

Hi Linus,

Thanks for taking the time to reply.

Yep, I don't see any immediate reason why we can't sort this in place
to begin with.

Thanks, Eric, for originally bringing this up. Will send out a v3 with
these edits.
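
To illustrate what sorting in place means here, below is a minimal
userspace sketch, not the patch itself. The struct vma_meta type and the
function names are hypothetical stand-ins for the kernel's
core_vma_metadata array, and qsort() is used only because the sketch
runs in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the kernel's per-VMA dump metadata. */
struct vma_meta {
	unsigned long start;      /* start address of the mapping */
	unsigned long dump_size;  /* bytes of this mapping that will be dumped */
};

/* Order two entries by ascending dump size. */
static int cmp_vma_dump_size(const void *a, const void *b)
{
	const struct vma_meta *va = a;
	const struct vma_meta *vb = b;

	if (va->dump_size < vb->dump_size)
		return -1;
	if (va->dump_size > vb->dump_size)
		return 1;
	return 0;
}

/* Sort the array itself; no second array of pointers is allocated. */
static void sort_vmas_by_dump_size(struct vma_meta *vmas, size_t count)
{
	assert(vmas != NULL || count == 0);
	if (count > 1)
		qsort(vmas, count, sizeof(*vmas), cmp_vma_dump_size);
}
```

Because the entries themselves are reordered, no extra allocation is
needed and every later pass over the array sees the smallest mappings
first. In-kernel code would presumably call the kernel's own sort()
helper instead of qsort(); that substitution is an assumption of this
sketch.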

> Also, honestly, if the issue is core dump truncation, at some point we
> should just support truncating individual mappings rather than the
> whole core file anyway. No?

Do you mean support truncating VMAs in addition to sorting or as a
replacement for sorting? If you mean in addition, then I agree: there
may be some VMAs that are known not to contain information critical to
debugging, but that may still aid it, and therefore have lower priority.

If you mean as a replacement for sorting, then we'd need to know exactly
which VMAs to keep/discard, which is a non-trivial task, as discussed in
v1 of my patch, and so it doesn't seem like a viable alternative.

> Depending on what the major issue is, we might also tweak the
> heuristics for which vmas get written out.
> 
> For example, I wouldn't be surprised if there's a fair number of "this
> read-only private file mapping gets written out because it has been
> written to" due to runtime linking. And I kind of suspect that in many
> cases that's not all that interesting.
> 
> Anyway, I assume that Brian had some specific problem case that
> triggered this all, and I'd like to know a bit more.

Yes, there were a couple of problem cases that triggered the need for
this patch. I'll repeat what I said in v1 about this:

At Juniper, we have some daemons that can consume a lot of memory and
that, upon crashing, can produce core dumps of several GB. While dumping,
we've encountered these two scenarios resulting in an unusable core:

1. Disk space is low at the time of core dump, resulting in a truncated
core once the disk is full.

2. A daemon has a TimeoutStopSec option configured in its systemd unit
file, so the core dump can time out (triggering a SIGKILL) if the core
is too large and takes too long to write.

In both scenarios, we see that the core file is already several GB, and
still does not contain the information necessary to form a backtrace,
thus creating the need for this change. In the second scenario, we are
unable to increase the timeout option due to our recovery time objective
requirements.
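
The effect of dump order under a fixed budget (free disk space in the
first scenario, bytes written before the timeout in the second) can be
sketched as follows. This is an illustrative model only: the helper name
count_fully_dumped and the idea of a single byte budget are assumptions
made for the example, not something taken from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: walk the mappings in dump order and count how many
 * are written completely before `budget` bytes run out.
 */
static size_t count_fully_dumped(const unsigned long *sizes, size_t count,
				 unsigned long budget)
{
	unsigned long written = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (sizes[i] > budget - written)
			break;	/* this mapping would be truncated */
		written += sizes[i];
	}
	return i;
}
```

With sizes {1000, 10, 20} and a budget of 100, dumping in the given order
completes no mapping at all, while dumping in ascending order completes
the two small ones. The premise, per the scenarios above, is that the
small mappings are the ones more likely to hold what a backtrace needs.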

Best,
Brian Mak

>           Linus