linux-kernel - Re: [LSF/MM/BPF TOPIC] tracing the source of errors

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240207171600.GC6226@frogsfrogsfrogs>
Date: Wed, 7 Feb 2024 09:16:00 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: lsf-pc <lsf-pc@...ts.linux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] tracing the source of errors

On Wed, Feb 07, 2024 at 10:54:34AM +0100, Miklos Szeredi wrote:
> [I'm not planning to attend LSF this year, but I thought this topic
> might be of interest to those who will.]
> 
> The errno thing is really ancient and yet quite usable.  But when
> trying to find out where a particular EINVAL is coming from, that's
> often mission impossible.
> 
> Would it make sense to add infrastructure to allow tracing the source
> of errors?  E.g.
> 
> strace --errno-trace ls -l foo
> ...
> statx(AT_FDCWD, "foo", ...) = -1 ENOENT [fs/namei.c:1852]
> ...
> 
> Don't know about others, but this issue comes up quite often for me.
> 
> I would implement this with macros that record the place where a
> particular error has originated, and some way to query the last one
> (which wouldn't be 100% accurate, but good enough I guess).

Hmmm, weren't Kent and Suren working on code tagging for memory
allocation profiling?  It would be kinda nice to wrap that up in the
error return paths as well.

Granted then we end up with some nasty macro mess like:

[Pretend that there's a struct errno_tag, DEFINE_ALLOC_TAG, and
__alloc_tag_add symbols that looks mostly like struct alloc_tag from [1]
and then (backslashes elided)]

#define Err(x)
({
	int __errno = (x);
	DEFINE_ERRNO_TAG(_errno_tag);

	trace_return_errno(__this_address, __errno)
	__errno_tag_add(&_errno_tag, __errno);
	__errno;
})

	foo = kmalloc(...);
	if (!foo)
		return Err(-ENOMEM);

or

	if (fs_is_messed_up())
		return Err(-EINVAL);

This would get us the ability to ftrace for where errno returns
initiate, as well as collect counters for how often we're actually
doing that in production.  You could even add time_stats too, but
annotating the entire kernel might be a stretch.

--D

[1] https://lwn.net/Articles/906660/

> Thanks,
> Miklos
>