linux-kernel - Re: [PATCH v2 2/5] lib: add error_report_notify to collect debugging tools' reports

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YAVzNrBIlp0UJKfs@alley>
Date:   Mon, 18 Jan 2021 12:38:30 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Alexander Potapenko <glider@...gle.com>
Cc:     linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        andreyknvl@...gle.com, dvyukov@...gle.com, mingo@...hat.com,
        elver@...gle.com, rostedt@...dmis.org,
        sergey.senozhatsky@...il.com, linux-mm@...ck.org
Subject: Re: [PATCH v2 2/5] lib: add error_report_notify to collect debugging
 tools' reports

On Fri 2021-01-15 14:03:33, Alexander Potapenko wrote:
> With the introduction of various production error-detection tools, such as
> MTE-based KASAN and KFENCE, the need arises to efficiently notify the
> userspace OS components about kernel errors. Currently, no facility exists
> to notify userspace about a kernel error from such bug-detection tools.
> The problem is obviously not restricted to the above bug detection tools,
> and applies to any error reporting mechanism that does not panic the
> kernel; this series, however, will only add support for KASAN and KFENCE
> reporting.
> 
> +++ b/lib/error_report_notify.c
> +/*
> + * Trace hook for the error_report_start event. In an unlikely case of another
> + * task already printing a report bail out, otherwise save the current pid
> + * together with in_task() return value.

This is not reliable. Some events might get lost.

> + * Because reporting code can be called from low-level routines (e.g. locking
> + * primitives or allocator guts), report recording is implemented using a
> + * seqlock lock-free algorithm.
> + */
> +static void probe_report_start(void *ignore, enum error_detector detector,
> +			       unsigned long id)
> +{
> +	/*
> +	 * Acquire the writer lock. Any racing probe_report_start will not
> +	 * record anything. Pairs with the release in probe_report_end().
> +	 */
> +	if (atomic_cmpxchg_acquire(&current_pid, -1, get_encoded_pid()) != -1)
> +		return;
> +	current_pos = 0;
> +	truncated = false;
> +}
> +
> +/*
> + * Trace hook for the error_report_end event. If an event from the mismatching
> + * error_report_start is received, it is ignored. Otherwise, null-terminate the
> + * buffer, increase the report count (effectively releasing the report to
> + * last_report_show() and schedule a notification about a new report.
> + */
> +static void probe_report_end(void *ignore, enum error_detector detector,
> +			     unsigned long id)
> +{
> +	pid_t pid = atomic_read(&current_pid);
> +	int idx;
> +
> +	if (pid != get_encoded_pid())
> +		return;
> +
> +	idx = (atomic_read(&num_reports) + 1) % 2;
> +	if (current_pos == BUF_SIZE)
> +		report_buffer[idx][current_pos - 1] = 0;
> +	else
> +		report_buffer[idx][current_pos] = 0;
> +
> +	/* Pairs with acquire in last_report_show(). */
> +	atomic_inc_return_release(&num_reports);
> +	schedule_delayed_work(&reporting_done, 0);

Why delayed work when it gets queued immediately?

> +	/*
> +	 * Release the writer lock. Pairs with the acquire in
> +	 * probe_report_start().
> +	 */
> +	atomic_set_release(&current_pid, -1);
> +}
> +
> +/*
> + * Skip one or two leading pair of brackets containing the log timestamp and
> + * the task/CPU ID, plus the leading space, from the report line, e.g.:
> + *   [    0.698431][    T7] BUG: KFENCE: use-after-free ...
> + * becomes:
> + *   BUG: KFENCE: use-after-free ...
> + *
> + * Report size is only 4K, and this boilerplate can easily account for half of
> + * that amount.
> + */
> +static void skip_extra_info(const char **buf, size_t *len)
> +{
> +	int num_brackets = IS_ENABLED(CONFIG_PRINTK_TIME) +
> +			   IS_ENABLED(CONFIG_PRINTK_CALLER);

The timestamp can be disabled also at runtime by
/sys/module/printk/parameters/time

> +	const char *found;
> +
> +	if (!buf || !len)
> +		return;
> +
> +	while (num_brackets--) {
> +		if (!*len || *buf[0] != '[')
> +			return;
> +		found = strnchr(*buf, *len, ']');
> +		if (!found)
> +			return;
> +		*len -= found - *buf + 1;
> +		*buf = found + 1;
> +	}
> +	if (*len && *buf[0] == ' ') {
> +		++*buf;
> +		--*len;
> +	}
> +}
> +
> +/*
> + * Trace hook for the console event. If a line comes from a task/CPU that did
> + * not send the error_report_start event, that line is ignored. Otherwise, it
> + * is stored in the report_buffer[(num_reports + 1) % 2].
> + *
> + * To save space, the leading timestamps and (when enabled) CPU/task info is
> + * stripped away. The buffer may contain newlines, so this procedure is
> + * repeated for every line.

IMHO, removing the timestamp is a bad idea. It will complicate
matching the message with other events.

I am afraid that some reports would get shrunken anyway.
I suggest to use some lockless ringbuffer to store these
messages. Either ftrace or printk one.

But the biggest problem would be that console might be called
asynchronously from another process. The messages might get
lost in the meantime.

There are many other more reliable solutions:

Either add hook into vprintk_store() and make another copy of the
message into your buffer. But there will still be the problem
that you could not store more reports in parallel. So, it won't
be reliable anyway.

Or add a hook into /dev/kmsg interface. It allows to read any
stored message immediately and quickly. Well, you would need
to have your own reader of this interface.

Or add your own reader of the main printk log.

Or give up on having your own buffers. Instead just inform userspace
about a new report via the pooled interface. And get the messages
from the main log, e.g. via /dev/kmsg.


I agree with Andrew that this might be _very_ dangerous feature.
It is another way how to get messages from kernel. It looks like
a hack how to get certain messages from the main kernel log
into a separate log. This might end up with implementing
yet another printk() or trace_printk().

I suggest to integrate this better with printk():

   + We already have printk_context() that allows to handle
     messages a special way.

   + Just store extra flag into the main log buffer for these
     messages.

   + Allow to read the messages via some interface, either
     /dev/kmsg or a new one. We already store PID and context
     info into the main log buffer. It will allow to put
     relevant lines together.

   + Anyway, I strongly suggest to avoid your own buffer. It looks
     like an overhead. It it would get more complicated when
     there are more users, ...


> + */
> +static void probe_console(void *ignore, const char *buf, size_t len)
> +{
> +	int pid = atomic_read(&current_pid);
> +	size_t to_copy, cur_len;
> +	char *newline;
> +	int idx;
> +
> +	if (pid != get_encoded_pid() || truncated)
> +		return;
> +
> +	idx = (atomic_read(&num_reports) + 1) % 2;
> +	while (len) {
> +		newline = strnchr(buf, len, '\n');
> +		if (newline)
> +			cur_len = newline - buf + 1;
> +		else
> +			cur_len = len;
> +		/* Adjust len now, because skip_extra_info() may change cur_len. */
> +		len -= cur_len;
> +		skip_extra_info(&buf, &cur_len);
> +		to_copy = min(cur_len, BUF_SIZE - current_pos);
> +		memcpy(report_buffer[idx] + current_pos, buf, to_copy);
> +		current_pos += to_copy;
> +		if (cur_len > to_copy) {
> +			truncated = true;
> +			memcpy(report_buffer[idx] + current_pos - sizeof(TRUNC_MSG),
> +			       TRUNC_MSG, sizeof(TRUNC_MSG));
> +			break;
> +		}
> +		buf += cur_len;
> +	}
> +}

I still have to think about it. This feature heavily depends on
printk(). It makes sense to integrate it there. But we also
need to keep it simple because printk() is already pretty
complicated.

Best Regards,
Petr