linux-kernel - Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87sg8imwx5.fsf@jogness.linutronix.de>
Date:   Sun, 06 Dec 2020 22:50:54 +0106
From:   John Ogness <john.ogness@...utronix.de>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org
Subject: Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock

On 2020-12-04, Petr Mladek <pmladek@...e.com> wrote:
> On Tue 2020-12-01 21:59:41, John Ogness wrote:
>> Since the ringbuffer is lockless, there is no need for it to be
>> protected by @logbuf_lock. Remove @logbuf_lock.
>> 
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1847,6 +1811,65 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
>>  	}
>>  }
>
> The recursion-related code needs some explanation or we should do it
> another way. I spent quite some time on it and I am still not sure
> that I understand it.

Sorry. :-/

> Let me describe how I understand it.
>
>> +#ifdef CONFIG_PRINTK_NMI
>> +#define NUM_RECURSION_CTX 2
>> +#else
>> +#define NUM_RECURSION_CTX 1
>> +#endif
>
> OK, the number of context is limited because interrupts are disabled
> inside print_enter()/printk_exit(). It is basically the same reason
> why we have only two printk_safe buffers (NNI + other contexts).

Correct.

> What is the exact reason to disable interrupts around the entire
> vprintk_store(), please? It should get documented.

It simplifies the context tracking. Also, in mainline interrupts are
already disabled for all of vprintk_store(). AFAIK latencies due to
logbuf_lock contention were not an issue.

> One reason is the use of per-cpu variables. Alternative solution would
> be to store printk_context into task_struct. Well, I am not sure if
> "current" task is available during early boot. But it might solve
> problems with per-cpu variables that are not working during early
> boot.
>
> That said, I am not sure if it is worth it.

I really don't want to touch task_struct. IMHO the usefulness of that
struct is limited, considering that printk can be called from scheduling
and interrupting contexts.

>> +
>> +struct printk_recursion {
>> +	char	count[NUM_RECURSION_CTX];
>> +};
>>
>> +static DEFINE_PER_CPU(struct printk_recursion, percpu_printk_recursion);
>> +static char printk_recursion_count[NUM_RECURSION_CTX];
>
> This is pretty confusing. The array is hidden in a struct when per-cpu
> variables are used. And a naked array is used for early boot.
>
> Is the structure really needed? What about?
>
> static DEFINE_PER_CPU(char [PRINTK_CTX_NUM], printk_count);
> static char printk_count_early[NUM_RECURSION_CTX];

OK.

>> +
>> +static char *get_printk_count(void)
>> +{
>> +	struct printk_recursion *rec;
>> +	char *count;
>> +
>> +	if (!printk_percpu_data_ready()) {
>> +		count = &printk_recursion_count[0];
>
> I see why you avoided per-cpu variables for early boot. I am just
> curious how printk_context variable works these days. It is used by
> any printk(), including early code, see vprintk_func().

IMO printk_context is serving a different purpose. With the existance of
logbuf_lock, printk_context exists for the sole purpose of making sure
logbuf_lock is not taken recursively or that the CPU does not spin on it
in NMI context. printk_context is simply gating calls to the safe
buffers.

For the lockless ringbuffer, there is no issue of taking a lock
recursively or dangers from NMI. There is no need for the printk_context
"gate". However, IMHO there is a real danger if a bug in printk (or its
ringbuffer) lead to infinite recursion. This new recursion counter is
offering safety against this scenario. Until now this scenario has been
ignored. So I suppose if we are comfortable with continuing to ignore
the scenario, then we don't need to track the recursion level.

To test, I triggered artificial WARNs in vsnprintf() of printk code. I
found it nice to be able to see the stack trace going into printk and at
the same time I was relieved that such a nested warning was not blasting
the system into infinite recursion.

>> +	} else {
>> +		rec = this_cpu_ptr(&percpu_printk_recursion);
>> +
>> +		count = &rec->count[0];
>> +	}
>> +
>> +#ifdef CONFIG_PRINTK_NMI
>> +	if (in_nmi())
>> +		count++;
>> +#endif
>
> This is extremely confusing. It is far from obvious that
> the pointer and not the value is incremented.
>
> If we really need this to avoid per-cpu variables during early boot
> then a more clear implementation would be:
>
> char *get_printk_counter_by_ctx()
> {
> 	int ctx = 0;
>
> 	if (in_nmi)
> 		ctx = 1;
>
> 	if (!printk_percpu_data_ready())
> 		return &printk_count_early[ctx];
>
> 	return this_cpu_ptr(printk_count[ctx]);
> }

Yes, much cleaner. Thank you.

>> +
>> +	return count;
>> +}
>> +
>> +static bool printk_enter(unsigned long *flags)
>> +{
>> +	char *count;
>> +
>> +	local_irq_save(*flags);
>> +	count = get_printk_count();
>> +	/* Only 1 level of recursion allowed. */
>
> We should allow at least some level of recursion. Otherwise, we would
> not see warnings printed from vsprintf code.

With 1 level, you will see warnings from vsprintf code. I'm not sure it
makes sense to allow more than 1 level. It causes exponential logging.

>> +	if (*count > 1) {
>> +		local_irq_restore(*flags);
>> +		return false;
>> +	}
>> +	(*count)++;
>> +
>> +	return true;
>> +}
>
> This should be unified with printk_context, printk_nmi_enter(),
> printk_nmi_exit(). It does not make sense to have two separate
> printk context counters.
>
> Or is there any plan to remove printk_safe and printk_context?

Yes, I plan to remove the safe buffers, which also removes printk_safe.c
and the printk_context "gate".

> BTW: I prefer to use the bitmask approach. It allows to check
> the normal context by a single operation (no bit is set).
> There is no need to go through all counters.

OK.

> Note that we need at least one more context for gdb.

Ah yes, thank you.

John Ogness