linux-kernel - Re: [PATCH v12 0/3] printk: Make printk() completely async

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20160713142224.GI32373@pathway.suse.cz>
Date:	Wed, 13 Jul 2016 16:22:24 +0200
From:	Petr Mladek <pmladek@...e.com>
To:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.com>, Tejun Heo <tj@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	linux-kernel@...r.kernel.org,
	Byungchul Park <byungchul.park@....com>
Subject: Re: [PATCH v12 0/3] printk: Make printk() completely async

On Wed 2016-07-13 22:24:38, Sergey Senozhatsky wrote:
> Hello,
> 
> On (07/13/16 13:14), Petr Mladek wrote:
> [..]
> > > > [ 4913.000310] item8750item8751item8752item8753item8754item8755item8756item8757item8758item8759
> > > > [ 4913.000327] item8760item8761item8762item8763item8764item8765item8766item8767item8768item8769
> > > > [ 4913.000342] item8770item8771item8772item8773item8774item8775item8776item8777item8778item8779
> > > > [ 4913.000356] item8780
> > > > [ 4913.000357] item8781
> > > > [ 4913.000358] item8782
> > > > [...]
> > > 
> > > hm.. so are there any 'concurrent' printk()-s coming from other CPUs
> > > that are not getting printed on the console (because of loglevel restrictions),
> > > but still screw up the cont buffer?.... otherwise, my expectation was that in
> > > this particular case cpu issues a new pr_cont() only after it has printed
> > > the current message via call_console_drivers()->write(). so partially flushed
> > > cont buffer was not really expected to happen. I was wrong, obviously.
> > 
> > To be honest. I am not 100% sure what happens here. One theory is
> > that the printk kthread is waken because of some previous
> > non-continuous message. It steals console_lock() and partially flushes
> > the cont buffer. In this case, the pr_cont() calls are not able to get
> > console_lock() and basically work in the async mode. The pr_cont()
> > calls have to store each piece sepatately because the partially
> > flushed cont buffer is blocked until flushed completely.
> > 
> > Hmm, the strange thing is that I see this problem even when I force
> > the global synch more by
> > 
> >     echo Y >  /sys/module/printk/parameters/synchronous
> 
> oh, even in sync mode. hm...

It seems that the rsyslogd daemon steps in. I have dumped all
processes that called console_cont_flush() and really flushed
something. It was mostly the process that triggered the test
for-cycle and called console_unlock() from every pr_cont
the syncronous way. But there was the following sinner
from time to time:

   rs:main Q:Reg-771   [001] d... 87486.753353: console_unlock:
 rs:main Q:Reg calling console_cont_flush
rs:main Q:Reg-771   [001] d... 87486.753360: <stack trace>
 => console_unlock
 => do_con_write.part.22
 => con_write
 => do_output_char
 => n_tty_write
 => tty_write
 => __vfs_write
 => vfs_write
 => SyS_write
 => entry_SYSCALL_64_fastpath

It means that some other process does console_unlock() and blocks
using the cont buffer.

I am not sure what is the exact purpose of this syscall. But
it seems reasonable that rsystemlog is active when there is
some activity in the kernel log buffer.


> > > just an idea.
> > > ... or try to make KERN_CONT SMP-safe. there are many pr_cont() call
> > > sites. ACPI is one notable example. the others include OOM, some cgroup
> > > related output (or... was it memcg), etc., etc.
> > > 
> > > so we *may be* can have a per-cpu cont buffer and add new API
> > > pr_cont_begin()/pr_cont_end(), that would disable preemption.
> > > 
> > > 
> > > +	pr_cont_begin() /* preempt_disable() */
> > > 
> > > 	for (.....)
> > > 		pr_cont("%pS ....);
> > > 
> > > +	pr_cont_end() /* preempt_enable() */
> > >
> > > pr_cont_end() also can flush this CPU's cont buffer and store the log
> > > line. this will probably break very long cont lines (not sure if we
> > > have any of those though). and may be flush_on_panic() would have to
> > > do some extra work iterating each cpu.
> > 
> > It would work but I am a bit scared by the complexity. I think
> > that we should find a compromise between complexity and
> > reliability.
> 
> I understand your concerns.
> but, realistically, KERN_CONT will still be SMP unsafe and will not do
> what people probably expect it to do: e.g. ACPI_ERROR or ACPI_anything.
> it does acpi_os_vprintf() -> printk(KERN_CONT).
> 
> the whole thing is already complicated, we already have all those len,
> cons etc. checks in various places, it also has that owner task, etc.
> 
> static struct cont {
> 	char buf[LOG_LINE_MAX];
> 	size_t len;			/* length == 0 means unused buffer */
> 	size_t cons;			/* bytes written to console */
> 	struct task_struct *owner;	/* task of first print*/
> 	u64 ts_nsec;			/* time of first print */
> 	u8 level;			/* log level of first message */
> 	u8 facility;			/* log facility of first message */
> 	enum log_flags flags;		/* prefix, newline flags */
> 	bool flushed:1;			/* buffer sealed and committed */
> } cont;
> 
> so I think, personally... it's sort of kind of not exactly what people
> need (aka 'broken'). so my idea is to forbid concurrent cont buffer usage.
> each CPU will have it's own buffer, and touch it with preemption_disabled.
> there will be no more partial flushes, cont buffer will be flushed when it's
> done -- via pr_cont_end()->log_store()->wake_up(printk_kthread). IOW it will
> just add the cont buffer content to log_buf and print it via console_unlock(),
> like the rest of the messages. no more console_cont_flush().

I think that people primary wants to have the messages stored in the
log buffer and visible on the console. Also they expect that printk
will be an easy interface.

Any more complicated code and more temporary buffers will increase
the risk that a message will get lost somewhere and it will be hard
to dig it from a crashdump, ...

I would be great to get a perfect log. But I am afraid that we would
need to redesign the printk code first to keep it maintainable.

I am still thinking if we could do better with the single cont
buffer.

BTW: I think that the LOG_CONT/LOG_PREFIX/LOG_NEWLINE informations
are stored correctly in the ring buffer. I think that dmesg
could do a better job when showing parts of the continuous messages
that are were stored separately.

Best Regards,
Petr