[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171218103452.GB31274@jagdpanzerIV>
Date: Mon, 18 Dec 2017 19:36:24 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Petr Mladek <pmladek@...e.com>,
Steven Rostedt <rostedt@...dmis.org>
Cc: Tejun Heo <tj@...nel.org>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Jan Kara <jack@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Rafael Wysocki <rjw@...ysocki.net>,
Pavel Machek <pavel@....cz>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
On (12/18/17 18:36), Sergey Senozhatsky wrote:
[..]
> I see *a lot* of cases when CPU that call printk in a loop does not
> end up flushing its messages. And the problem seems to be - preemption.
>
>
> CPU0 CPU1
>
> for_each_process_thread(g, p)
> printk()
> console_unlock() printk
> console_trylock() fails
> sets console_owner
> sees console_owner
> sets console_waiter
> call_console_drivers
> clears console_owner
> sees console_waiter
> hand off spins with local_irq disabled
> sees that it has acquired console_sem ownership
>
> enables local_irq
> printk
> .. << preemption >>
> printk
> ... unbound number of printk-s
> printk
> ..
> printk
> back to TASK_RUNNING
> goes to console_unlock()
> printk
> local_irq_save
>
> ???
> *** if it will see console_waiter [being in any
> context] it will hand off. otherwise, preemption
> again and CPU0 can add even more messages to logbuf
>
> local_irq_restore
>
> << preemption >>
hm... adding preemption_disable()/preemption_enable() to vprintk_emit()
does not fix the issues on my side.
preemption_disable();
...
if (console_trylock()) {
console_unlock();
} else {
// console_owner check and loop if needed
// console_unlock();
}
preemption_enable();
the root cause seems to be the fact that log_store() is significantly
faster than msg_print_text() + call_console_drivers().
looking at this
systemd-udevd-671 [002] dn.3 66.736432: offloading: set console_owner :0
systemd-udevd-671 [002] dn.3 66.749927: offloading: clear console_owner waiter != NULL :0
systemd-udevd-671 [002] dn.3 66.749931: offloading: set console_owner :0
systemd-udevd-671 [002] dn.3 66.763426: offloading: clear console_owner waiter != NULL :0
systemd-udevd-671 [002] dn.3 66.763430: offloading: set console_owner :0
systemd-udevd-671 [002] dn.3 66.776925: offloading: clear console_owner waiter != NULL :0
which is this thing
len += msg_print_text(msg, console_prev, false,
text + len, sizeof(text) - len);
console_idx = log_next(console_idx);
console_seq++;
console_prev = msg->flags;
raw_spin_unlock(&logbuf_lock);
/*
* While actively printing out messages, if another printk()
* were to occur on another CPU, it may wait for this one to
* finish. This task can not be preempted if there is a
* waiter waiting to take over.
*/
raw_spin_lock(&console_owner_lock);
console_owner = current;
raw_spin_unlock(&console_owner_lock);
+ trace_offloading("set console_owner", " ", 0);
/* The waiter may spin on us after setting console_owner */
spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
stop_critical_timings(); /* don't trace print latency */
call_console_drivers(level, text, len);
start_critical_timings();
raw_spin_lock(&console_owner_lock);
waiter = READ_ONCE(console_waiter);
console_owner = NULL;
raw_spin_unlock(&console_owner_lock);
+ trace_offloading("clear console_owner",
+ " waiter != NULL ", !!waiter);
it takes call_console_drivers() 0.01+ of a second to print some of
the messages [I think we can ignore raw_spin_lock(&console_owner_lock)
and fully blame call_console_drivers()]. so vprintk_emit() seems to be
gazillion times faster and i_do_printks can add tons of messages while
some other process prints just one.
to be more precise, I see from the traces that i_do_printks can add 1100
messages to the logbuf while call_console_drivers() prints just one.
systemd-udevd-671 owns the lock. sets the console_owner. i_do_printks
keeps adding printks. there kworker/0:1-135 that just ahead of
i_do_printks-1992 and registers itself as the console_sem waiter.
systemd-udevd-671 [003] d..3 66.334866: offloading: set console_owner :0
kworker/0:1-135 [000] d..2 66.335999: offloading: vprintk_emit()->trylock FAIL will spin? :1
i_do_printks-1992 [002] d..2 66.345474: offloading: vprintk_emit()->trylock FAIL will spin? :0 x 1100
...
systemd-udevd-671 [003] d..3 66.345917: offloading: cleaar console_owner waiter != NULL :1
i_do_printks-1992 finishes printing [it does limited number of printks],
it does not compete for console_sem anymore, so those are other tasks
that have to flush pending messages stored by i_do_printks-1992 :(
-ss
Powered by blists - more mailing lists