Date: Thu, 11 Jul 2019 17:33:03 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: luca abeni <luca.abeni@...tannapisa.it>, linux-kernel@...r.kernel.org,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"Rafael J . Wysocki" <rafael@...nel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	"Paul E . McKenney" <paulmck@...ux.ibm.com>,
	Joel Fernandes <joel@...lfernandes.org>,
	Quentin Perret <quentin.perret@....com>,
	Luc Van Oostenryck <luc.vanoostenryck@...il.com>,
	Morten Rasmussen <morten.rasmussen@....com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Daniel Bristot de Oliveira <bristot@...hat.com>,
	Patrick Bellasi <patrick.bellasi@....com>,
	Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>
Subject: Re: [RFC PATCH 3/6] sched/dl: Try better placement even for deadline
 tasks that do not block

On 7/11/19 2:00 PM, Peter Zijlstra wrote:
> On Thu, Jul 11, 2019 at 01:17:17PM +0200, Dietmar Eggemann wrote:
>> On 7/9/19 3:42 PM, Peter Zijlstra wrote:
>
>>>>> That is, we only do those callbacks from:
>>>>>
>>>>>   schedule_tail()
>>>>>   __schedule()
>>>>>   rt_mutex_setprio()
>>>>>   __sched_setscheduler()
>>>>>
>>>>> and the above looks like it can happen outside of those.
>
>> Is this what you are concerned about?
>>
>> (2 CPUs (CPU1, CPU2), 4 deadline tasks (thread0-X)) with
>>
>> @@ -1137,6 +1137,13 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
>>  	rf->cookie = lockdep_pin_lock(&rq->lock);
>>
>>  #ifdef CONFIG_SCHED_DEBUG
>> +#ifdef CONFIG_SMP
>> +	/*
>> +	 * There should not be pending callbacks at the start of rq_lock();
>> +	 * all sites that handle them flush them at the end.
>> +	 */
>> +	WARN_ON_ONCE(rq->balance_callback);
>> +#endif
>>
>>
>> [ 87.251237] *** <--- queue_balance_callback(migrate_dl_task) p=[thread0-3 3627] on CPU2
>> [ 87.251261] WARNING: CPU: 2 PID: 3627 at kernel/sched/sched.h:1145 __schedule+0x56c/0x690
>> [ 87.615882] WARNING: CPU: 2 PID: 3616 at kernel/sched/sched.h:1145 task_rq_lock+0xe8/0xf0
>> [ 88.176844] WARNING: CPU: 2 PID: 3616 at kernel/sched/sched.h:1145 load_balance+0x4d0/0xbc0
>> [ 88.381905] WARNING: CPU: 2 PID: 3616 at kernel/sched/sched.h:1145 load_balance+0x7d8/0xbc0
>
> I'm not sure how we get 4 warns, I was thinking that as soon as we exit
> __schedule() we'd process the callback so further warns would be
> avoided.

Reducing the warning to only fire on CPU1, I got another test-run:

[ 6688.373607] *** <--- queue_balance_callback(migrate_dl_task) p=[thread0-3 4343] on CPU1
[ 6688.381557] WARNING: CPU: 1 PID: 4343 at kernel/sched/sched.h:1146 try_to_wake_up+0x614/0x788
...
[ 6688.505000] try_to_wake_up+0x614/0x788
[ 6688.508794] default_wake_function+0x34/0x48
[ 6688.513017] autoremove_wake_function+0x3c/0x68
[ 6688.517497] __wake_up_common+0x90/0x158
[ 6688.521374] __wake_up_common_lock+0x88/0xd0
[ 6688.525595] __wake_up+0x40/0x50
[ 6688.528787] wake_up_klogd_work_func+0x4c/0x88
[ 6688.533184] irq_work_run_list+0x8c/0xd8
[ 6688.537063] irq_work_tick+0x48/0x60
[ 6688.540598] update_process_times+0x44/0x60
[ 6688.544735] tick_sched_handle.isra.5+0x44/0x68
[ 6688.549215] tick_sched_timer+0x50/0xa0
[ 6688.553007] __hrtimer_run_queues+0x11c/0x3d0
[ 6688.557316] hrtimer_interrupt+0xd8/0x248
[ 6688.561282] arch_timer_handler_phys+0x38/0x58
[ 6688.565678] handle_percpu_devid_irq+0x90/0x2b8
[ 6688.570160] generic_handle_irq+0x34/0x50
[ 6688.574124] __handle_domain_irq+0x68/0xc0
[ 6688.578175] gic_handle_irq+0x60/0xb0
...
[ 6688.589909] WARNING: CPU: 1 PID: 4343 at kernel/sched/sched.h:1146 scheduler_tick+0xe8/0x128
...
[ 6688.714463] scheduler_tick+0xe8/0x128
[ 6688.718170] update_process_times+0x48/0x60
[ 6688.722306] tick_sched_handle.isra.5+0x44/0x68
[ 6688.726786] tick_sched_timer+0x50/0xa0
[ 6688.730579] __hrtimer_run_queues+0x11c/0x3d0
[ 6688.734887] hrtimer_interrupt+0xd8/0x248
[ 6688.738852] arch_timer_handler_phys+0x38/0x58
[ 6688.743246] handle_percpu_devid_irq+0x90/0x2b8
[ 6688.747727] generic_handle_irq+0x34/0x50
[ 6688.751692] __handle_domain_irq+0x68/0xc0
[ 6688.755741] gic_handle_irq+0x60/0xb0
...
[ 6688.767476] WARNING: CPU: 1 PID: 4343 at kernel/sched/sched.h:1146 task_rq_lock+0xc0/0x100
...
[ 6688.891511] task_rq_lock+0xc0/0x100
[ 6688.895046] dl_task_timer+0x48/0x2c8
[ 6688.898666] __hrtimer_run_queues+0x11c/0x3d0
[ 6688.902975] hrtimer_interrupt+0xd8/0x248
[ 6688.906939] arch_timer_handler_phys+0x38/0x58
[ 6688.911334] handle_percpu_devid_irq+0x90/0x2b8
[ 6688.915815] generic_handle_irq+0x34/0x50
[ 6688.919779] __handle_domain_irq+0x68/0xc0
[ 6688.923828] gic_handle_irq+0x60/0xb0
...
[ 6688.944618] WARNING: CPU: 1 PID: 4343 at kernel/sched/sched.h:1146 update_blocked_averages+0x84c/0x9a0
...
[ 6689.071664] update_blocked_averages+0x84c/0x9a0
[ 6689.076231] run_rebalance_domains+0x74/0xb0
[ 6689.080452] __do_softirq+0x154/0x3f0
[ 6689.084074] irq_exit+0xf0/0xf8
[ 6689.087178] __handle_domain_irq+0x6c/0xc0
[ 6689.091228] gic_handle_irq+0x60/0xb0
...
[ 6689.303143] WARNING: CPU: 1 PID: 4343 at kernel/sched/sched.h:1146 __schedule+0x478/0x698
...
[ 6689.440861] __schedule+0x478/0x698
[ 6689.444310] schedule+0x38/0xc0
[ 6689.447416] do_notify_resume+0x88/0x380
[ 6689.451294] work_pending+0x8/0x14
...
[ 6689.459256] *** ---> migrate_dl_task() p=[thread0-3 4343] to CPU-1

>
>> [ 88.586991] *** ---> migrate_dl_task() p=[thread0-3 3627] to CPU1
>
> But yes, something like this. Basically I want to avoid calling
> queue_balance_callback() from a context where we'll not follow up with
> balance_callback().

Understood.
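For readers following the thread: the mechanism being debated is a
queue-then-flush pattern. Work is queued on rq->balance_callback while
rq->lock is held, and is only run later, from one of the four known
flush sites (schedule_tail(), __schedule(), rt_mutex_setprio(),
__sched_setscheduler()). Below is a minimal, self-contained user-space
sketch of that pattern, not the kernel code itself: the function names
mirror queue_balance_callback()/balance_callback() from the thread, a
pthread mutex stands in for rq->lock, and migrate_dl_task() is a
stand-in payload borrowed from the RFC patch's trace messages.

/* Build with: cc -pthread sketch.c */
#include <stdio.h>
#include <pthread.h>

struct rq;

struct callback_head {
	struct callback_head *next;
	void (*func)(struct rq *rq);
};

struct rq {
	pthread_mutex_t lock;			/* stand-in for rq->lock */
	struct callback_head *balance_callback;	/* pending, singly linked */
};

/*
 * Queue a callback while holding rq->lock. It must not run here,
 * because the caller may be deep in the scheduler holding rq locks;
 * it runs later, from a flush site.
 */
static void queue_balance_callback(struct rq *rq, struct callback_head *head,
				   void (*func)(struct rq *rq))
{
	/* caller holds rq->lock; don't queue the same head twice */
	if (head->next || rq->balance_callback == head)
		return;

	head->func = func;
	head->next = rq->balance_callback;
	rq->balance_callback = head;
}

/*
 * Flush site: detach the whole list and run each callback, the way the
 * kernel does at the tail of __schedule() and friends.
 */
static void balance_callback(struct rq *rq)
{
	struct callback_head *head, *next;

	pthread_mutex_lock(&rq->lock);
	head = rq->balance_callback;
	rq->balance_callback = NULL;
	while (head) {
		next = head->next;
		head->next = NULL;
		head->func(rq);
		head = next;
	}
	pthread_mutex_unlock(&rq->lock);
}

/* hypothetical payload, named after the trace messages above */
static void migrate_dl_task(struct rq *rq)
{
	(void)rq;
	printf("callback ran at a flush site\n");
}

int main(void)
{
	struct rq rq = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct callback_head cb = { 0 };

	pthread_mutex_lock(&rq.lock);
	queue_balance_callback(&rq, &cb, migrate_dl_task);
	pthread_mutex_unlock(&rq.lock);

	/* the queueing context must reach a flush site like this one */
	balance_callback(&rq);
	return 0;
}

The WARN_ON_ONCE() in Dietmar's debug patch checks exactly the
invariant Peter states: every context that queues a callback must reach
a flush site before rq->lock is pinned again. The warnings fire because
paths such as dl_task_timer() and try_to_wake_up() take the rq lock
while a callback queued earlier is still pending, i.e. the queueing
context never flushed it.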