linux-kernel - Re: CONFIG_PREEMPT_RT local_softirq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150309160845.GE13768@linutronix.de>
Date:	Mon, 9 Mar 2015 17:08:45 +0100
From:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:	Brian Silverman <brian@...oton-tech.com>
Cc:	linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
	Austin Schuh <austin@...oton-tech.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: CONFIG_PREEMPT_RT local_softirq_pending warning when ISR blocks

* Brian Silverman | 2015-03-05 00:16:20 [-0500]:

>Beforehand, 000 is spending most of its time in interrupts, but bash
>is doing something related to memory management on it in between.
>            bash-14721 [000] ......1  6854.629126: rt_spin_lock <-free_pcppages_bulk
>            bash-14721 [000] ....1.1  6854.629126: mm_page_pcpu_drain: page=ffffea000fa1aa40 pfn=4097705 order=0 migratetype=1
>            bash-14721 [000] ......1  6854.629127: get_pfnblock_flags_mask <-free_pcppages_bulk
>            bash-14721 [000] ......1  6854.629127: __mod_zone_page_state <-free_pcppages_bulk
>            bash-14721 [000] ....1.1  6854.629127: mm_page_pcpu_drain: page=ffffea000f572ac0 pfn=4021419 order=0 migratetype=0
>            bash-14721 [000] ......1  6854.629128: get_pfnblock_flags_mask <-free_pcppages_bulk
>            bash-14721 [000] ......1  6854.629128: __mod_zone_page_state <-free_pcppages_bulk
>... # lots more virtually identical repetitions of those last 3 lines
>            bash-14721 [000] ....1.1  6854.629139: mm_page_pcpu_drain: page=ffffea000f481a80 pfn=4005994 order=0 migratetype=1
>            bash-14721 [000] ......1  6854.629139: get_pfnblock_flags_mask <-free_pcppages_bulk
You free memory and hold the zone->lock

>            bash-14721 [000] d.....1  6854.629139: do_IRQ <-ret_from_intr
>            bash-14721 [000] d.....1  6854.629139: irq_enter <-do_IRQ
>... # wakes up the can1 ISR thread on 001 and the can0 one on 000
>(same physical IRQ line)
>            bash-14721 [000] d...3.1  6854.629261: sched_switch: prev_comm=bash prev_pid=14721 prev_prio=120 prev_state=R+ ==> next_comm=irq/18-can0 next_pid=2015 next_prio=28

I would assume that this one raises NET_RX softirq. But at the bottom
you have the irq handler on the other CPU which confuses me…

>... # runs the can0 ISR
>     irq/18-can0-2015  [000] d...3..  6854.629283: sched_switch: prev_comm=irq/18-can0 prev_pid=2015 prev_prio=28 prev_state=S ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=98
>...
>     ksoftirqd/0-3     [000] ....1.1  6854.629291: softirq_entry: vec=1 [action=TIMER]
>...
>     ksoftirqd/0-3     [000] ....1.1  6854.629293: softirq_exit: vec=1 [action=TIMER]
only the timer since nobody raised NET_RX

>...
>     ksoftirqd/0-3     [000] .......  6854.629298: schedule <-smpboot_thread_fn ...
>     ksoftirqd/0-3     [000] d...3..  6854.629304: sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=98 prev_state=S ==> next_comm=bash next_pid=14721 next_prio=28
>...
>            bash-14721 [000] d...1.1  6854.629308: smp_trace_reschedule_interrupt <-trace_reschedule_interrupt
># Actually unnecessary schedule IPI from 001?
>            bash-14721 [000] d...1.1  6854.629309: irq_enter <-smp_trace_reschedule_interrupt
>...
>            bash-14721 [000] ....1.1  6854.629316: __tick_nohz_task_switch <-__schedule
>            bash-14721 [000] ......1  6854.629316: __mod_zone_page_state <-free_pcppages_bulk
>            bash-14721 [000] ....1.1  6854.629317: mm_page_pcpu_drain: page=ffffea000f57a900 pfn=4021924 order=0 migratetype=0
>            bash-14721 [000] ......1  6854.629317: get_pfnblock_flags_mask <-free_pcppages_bulk
>            bash-14721 [000] ......1  6854.629317: __mod_zone_page_state <-free_pcppages_bulk
and it continues cleaning up memory.

>... # more of this like it was doing before
>I don't see it unlocking the problematic mutex before the trace ends.
>
>The relevant part for 001 starts while it's running the can1 ISR.
>     irq/18-can1-7228  [001] ....1.1  6854.629275: netif_rx: dev=can1 skbaddr=ffff880412d8fc00 len=16
>     irq/18-can1-7228  [001] ......1  6854.629275: migrate_disable <-netif_rx_internal
>     irq/18-can1-7228  [001] ......2  6854.629275: enqueue_to_backlog <-netif_rx_internal

enqueue_to_backlog() looks like packet reception so this should be
handled in napi so I assume we run in NET_RX softirq

>     irq/18-can1-7228  [001] d.....2  6854.629276: _raw_spin_lock <-enqueue_to_backlog
>     irq/18-can1-7228  [001] d...1.2  6854.629276: __raise_softirq_irqoff <-enqueue_to_backlog
>     irq/18-can1-7228  [001] d...1.2  6854.629276: do_raise_softirq_irqoff <-__raise_softirq_irqoff
>     irq/18-can1-7228  [001] d...2.2  6854.629276: softirq_raise: vec=3 [action=NET_RX]
>... # continues handling the can1 interrupt
>     irq/18-can1-7228  [001] ......6  6854.629286: rt_spin_lock <-get_page_from_freelist
>     irq/18-can1-7228  [001] ......6  6854.629287: rt_spin_lock_slowlock <-get_page_from_freelist
>     irq/18-can1-7228  [001] ......6  6854.629287: _raw_spin_lock <-rt_spin_lock_slowlock
>     irq/18-can1-7228  [001] ....1.6  6854.629287: __try_to_take_rt_mutex <-rt_spin_lock_slowlock
>     irq/18-can1-7228  [001] ....1.6  6854.629287: _raw_spin_lock_irq <-rt_spin_lock_slowlock
>     irq/18-can1-7228  [001] d...2.6  6854.629288: _raw_spin_unlock_irq <-rt_spin_lock_slowlock
>     irq/18-can1-7228  [001] ....1.6  6854.629288: task_blocks_on_rt_mutex <-rt_spin_lock_slowlock

it might be zone->lock it goes after. It boosts the bash process which
seems to free memory so it would make sense.

>... # rt_mutex/scheduling stuff
>     irq/18-can1-7228  [001] d...4.6  6854.629291: sched_pi_setprio: comm=bash pid=14721 oldprio=120 newprio=28
>... # more scheduler stuff
>     irq/18-can1-7228  [001] d...3.6  6854.629299: native_smp_send_reschedule <-rt_mutex_setprio
>... # more scheduler stuff
>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_fair <-__schedule
>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_stop <-__schedule
>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_dl <-__schedule
>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_rt <-__schedule
>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_fair <-__schedule
>     irq/18-can1-7228  [001] d...2.6  6854.629308: pick_next_task_idle <-__schedule
>     irq/18-can1-7228  [001] d...3.6  6854.629308: sched_switch: prev_comm=irq/18-can1 prev_pid=7228 prev_prio=28 prev_state=D ==>
>next_comm=swapper/1 next_pid=0 next_prio=120
>...
>          <idle>-0     [001] d...1..  6854.629319: softirq_check_pending_idle <-tick_nohz_idle_enter
>My tracing_off() call is in softirq_check_pending_idle, so that's it.

It looks like your softirq for net_rx is getting a packet and then after
raising NET_RX (again?) it blocks on a lock. In order to get this lock
it boosts and schedules bash. It gets runable but on the other CPU. On
CPU1 there is nothig going is nothing going and the only runable task is
the idle thread. And this is probably where the warning is written
because we go to idle while we should process a softirq instead.

>Thanks,
>Brian Silverman

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/