Message-ID: <CAGt3f4k0d5LRwuec1gpm=+NW325OskPxZbuTroEBSO9d1MMZaQ@mail.gmail.com>
Date:	Mon, 9 Mar 2015 20:36:27 -0400
From:	Brian Silverman <brian@...oton-tech.com>
To:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:	linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
	Austin Schuh <austin@...oton-tech.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: CONFIG_PREEMPT_RT local_softirq_pending warning when ISR blocks

On Mon, Mar 9, 2015 at 12:08 PM, Sebastian Andrzej Siewior
<bigeasy@...utronix.de> wrote:
> * Brian Silverman | 2015-03-05 00:16:20 [-0500]:
>
>>Beforehand, 000 is spending most of its time in interrupts, but bash
>>is doing something related to memory management on it in between.
>>            bash-14721 [000] ......1  6854.629126: rt_spin_lock <-free_pcppages_bulk
>>            bash-14721 [000] ....1.1  6854.629126: mm_page_pcpu_drain: page=ffffea000fa1aa40 pfn=4097705 order=0 migratetype=1
>>            bash-14721 [000] ......1  6854.629127: get_pfnblock_flags_mask <-free_pcppages_bulk
>>            bash-14721 [000] ......1  6854.629127: __mod_zone_page_state <-free_pcppages_bulk
>>            bash-14721 [000] ....1.1  6854.629127: mm_page_pcpu_drain: page=ffffea000f572ac0 pfn=4021419 order=0 migratetype=0
>>            bash-14721 [000] ......1  6854.629128: get_pfnblock_flags_mask <-free_pcppages_bulk
>>            bash-14721 [000] ......1  6854.629128: __mod_zone_page_state <-free_pcppages_bulk
>>... # lots more virtually identical repetitions of those last 3 lines
>>            bash-14721 [000] ....1.1  6854.629139: mm_page_pcpu_drain: page=ffffea000f481a80 pfn=4005994 order=0 migratetype=1
>>            bash-14721 [000] ......1  6854.629139: get_pfnblock_flags_mask <-free_pcppages_bulk
> You free memory and hold the zone->lock
>
>>            bash-14721 [000] d.....1  6854.629139: do_IRQ <-ret_from_intr
>>            bash-14721 [000] d.....1  6854.629139: irq_enter <-do_IRQ
>>... # wakes up the can1 ISR thread on 001 and the can0 one on 000
>>(same physical IRQ line)
>>            bash-14721 [000] d...3.1  6854.629261: sched_switch: prev_comm=bash prev_pid=14721 prev_prio=120 prev_state=R+ ==> next_comm=irq/18-can0 next_pid=2015 next_prio=28
>
> I would assume that this one raises NET_RX softirq. But at the bottom
> you have the irq handler on the other CPU which confuses me…

There wasn't actually any traffic on can0 for this test, so it didn't
raise NET_RX. The can0 ISR only does a few reads/writes to the device
and never calls netif_rx.

The can1 handler (which actually raises a NET_RX softirq) runs on 001
because it's pinned there.

>
>>... # runs the can0 ISR
>>     irq/18-can0-2015  [000] d...3..  6854.629283: sched_switch: prev_comm=irq/18-can0 prev_pid=2015 prev_prio=28 prev_state=S ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=98
>>...
>>     ksoftirqd/0-3     [000] ....1.1  6854.629291: softirq_entry: vec=1 [action=TIMER]
>>...
>>     ksoftirqd/0-3     [000] ....1.1  6854.629293: softirq_exit: vec=1 [action=TIMER]
> only the timer since nobody raised NET_RX

Correct. I included that as context for what 000 spent its time doing
while bash held the lock, not to imply that anything there is incorrect.

>
>>...
>>     ksoftirqd/0-3     [000] .......  6854.629298: schedule <-smpboot_thread_fn ...
>>     ksoftirqd/0-3     [000] d...3..  6854.629304: sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=98 prev_state=S ==> next_comm=bash next_pid=14721 next_prio=28
>>...
>>            bash-14721 [000] d...1.1  6854.629308: smp_trace_reschedule_interrupt <-trace_reschedule_interrupt
>># Actually unnecessary schedule IPI from 001?
>>            bash-14721 [000] d...1.1  6854.629309: irq_enter <-smp_trace_reschedule_interrupt
>>...
>>            bash-14721 [000] ....1.1  6854.629316: __tick_nohz_task_switch <-__schedule
>>            bash-14721 [000] ......1  6854.629316: __mod_zone_page_state <-free_pcppages_bulk
>>            bash-14721 [000] ....1.1  6854.629317: mm_page_pcpu_drain: page=ffffea000f57a900 pfn=4021924 order=0 migratetype=0
>>            bash-14721 [000] ......1  6854.629317: get_pfnblock_flags_mask <-free_pcppages_bulk
>>            bash-14721 [000] ......1  6854.629317: __mod_zone_page_state <-free_pcppages_bulk
> and it continues cleaning up memory.
>
>>... # more of this like it was doing before
>>I don't see it unlocking the problematic mutex before the trace ends.
>>
>>The relevant part for 001 starts while it's running the can1 ISR.
>>     irq/18-can1-7228  [001] ....1.1  6854.629275: netif_rx: dev=can1 skbaddr=ffff880412d8fc00 len=16
>>     irq/18-can1-7228  [001] ......1  6854.629275: migrate_disable <-netif_rx_internal
>>     irq/18-can1-7228  [001] ......2  6854.629275: enqueue_to_backlog <-netif_rx_internal
>
> enqueue_to_backlog() looks like packet reception so this should be
> handled in napi so I assume we run in NET_RX softirq

enqueue_to_backlog is the function on the NAPI path that actually raises
the NET_RX softirq.

>>     irq/18-can1-7228  [001] d.....2  6854.629276: _raw_spin_lock <-enqueue_to_backlog
>>     irq/18-can1-7228  [001] d...1.2  6854.629276: __raise_softirq_irqoff <-enqueue_to_backlog
>>     irq/18-can1-7228  [001] d...1.2  6854.629276: do_raise_softirq_irqoff <-__raise_softirq_irqoff
>>     irq/18-can1-7228  [001] d...2.2  6854.629276: softirq_raise: vec=3 [action=NET_RX]
This is where the softirq gets raised.
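
For anyone who wants to pull the softirq events out of a trace like this
one mechanically, here is a minimal sketch. It assumes the line layout
seen in the excerpts above (task-pid, [cpu], flags, timestamp, event) and
tolerates the ">>" quoting. The names parse_line and softirq_events are
made up for this example, and SAMPLE just reuses two lines from the trace.

```python
import re

# Layout assumed from the excerpts above: TASK-PID [CPU] FLAGS TIMESTAMP: REST
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S+)\s+(?P<ts>\d+\.\d+):\s+(?P<rest>.*)$"
)
SOFTIRQ_RE = re.compile(
    r"softirq_(?P<kind>raise|entry|exit): vec=(?P<vec>\d+) \[action=(?P<action>\w+)\]"
)


def parse_line(line):
    """Split one trace line into fields; return None for comments and '...' lines."""
    m = LINE_RE.match(line.lstrip("> "))  # drop email quote markers and indentation
    if m is None:
        return None
    return {
        "task": m.group("task"),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "flags": m.group("flags"),
        "ts": float(m.group("ts")),
        "rest": m.group("rest"),
    }


def softirq_events(lines):
    """Return the softirq raise/entry/exit events found in lines, in input order."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = SOFTIRQ_RE.match(rec["rest"])
        if m is None:
            continue
        events.append({
            "ts": rec["ts"],
            "cpu": rec["cpu"],
            "task": rec["task"],
            "kind": m.group("kind"),
            "vec": int(m.group("vec")),
            "action": m.group("action"),
        })
    return events


SAMPLE = [
    ">>     ksoftirqd/0-3     [000] ....1.1  6854.629291: softirq_entry: vec=1 [action=TIMER]",
    ">>     irq/18-can1-7228  [001] d...2.2  6854.629276: softirq_raise: vec=3 [action=NET_RX]",
    ">>... # continues handling the can1 interrupt",
]

if __name__ == "__main__":
    for ev in softirq_events(SAMPLE):
        print(ev["ts"], "cpu", ev["cpu"], ev["task"], ev["kind"], ev["action"])
```

Filtering the result for kind == "raise" with no later "entry" on the same
CPU is one way to spot a softirq that was raised but never run.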

>>... # continues handling the can1 interrupt
>>     irq/18-can1-7228  [001] ......6  6854.629286: rt_spin_lock <-get_page_from_freelist
>>     irq/18-can1-7228  [001] ......6  6854.629287: rt_spin_lock_slowlock <-get_page_from_freelist
>>     irq/18-can1-7228  [001] ......6  6854.629287: _raw_spin_lock <-rt_spin_lock_slowlock
>>     irq/18-can1-7228  [001] ....1.6  6854.629287: __try_to_take_rt_mutex <-rt_spin_lock_slowlock
>>     irq/18-can1-7228  [001] ....1.6  6854.629287: _raw_spin_lock_irq <-rt_spin_lock_slowlock
>>     irq/18-can1-7228  [001] d...2.6  6854.629288: _raw_spin_unlock_irq <-rt_spin_lock_slowlock
>>     irq/18-can1-7228  [001] ....1.6  6854.629288: task_blocks_on_rt_mutex <-rt_spin_lock_slowlock
>
> it might be zone->lock it goes after. It boosts the bash process which
> seems to free memory so it would make sense.
>
>>... # rt_mutex/scheduling stuff
>>     irq/18-can1-7228  [001] d...4.6  6854.629291: sched_pi_setprio: comm=bash pid=14721 oldprio=120 newprio=28
>>... # more scheduler stuff
>>     irq/18-can1-7228  [001] d...3.6  6854.629299: native_smp_send_reschedule <-rt_mutex_setprio
>>... # more scheduler stuff
>>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_fair <-__schedule
>>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_stop <-__schedule
>>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_dl <-__schedule
>>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_rt <-__schedule
>>     irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_fair <-__schedule
>>     irq/18-can1-7228  [001] d...2.6  6854.629308: pick_next_task_idle <-__schedule
>>     irq/18-can1-7228  [001] d...3.6  6854.629308: sched_switch: prev_comm=irq/18-can1 prev_pid=7228 prev_prio=28 prev_state=D ==> next_comm=swapper/1 next_pid=0 next_prio=120
>>...
>>          <idle>-0     [001] d...1..  6854.629319: softirq_check_pending_idle <-tick_nohz_idle_enter
>>My tracing_off() call is in softirq_check_pending_idle, so that's it.
>
> It looks like your softirq for net_rx is getting a packet and then after
> raising NET_RX (again?) it blocks on a lock. In order to get this lock
> it boosts and schedules bash. It gets runnable but on the other CPU. On
> CPU1 there is nothing going on and the only runnable task is
> the idle thread. And this is probably where the warning is written
> because we go to idle while we should process a softirq instead.

That sounds like the issue. Doing the softirq instead of going idle in
this situation seems like it means calling thread_do_softirq() from
__schedule, but I don't know where the right place is. Can anybody
give me some help on where exactly to check for softirqs from?
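
To make the sequence concrete, here is a toy userspace model of it. This
is not kernel code and not a proposed patch: ToyCpu, replay and the
message text are invented for illustration. The only things taken from
the trace are the vector numbers (TIMER=1, NET_RX=3) and the order of
events on 001: the IRQ thread raises NET_RX, blocks on the lock, and the
CPU then picks idle with the bit still set.

```python
# Toy model only: a per-CPU pending bitmask plus an idle-entry check.
SOFTIRQ_VECTORS = {"HI": 0, "TIMER": 1, "NET_TX": 2, "NET_RX": 3}


class ToyCpu:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.pending = 0      # bit N set means softirq vector N was raised
        self.runnable = []    # runnable tasks other than idle

    def raise_softirq(self, name):
        self.pending |= 1 << SOFTIRQ_VECTORS[name]

    def run_softirqs(self):
        """Handle everything pending and clear the mask; return the handled names."""
        handled = [n for n, v in SOFTIRQ_VECTORS.items() if self.pending & (1 << v)]
        self.pending = 0
        return handled

    def pick_next(self):
        return self.runnable[0] if self.runnable else "idle"

    def enter_idle(self):
        """Return a warning string if softirqs are still pending, else None."""
        if self.pending:
            return "cpu%d: going idle with softirq pending, mask %02x" % (
                self.cpu_id, self.pending)
        return None


def replay(run_softirqs_first=False):
    """Replay the sequence seen on 001; optionally run softirqs before idling."""
    cpu = ToyCpu(1)
    cpu.runnable.append("irq/18-can1")
    cpu.raise_softirq("NET_RX")          # netif_rx -> enqueue_to_backlog
    cpu.runnable.remove("irq/18-can1")   # IRQ thread blocks on the lock
    if cpu.pick_next() != "idle":
        return None
    if run_softirqs_first:
        cpu.run_softirqs()               # the "do the softirq instead" variant
    return cpu.enter_idle()


if __name__ == "__main__":
    print(replay())
    print(replay(run_softirqs_first=True))
```

In this model the only open question is the one asked above: at which
point between the IRQ thread blocking and the idle entry the pending
mask should get checked.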

Thanks,
Brian