netdev - Re: [syzbot] INFO: task hung in __lru_add_drain

Message-ID: <87k0jua92f.ffs@tglx>
Date:   Mon, 06 Sep 2021 01:36:56 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Hillf Danton <hdanton@...a.com>,
        syzbot <syzbot+a9b681dcbc06eb2bca04@...kaller.appspotmail.com>
Cc:     linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        syzkaller-bugs@...glegroups.com, eric.dumazet@...il.com
Subject: Re: [syzbot] INFO: task hung in __lru_add_drain_all

Hillf,

On Fri, Sep 03 2021 at 19:10, Hillf Danton wrote:
>
> See if ksoftirqd is preventing bound workqueue work from running.

What?

> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -521,6 +521,7 @@ asmlinkage __visible void __softirq_entr
>  	bool in_hardirq;
>  	__u32 pending;
>  	int softirq_bit;
> +	bool is_ksoftirqd = __this_cpu_read(ksoftirqd) == current;
>  
>  	/*
>  	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
> @@ -565,6 +566,8 @@ restart:
>  		}
>  		h++;
>  		pending >>= softirq_bit;
> +		if (is_ksoftirqd && in_task())

Can you please explain how this would ever be true?

 #define in_task()	(!(in_nmi() | in_hardirq() | in_serving_softirq()))

in_task() is guaranteed to be false here, because in_serving_softirq()
is guaranteed to be true simply because this is the softirq processing
context.
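
To make the argument above concrete, here is a minimal userspace model of
the check, not kernel code. The bit positions (softirq count at bit 8,
hardirq at bit 16, NMI at bit 20) and every model_* name are assumptions
made for illustration. Only the shape of in_task() is taken from the
definition quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed, simplified layout of the context bits in preempt_count. */
#define SOFTIRQ_SHIFT   8
#define HARDIRQ_SHIFT   16
#define NMI_SHIFT       20
#define SOFTIRQ_OFFSET  (1u << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK    (0xfu << HARDIRQ_SHIFT)
#define NMI_MASK        (0xfu << NMI_SHIFT)

/* Hypothetical stand-in for the per-CPU preempt count. */
static unsigned int model_preempt_count;

static bool model_in_nmi(void)
{
	return (model_preempt_count & NMI_MASK) != 0;
}

static bool model_in_hardirq(void)
{
	return (model_preempt_count & HARDIRQ_MASK) != 0;
}

static bool model_in_serving_softirq(void)
{
	return (model_preempt_count & SOFTIRQ_OFFSET) != 0;
}

/* Same shape as the in_task() definition quoted above. */
static bool model_in_task(void)
{
	return !(model_in_nmi() | model_in_hardirq() |
		 model_in_serving_softirq());
}

/*
 * Stand-in for softirq processing: the softirq count is raised before any
 * handler runs and dropped afterwards. Returns what in_task() evaluates to
 * from inside that window.
 */
static bool in_task_seen_by_handler(void)
{
	bool seen;

	model_preempt_count += SOFTIRQ_OFFSET;	/* enter softirq processing */
	assert(model_in_serving_softirq());
	seen = model_in_task();
	model_preempt_count -= SOFTIRQ_OFFSET;	/* leave softirq processing */

	return seen;
}
```

In this model in_task_seen_by_handler() returns false no matter which task
the processing runs in, so a condition of the form
"is_ksoftirqd && in_task()" can never be true at that point.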

> +			cond_resched();

__do_softirq() returns after 2 msec of softirq processing whether it is
invoked on return from interrupt or in ksoftirqd context. On return from
interrupt this wakes ksoftirqd and returns. In ksoftirqd this is a
rescheduling point.

But that only works when the action handlers, e.g. net_rx_action(),
behave well and respect that limit as well.
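
Why the limit depends on the handlers can be shown with a small userspace
simulation. It is a sketch under stated assumptions and not the kernel's
actual loop: the fake clock, run_handler() and process_pending() are
hypothetical names, and the only thing taken from the text above is the
2 msec limit. The point is that the elapsed time is checked only between
handler invocations.

```c
#include <assert.h>

#define MODEL_TIME_LIMIT_MS 2	/* the 2 msec limit mentioned above */

/* Fake clock in milliseconds; only the handlers advance it. */
static unsigned long fake_clock_ms;

/* Hypothetical handler: consumes cost_ms of fake time in a single call. */
static void run_handler(unsigned long cost_ms)
{
	fake_clock_ms += cost_ms;
}

/*
 * Run up to n handlers that each cost cost_ms and stop once the limit is
 * reached. Returns the fake time spent inside the loop.
 */
static unsigned long process_pending(unsigned int n, unsigned long cost_ms)
{
	unsigned long start = fake_clock_ms;
	unsigned int i;

	assert(n > 0);

	for (i = 0; i < n; i++) {
		run_handler(cost_ms);	/* the loop cannot cut this short */
		if (fake_clock_ms - start >= MODEL_TIME_LIMIT_MS)
			break;		/* the limit is only checked here */
	}

	return fake_clock_ms - start;
}
```

With handlers costing 1 msec each, process_pending(10, 1) stops after
2 msec. With a single handler that takes 50 msec, process_pending(10, 50)
returns 50: the loop overshoots the limit by however long the handler
chooses to run.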

net_rx_action() has its own time limit: netdev_budget_usecs

That defaults to: 2 * USEC_PER_SEC / HZ 

The config has HZ=100, so this loop should terminate after

    2 * 1e6 / 100 = 20000us = 20ms
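
The same arithmetic as a small C helper, for checking other HZ values. The
function name default_netdev_budget_usecs() is made up for illustration;
only the formula 2 * USEC_PER_SEC / HZ comes from the text above.

```c
#include <assert.h>

#define USEC_PER_SEC 1000000UL

/*
 * Default net_rx_action() time budget in microseconds for a given HZ,
 * using the formula quoted above: 2 * USEC_PER_SEC / HZ.
 */
static unsigned long default_netdev_budget_usecs(unsigned long hz)
{
	assert(hz > 0);

	return 2 * USEC_PER_SEC / hz;
}
```

default_netdev_budget_usecs(100) gives 20000, i.e. the 20ms computed above.
For HZ=1000 the same formula gives 2000us.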

The provided C-reproducer does not change that default.

But again this loop can only terminate if napi_poll() and the
subsequently invoked callchain behave well.

So instead of sending obviously useless "debug" patches, why are you not
grabbing the kernel config and the reproducer and figuring out what the
root cause is?

Enable tracing, add some trace_printks and let ftrace_dump_on_oops spill
it out when the problem triggers. That will pinpoint the issue.

Thanks,

        tglx