linux-kernel - Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Mon, 27 May 2019 18:17:44 -0700
From:   Stephen Hemminger <stephen@...workplumber.org>
To:     Yunsheng Lin <linyunsheng@...wei.com>
Cc:     <davem@...emloft.net>, <hkallweit1@...il.com>,
        <f.fainelli@...il.com>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <linuxarm@...wei.com>
Subject: Re: [PATCH net-next] net: link_watch: prevent starvation when
 processing linkwatch wq

On Tue, 28 May 2019 09:04:18 +0800
Yunsheng Lin <linyunsheng@...wei.com> wrote:

> On 2019/5/27 22:58, Stephen Hemminger wrote:
> > On Mon, 27 May 2019 09:47:54 +0800
> > Yunsheng Lin <linyunsheng@...wei.com> wrote:
> >   
> >> When user has configured a large number of virtual netdev, such
> >> as 4K vlans, the carrier on/off operation of the real netdev
> >> will also cause it's virtual netdev's link state to be processed
> >> in linkwatch. Currently, the processing is done in a work queue,
> >> which may cause worker starvation problem for other work queue.
> >>
> >> This patch releases the cpu when link watch worker has processed
> >> a fixed number of netdev' link watch event, and schedule the
> >> work queue again when there is still link watch event remaining.
> >>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>  
> > 
> > Why not put link watch in its own workqueue so it is scheduled
> > separately from the system workqueue?  
> 
> From testing and debuging, the workqueue runs on the cpu where the
> workqueue is schedule when using normal workqueue, even using its
> own workqueue instead of system workqueue. So if the cpu is busy
> processing the linkwatch event, it is not able to process other
> workqueue' work when the workqueue is scheduled on the same cpu.
> 
> Using unbound workqueue may solve the cpu starvation problem.
> But the __linkwatch_run_queue is called with rtnl_lock, so if it
> takes a lot time to process, other need to take the rtnl_lock may
> not be able to move forward.

Agree with the starvation issue. My cocern is that large number of
events that end up being delayed would impact things that are actually
watching for link events (like routing daemons).

It probably would be not accepted to do rtnl_unlock/sched_yield/rtnl_lock
in the loop, but that is another alternative.