netdev - Re: [PATCH net-next 0/4] bonding work-queues, try

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <550181E6.2080100@redhat.com>
Date:	Thu, 12 Mar 2015 13:09:10 +0100
From:	Nikolay Aleksandrov <nikolay@...hat.com>
To:	Mahesh Bandewar <maheshb@...gle.com>,
	Jay Vosburgh <j.vosburgh@...il.com>,
	Andy Gospodarek <andy@...yhouse.net>,
	Veaceslav Falico <vfalico@...il.com>,
	David Miller <davem@...emloft.net>
CC:	Maciej Zenczykowski <maze@...gle.com>,
	netdev <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next 0/4] bonding work-queues, try_rtnl() & notifications

On 12/03/15 06:54, Mahesh Bandewar wrote:
> This patch series tries to address the issue discovered in various work-
> queues and the way these handlers deal with the RTNL. Especially for 
> notification handling. If RTNL can not be acquired, these handlers ignore
> sending notifications and just re-arm the timer. This could be very 
> problematic if the re-arm timer has larger value (e.g. in minutes).
> 
> 
> Mahesh Bandewar (4):
>   bonding: Handle notifications during work-queue processing gracefully
>   bonding: Do not ignore notifications for miimon-work-queue
>   bonding: Do not ignore notifications for AD-work-queue
>   bonding: Do not ignore notifications for ARP-work-queue
> 
>  drivers/net/bonding/bond_3ad.c  | 22 ++++++++++++---
>  drivers/net/bonding/bond_main.c | 62 ++++++++++++++++++++++++-----------------
>  include/net/bonding.h           | 22 +++++++++++++++
>  3 files changed, 77 insertions(+), 29 deletions(-)
> 

Hello Mahesh,
The current behaviour was chosen as a best-effort type because such change
could skew the monitoring/AD timers because you re-schedule every time you
cannot acquire rtnl and move their timers with 1 tick ahead which could lead 
to false link drops, state machines running late and so on.
Currently the monitoring/AD functions have priority over the notifications as in
we might miss a notification but we try not to miss a monitoring/AD event, thus I'd
suggest to solve this in a different manner. If you'd like to have the best of both
worlds i.e. not miss a monitoring event and also not miss any notifications you should
use a different strategy. One idea would be to use a new "work" which checks some
notification bits and sends various types of notifications, then the monitoring
works can just schedule it whenever they need to send notifications and can keep their
running times accurate. The new work should handle the rtnl side of things and re-schedule
itself if necessary, now that is only an idea from the top of my head I haven't checked
if it's feasible or if there isn't a better approach so I'd be open to other ideas. Also
if you do decide to go this way, you should be very careful with the order of work
cancellation in bond_close(). In fact, IIRC, this has already been done by the slave
array update that you introduced and also by the igmp resend work.

There's also a bug in bond_should_notify_peers(), as in it's not RCU safe
on its own, it's currently safe because its callers use it in RCU protected sections
but here you move it outside so it should be fixed first (take a look at how "slave" is used).

Nik
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html