linux-kernel - RE: [PATCH] CIFS: Decrease reconnection delay when switching nics


Message-ID: <614F550557B82C44AC27C492ADA391AA045A5B27@TK5EX14MBXC284.redmond.corp.microsoft.com>
Date:	Thu, 28 Feb 2013 13:01:20 +0000
From:	Tom Talpey <ttalpey@...rosoft.com>
To:	"Stefan (metze) Metzmacher" <metze@...ba.org>,
	Jeff Layton <jlayton@...ba.org>
CC:	Steve French <sfrench@...ba.org>,
	Dave Chiluk <chiluk@...onical.com>,
	"samba-technical@...ts.samba.org" <samba-technical@...ts.samba.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-cifs@...r.kernel.org" <linux-cifs@...r.kernel.org>
Subject: RE: [PATCH] CIFS: Decrease reconnection delay when switching nics

> -----Original Message-----
> From: samba-technical-bounces@...ts.samba.org [mailto:samba-technical-bounces@...ts.samba.org] On Behalf Of Stefan (metze) Metzmacher
> Sent: Wednesday, February 27, 2013 7:16 PM
> To: Jeff Layton
> Cc: Steve French; Dave Chiluk; samba-technical@...ts.samba.org; linux-kernel@...r.kernel.org; linux-cifs@...r.kernel.org
> Subject: Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
> 
> Am 27.02.2013 17:34, schrieb Jeff Layton:
> > On Wed, 27 Feb 2013 12:06:14 +0100
> > "Stefan (metze) Metzmacher" <metze@...ba.org> wrote:
> >
> >> Hi Dave,
> >>
> >>> When messages are in the queue awaiting a response, decrease the
> >>> amount of time before attempting cifs_reconnect to SMB_MAX_RTT = 10
> >>> seconds. The wait time before attempting to reconnect is currently
> >>> 2*SMB_ECHO_INTERVAL (120 seconds) since the last response was
> >>> received.  This does not take into account the fact that messages
> >>> waiting for a response should be serviced within a reasonable round
> >>> trip time.
> >>
> >> Wouldn't that mean that the client will disconnect a good connection
> >> if the server doesn't respond within 10 seconds?
> >> Reads and Writes can take longer than 10 seconds...
> >>
> >
> > Where does this magic value of 10s come from? Note that a slow server
> > can take *minutes* to respond to writes that are long past the EOF.
> >
> >>> This fixes the issue where a user moves from wired to wireless or vice
> >>> versa, causing the mount to hang for 120 seconds when it could
> >>> reconnect considerably faster.  After this fix it will take
> >>> SMB_MAX_RTT (10 seconds) from the last time the user attempted to
> >>> access the volume, or SMB_MAX_RTT after the last echo.  The worst
> >>> case of the latter scenario is 2*SMB_ECHO_INTERVAL + SMB_MAX_RTT +
> >>> a small scheduling delay (about 130 seconds).
> >>> Statistically speaking it would normally reconnect sooner.  However,
> >>> in the best case, where the user changes NICs and immediately tries
> >>> to access the cifs share, it will take SMB_MAX_RTT = 10 seconds.
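
To make the timing rule in the patch description concrete, here is a small
model of it in Python. This is only a sketch of the rule as described above,
not the patch's code: the function names are hypothetical, and
SMB_ECHO_INTERVAL = 60 is inferred from "2*SMB_ECHO_INTERVAL (120 seconds)".

```python
# Toy model of the reconnect rule described in the patch text.
# Names are hypothetical; this is not the kernel code.

SMB_ECHO_INTERVAL = 60  # seconds; inferred from 2*SMB_ECHO_INTERVAL = 120
SMB_MAX_RTT = 10        # seconds; value proposed in the patch description


def reconnect_timeout(requests_in_flight):
    """Seconds of server silence tolerated before reconnecting."""
    if requests_in_flight:
        return SMB_MAX_RTT
    return 2 * SMB_ECHO_INTERVAL


def should_reconnect(seconds_since_last_response, requests_in_flight):
    """True if the described rule would trigger cifs_reconnect."""
    return seconds_since_last_response >= reconnect_timeout(requests_in_flight)


# Worst case quoted in the description, ignoring the scheduling delay.
WORST_CASE = 2 * SMB_ECHO_INTERVAL + SMB_MAX_RTT
```

Under this model, should_reconnect(30, True) is true, so a write that
legitimately takes 30 seconds on a healthy connection would also trigger a
reconnect. That is the objection raised about the 10 second value.
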
> >>
> >> I think it would be better to detect the broken connection by using
> >> an AF_NETLINK socket listening for RTM_DELADDR messages?
> >>
> >> metze
> >>
> >
> > Ick -- that sounds horrid ;)
> 
> This is what winbindd uses to detect that the source IP of outgoing
> connections is gone. I don't know much about the kernel; there might be a
> better way from within the kernel to detect this. But this is exactly the
> correct thing to do to fail over to another interface, as it happens just
> when the IP is removed, without messing with a timeout value.
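
For readers unfamiliar with the mechanism, a userspace sketch of listening
for RTM_DELADDR on an AF_NETLINK socket follows (Linux only, Python). It is
an illustration under stated assumptions, not winbindd's implementation and
not something the kernel client could use directly: the function names are
hypothetical, and the numeric constants are copied from <linux/rtnetlink.h>.

```python
import socket
import struct

# Constants from <linux/rtnetlink.h>.
RTMGRP_IPV4_IFADDR = 0x10
RTMGRP_IPV6_IFADDR = 0x100
RTM_NEWADDR = 20
RTM_DELADDR = 21

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
NLMSG_HDR = struct.Struct("=IHHII")


def iter_message_types(data):
    """Yield nlmsg_type for each netlink message packed in data."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size or offset + length > len(data):
            break  # truncated or malformed message; stop parsing
        yield msg_type
        offset += (length + 3) & ~3  # messages are 4-byte aligned


def wait_for_address_removal():
    """Block until the kernel reports that an IP address was removed."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    try:
        # pid 0 lets the kernel assign an id; groups selects address events.
        sock.bind((0, RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR))
        while True:
            data = sock.recv(65535)
            if RTM_DELADDR in iter_message_types(data):
                return
    finally:
        sock.close()
```

A caller would run wait_for_address_removal() in its own thread and start
the failover when it returns. A fuller version would also parse the
ifaddrmsg payload to check that the removed address is the one the
connection was bound to.
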
> 
> Another optimization would be to use TCP keepalives (I think 10 seconds
> would be ok there); I think that's what Windows SMB3 clients are using.

Yes, they do. See MS-SMB2 behavior note 144 attached to section 3.2.5.14.9.

10 seconds seems a fairly rapid keepalive interval. The TCP stack probably
won't allow it to be less than the maximum retransmit, for instance.
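
For reference, a userspace sketch of per-socket keepalive tuning on Linux,
in Python. The function names and the probe count of 3 are illustrative
assumptions, not values taken from Windows or from the CIFS client; only the
10 second figure comes from this thread.

```python
import socket


def enable_keepalive(sock, idle=10, interval=10, count=3):
    """Turn on TCP keepalives with Linux-specific per-socket timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idle time before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between successive unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes allowed before the connection is dropped.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


def detection_time(idle, interval, count):
    """Approximate seconds until a silent peer is declared dead."""
    return idle + interval * count
```

With idle=10, interval=10 and count=3, a dead peer is noticed after roughly
40 seconds of silence, so the total detection time depends on the probe
count as much as on the interval.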

Tom.