lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190527180743.GA20702@192-168-150-246.7~>
Date:   Tue, 28 May 2019 02:07:43 +0800
From:   Yao Liu <yotta.liu@...oud.cn>
To:     Josef Bacik <josef@...icpanda.com>
Cc:     Jens Axboe <axboe@...nel.dk>,
        linux-block <linux-block@...r.kernel.org>,
        nbd <nbd@...er.debian.org>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: Re: [PATCH 1/3] nbd: fix connection timed out error after
 reconnecting to server

On Fri, May 24, 2019 at 09:07:42AM -0400, Josef Bacik wrote:
> On Fri, May 24, 2019 at 05:43:54PM +0800, Yao Liu wrote:
> > Some I/O requests that have been sent succussfully but have not yet been
> > replied won't be resubmitted after reconnecting because of server restart,
> > so we add a list to track them.
> > 
> > Signed-off-by: Yao Liu <yotta.liu@...oud.cn>
> 
> Nack, this is what the timeout stuff is supposed to handle.  The commands will
> timeout and we'll resubmit them if we have alive sockets.  Thanks,
> 
> Josef
> 

On the one hand, if num_connections == 1 and the only sock has dead,
then we do nbd_genl_reconfigure to reconnect within dead_conn_timeout,
nbd_xmit_timeout will not resubmit commands that have been sent
succussfully but have not yet been replied. The log is as follows:
 
[270551.108746] block nbd0: Receive control failed (result -104)
[270551.108747] block nbd0: Send control failed (result -32)
[270551.108750] block nbd0: Request send failed, requeueing
[270551.116207] block nbd0: Attempted send on invalid socket
[270556.119584] block nbd0: reconnected socket
[270581.161751] block nbd0: Connection timed out
[270581.165038] block nbd0: shutting down sockets
[270581.165041] print_req_error: I/O error, dev nbd0, sector 5123224 flags 8801
[270581.165149] print_req_error: I/O error, dev nbd0, sector 5123232 flags 8801
[270581.165580] block nbd0: Connection timed out
[270581.165587] print_req_error: I/O error, dev nbd0, sector 844680 flags 8801
[270581.166184] print_req_error: I/O error, dev nbd0, sector 5123240 flags 8801
[270581.166554] block nbd0: Connection timed out
[270581.166576] print_req_error: I/O error, dev nbd0, sector 844688 flags 8801
[270581.167124] print_req_error: I/O error, dev nbd0, sector 5123248 flags 8801
[270581.167590] block nbd0: Connection timed out
[270581.167597] print_req_error: I/O error, dev nbd0, sector 844696 flags 8801
[270581.168021] print_req_error: I/O error, dev nbd0, sector 5123256 flags 8801
[270581.168487] block nbd0: Connection timed out
[270581.168493] print_req_error: I/O error, dev nbd0, sector 844704 flags 8801
[270581.170183] print_req_error: I/O error, dev nbd0, sector 5123264 flags 8801
[270581.170540] block nbd0: Connection timed out
[270581.173333] block nbd0: Connection timed out
[270581.173728] block nbd0: Connection timed out
[270581.174135] block nbd0: Connection timed out
 
On the other hand, if we wait nbd_xmit_timeout to handle resubmission,
the I/O requests will have a big delay. For example, if timeout time is 30s,
and from sock dead to nbd_genl_reconfigure returned OK we only spend
2s, the I/O requests will still be handled by nbd_xmit_timeout after 30s.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ