Date:	Sat, 2 Apr 2016 04:14:59 +0300
From:	Leon Romanovsky <leon@...n.nu>
To:	santosh shilimkar <santosh.shilimkar@...cle.com>
Cc:	linux-rdma@...r.kernel.org,
	Wengang Wang <wen.gang.wang@...cle.com>, netdev@...r.kernel.org
Subject: Re: [PATCH] RDS: sync congestion map updating

On Fri, Apr 01, 2016 at 12:47:24PM -0700, santosh shilimkar wrote:
> (cc-ing netdev)
> On 3/30/2016 7:59 PM, Wengang Wang wrote:
> >
> >
> >On 2016-03-31 09:51, Wengang Wang wrote:
> >>
> >>
> >>On 2016-03-31 01:16, santosh shilimkar wrote:
> >>>Hi Wengang,
> >>>
> >>>On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
> >>>>On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
> >>>>>Some of a large number of parallel RDS communications were found
> >>>>>to hang. In my test, about ten of 33 communications hang. The send
> >>>>>requests got an -ENOBUFS error, meaning the peer socket (port) is
> >>>>>congested, while in fact the peer socket (port) is not congested.
> >>>>>
> >>>>>The congestion map can be updated in two paths: one is the
> >>>>>rds_recvmsg path and the other is when packets are received from
> >>>>>the hardware. There is no synchronization when updating the
> >>>>>congestion map, so a bit operation (clearing) in the rds_recvmsg
> >>>>>path can be lost to another bit operation (setting) in the
> >>>>>hardware packet receiving path.
> >>>>>
> >
> >To be more specific: here, the two paths (user calls recvmsg and
> >hardware receives data) are for different rds socks, so
> >rds_sock->rs_recv_lock does not help to synchronize updates to the
> >congestion map.
> >
> For archive purposes, let me try to conclude the thread. I synced
> with Wengang off-list and we came up with the fix below. I was under
> the impression that __set_bit_le() was the atomic version. After
> fixing it as in the patch (end of this email), the bug is addressed.
> 
> I will probably send this as a fix for stable as well.
> 
> 
> From 5614b61f6fdcd6ae0c04e50b97efd13201762294 Mon Sep 17 00:00:00 2001
> From: Santosh Shilimkar <santosh.shilimkar@...cle.com>
> Date: Wed, 30 Mar 2016 23:26:47 -0700
> Subject: [PATCH] RDS: Fix the atomicity for congestion map update
> 
> Two different threads with different rds sockets may be in
> rds_recv_rcvbuf_delta() via the receive path. If their ports
> both map to the same word in the congestion map, then
> using non-atomic ops to update it could cause the map to
> be incorrect. Let's use atomics to avoid such an issue.
> 
> Full credit to Wengang <wen.gang.wang@...cle.com> for
> finding the issue, analysing it and also pointing out
> the offending code with a spin lock based fix.

I'm glad that you solved the issue without spinlocks.
Out of curiosity, it looks like this patch needs to be sent
to Dave and applied by him. Is that right?

➜  linus-tree git:(master) ./scripts/get_maintainer.pl -f net/rds/cong.c
Santosh Shilimkar <santosh.shilimkar@...cle.com> (supporter:RDS - RELIABLE DATAGRAM SOCKETS)
"David S. Miller" <davem@...emloft.net> (maintainer:NETWORKING [GENERAL])
netdev@...r.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
linux-rdma@...r.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
rds-devel@....oracle.com (moderated list:RDS - RELIABLE DATAGRAM SOCKETS)
linux-kernel@...r.kernel.org (open list)

> 
> Signed-off-by: Wengang Wang <wen.gang.wang@...cle.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@...cle.com>

Reviewed-by: Leon Romanovsky <leon@...n.nu>
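
For readers of the archive, the lost update described in the commit
message can be replayed by hand. What follows is only a userspace
sketch using C11 atomics, not the kernel code from the patch:
cong_word_t, cong_set_bit(), cong_clear_bit() and the two demo
functions are hypothetical names, and a single 64-bit word stands in
for one word of the congestion map. The first demo interleaves two
non-atomic read-modify-write sequences on the same word, the second
performs the same two updates as atomic operations.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one word of the congestion map. */
typedef _Atomic uint64_t cong_word_t;

/* Atomic set: load, OR and store happen as one indivisible step. */
static inline void cong_set_bit(cong_word_t *w, unsigned bit)
{
    atomic_fetch_or(w, (uint64_t)1 << bit);
}

/* Atomic clear: load, AND and store happen as one indivisible step. */
static inline void cong_clear_bit(cong_word_t *w, unsigned bit)
{
    atomic_fetch_and(w, ~((uint64_t)1 << bit));
}

/*
 * Replays, in a single thread, one bad interleaving of two non-atomic
 * updates to the same word. Path A clears bit 3, path B sets bit 5.
 * Both load the word before either stores, so B's store overwrites
 * A's clear and bit 3 stays set.
 */
uint64_t lost_update_demo(void)
{
    uint64_t word = (uint64_t)1 << 3;   /* port 3 starts out congested */

    uint64_t a = word;                  /* path A loads the word */
    uint64_t b = word;                  /* path B loads the same word */

    a &= ~((uint64_t)1 << 3);           /* A clears bit 3 in its copy */
    b |= (uint64_t)1 << 5;              /* B sets bit 5 in its copy */

    word = a;                           /* A stores its result */
    word = b;                           /* B stores, undoing A's clear */

    return word;
}

/* The same two updates done atomically: neither one can be lost. */
uint64_t atomic_demo(void)
{
    cong_word_t word = (uint64_t)1 << 3;

    cong_clear_bit(&word, 3);
    cong_set_bit(&word, 5);

    return atomic_load(&word);
}
```

In lost_update_demo() bit 3 is still set at the end, which corresponds
to a port that keeps looking congested after it was cleared. In
atomic_demo() only bit 5 remains set, regardless of how the two
operations would be ordered across threads.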
