Message-ID: <56FF4AFE.9080606@oracle.com>
Date: Fri, 1 Apr 2016 21:30:54 -0700
From: "santosh.shilimkar@...cle.com" <santosh.shilimkar@...cle.com>
To: linux-rdma@...r.kernel.org,
Wengang Wang <wen.gang.wang@...cle.com>, netdev@...r.kernel.org
Subject: Re: [PATCH] RDS: sync congestion map updating
On 4/1/16 6:14 PM, Leon Romanovsky wrote:
> On Fri, Apr 01, 2016 at 12:47:24PM -0700, santosh shilimkar wrote:
>> (cc-ing netdev)
>> On 3/30/2016 7:59 PM, Wengang Wang wrote:
>>>
>>>
>>> On 2016-03-31 09:51, Wengang Wang wrote:
>>>>
>>>>
>>>> On 2016-03-31 01:16, santosh shilimkar wrote:
>>>>> Hi Wengang,
>>>>>
>>>>> On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
>>>>>> On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
>>>>>>> Some of a large number of parallel RDS communications hang.
>>>>>>> In my test, ten or so of 33 communications hang. The send requests
>>>>>>> got an -ENOBUFS error, meaning the peer socket (port) is congested,
>>>>>>> but the peer socket (port) is in fact not congested.
>>>>>>>
>>>>>>> The congestion map can be updated in two paths: one is the
>>>>>>> rds_recvmsg path and the other is when packets are received from
>>>>>>> the hardware. There is no synchronization when updating the
>>>>>>> congestion map, so a bit operation (clearing) in the rds_recvmsg
>>>>>>> path can be overwritten by another bit operation (setting) in the
>>>>>>> hardware packet receiving path.
>>>>>>>
>>>
>>> To be more specific: here, the two paths (user calls recvmsg and
>>> hardware receives data) are for different rds socks, so
>>> rds_sock->rs_recv_lock does not help synchronize updates to the
>>> congestion map.
>>>
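The lost update described above can be replayed deterministically in user
space. This is only a sketch, not the kernel code: racy_update() and
atomic_update() are hypothetical helpers, the bit numbers are arbitrary, and
C11 atomics stand in for the kernel's atomic bit operations.

```c
#include <assert.h>
#include <stdatomic.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Hypothetical helper: mask for one port's bit inside a shared word. */
static unsigned long bit_mask(unsigned int bit)
{
    assert(bit < BITS_PER_WORD);
    return 1UL << bit;
}

/*
 * Replays one bad interleaving of two plain read-modify-write updates
 * on the same word. Path 1 (recvmsg) clears bit_to_clear, path 2
 * (hardware receive) sets bit_to_set. Returns the final word.
 */
static unsigned long racy_update(unsigned long word,
                                 unsigned int bit_to_clear,
                                 unsigned int bit_to_set)
{
    unsigned long p1 = word;        /* path 1 loads the word */
    unsigned long p2 = word;        /* path 2 loads the same old value */

    p1 &= ~bit_mask(bit_to_clear);  /* path 1 clears its bit locally */
    p2 |= bit_mask(bit_to_set);     /* path 2 sets its bit locally */

    word = p1;                      /* path 1 stores */
    word = p2;                      /* path 2 stores and undoes the clear */
    return word;
}

/*
 * Same two updates, but each one is a single atomic read-modify-write,
 * so neither can overwrite the other regardless of ordering.
 */
static unsigned long atomic_update(unsigned long initial,
                                   unsigned int bit_to_clear,
                                   unsigned int bit_to_set)
{
    atomic_ulong word;

    atomic_init(&word, initial);
    atomic_fetch_and(&word, ~bit_mask(bit_to_clear));
    atomic_fetch_or(&word, bit_mask(bit_to_set));
    return atomic_load(&word);
}
```

Starting from a word with only bit 3 set, racy_update(1UL << 3, 3, 5) leaves
bit 3 set (the port still looks congested), while atomic_update() with the
same arguments leaves only bit 5 set.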
>> For the archive, let me try to conclude the thread. I synced
>> with Wengang off-list and came up with the fix below. I was under
>> the impression that __set_bit_le() was the atomic version. After
>> fixing it as in the patch (end of the email), the bug is addressed.
>>
>> I will probably send this as a fix for stable as well.
>>
>>
>> From 5614b61f6fdcd6ae0c04e50b97efd13201762294 Mon Sep 17 00:00:00 2001
>> From: Santosh Shilimkar <santosh.shilimkar@...cle.com>
>> Date: Wed, 30 Mar 2016 23:26:47 -0700
>> Subject: [PATCH] RDS: Fix the atomicity for congestion map update
>>
>> Two different threads with different rds sockets may be in
>> rds_recv_rcvbuf_delta() via receive path. If their ports
>> both map to the same word in the congestion map, then
>> using non-atomic ops to update it could cause the map to
>> be incorrect. Let's use atomics to avoid such an issue.
>>
>> Full credit to Wengang <wen.gang.wang@...cle.com> for
>> finding the issue, analysing it and also pointing out
>> the offending code with a spinlock-based fix.
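To illustrate how two ports can end up in the same word, here is a rough
sketch. It assumes a flat bitmap with one bit per port and 64-bit words; the
helper names are hypothetical and the real layout in net/rds/cong.c may
differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one bit per port, packed into 64-bit words. */
#define WORD_BITS 64u

/* Index of the word that holds this port's bit. */
static unsigned int cong_word_index(uint16_t port)
{
    return port / WORD_BITS;
}

/* Position of this port's bit inside that word. */
static unsigned int cong_bit_offset(uint16_t port)
{
    return port % WORD_BITS;
}

/* True when updates for both ports touch the same word. */
static bool ports_share_word(uint16_t a, uint16_t b)
{
    return cong_word_index(a) == cong_word_index(b);
}
```

Under these assumptions, ports 4000 and 4001 both land in word 62, so a
non-atomic update for one can clobber a concurrent update for the other.
Ports 63 and 64 land in different words and cannot interfere this way.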
>
> I'm glad that you solved the issue without spinlocks.
> Out of curiosity, I see that this patch needs to be sent
> to Dave and applied by him. Is that right?
>
Right. I was planning to send this one along with one more fix
on netdev for Dave to pick up.
> ➜ linus-tree git:(master) ./scripts/get_maintainer.pl -f net/rds/cong.c
> Santosh Shilimkar <santosh.shilimkar@...cle.com> (supporter:RDS - RELIABLE DATAGRAM SOCKETS)
> "David S. Miller" <davem@...emloft.net> (maintainer:NETWORKING [GENERAL])
> netdev@...r.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
> linux-rdma@...r.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
> rds-devel@....oracle.com (moderated list:RDS - RELIABLE DATAGRAM SOCKETS)
> linux-kernel@...r.kernel.org (open list)
>
>>
>> Signed-off-by: Wengang Wang <wen.gang.wang@...cle.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@...cle.com>
>
> Reviewed-by: Leon Romanovsky <leon@...n.nu>
>
Thanks for the review.