Message-ID: <528587D0.5060105@redhat.com>
Date: Fri, 15 Nov 2013 10:32:48 +0800
From: Honggang LI <honli@...hat.com>
To: Venkat Venkatsubra <venkat.x.venkatsubra@...cle.com>,
Josh Hunt <joshhunt00@...il.com>
CC: David Miller <davem@...emloft.net>, jjolly@...e.com,
LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH] rds: Error on offset mismatch if not loopback
On 11/14/2013 09:43 PM, Venkat Venkatsubra wrote:
>
> -----Original Message-----
> From: Honggang LI [mailto:honli@...hat.com]
> Sent: Wednesday, November 13, 2013 6:56 PM
> To: Josh Hunt; Venkat Venkatsubra
> Cc: David Miller; jjolly@...e.com; LKML; netdev@...r.kernel.org
> Subject: Re: [PATCH] rds: Error on offset mismatch if not loopback
>
> On 11/14/2013 01:40 AM, Josh Hunt wrote:
>> On Wed, Nov 13, 2013 at 9:16 AM, Venkat Venkatsubra
>> <venkat.x.venkatsubra@...cle.com> wrote:
>>> -----Original Message-----
>>> From: Josh Hunt [mailto:joshhunt00@...il.com]
>>> Sent: Tuesday, November 12, 2013 10:25 PM
>>> To: David Miller
>>> Cc: jjolly@...e.com; LKML; Venkat Venkatsubra; netdev@...r.kernel.org
>>> Subject: Re: [PATCH] rds: Error on offset mismatch if not loopback
>>>
>>> On Tue, Nov 12, 2013 at 10:22 PM, Josh Hunt <joshhunt00@...il.com> wrote:
>>>> On Sat, Sep 22, 2012 at 2:25 PM, David Miller <davem@...emloft.net> wrote:
>>>>> From: John Jolly <jjolly@...e.com>
>>>>> Date: Fri, 21 Sep 2012 15:32:40 -0600
>>>>>
>>>>>> Attempting an rds connection from the IP address of an IPoIB
>>>>>> interface to itself causes a kernel panic due to a BUG_ON() being triggered.
>>>>>> Making the test less strict allows rds-ping to work without
>>>>>> crashing the machine.
>>>>>>
>>>>>> A local unprivileged user could use this flaw to crash the system.
>>>>>>
>>>>>> Signed-off-by: John Jolly <jjolly@...e.com>
>>>>> Besides the questions being asked of you by Venkat Venkatsubra,
>>>>> this patch has another issue.
>>>>>
>>>>> It has been completely corrupted by your email client, it has
>>>>> turned all TAB characters into spaces, making the patch useless.
>>>>>
>>>>> Please learn how to send a patch unmolested in the body of your
>>>>> email. Test it by emailing the patch to yourself, and verifying
>>>>> that you can in fact apply the patch you receive in that email.
>>>>> Then, and only then, should you consider making a new submission of
>>>>> this patch.
>>>>>
>>>>> Use Documentation/email-clients.txt for guidance.
>>>> I think this issue was lost in the shuffle. It appears that redhat,
>>>> ubuntu, and oracle are maintaining local patches to resolve this:
>>>>
>>>> https://oss.oracle.com/git/?p=redpatch.git;a=commit;h=c7b6a0a1d8d636852be130fa15fa8be10d4704e8
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=822754
>>>> http://ubuntu.5.x6.nabble.com/CVE-2012-2372-RDS-local-ping-DOS-td4985388.html
>>>>
>>>> Given that Oracle has applied it I'll make the assumption that
>>>> Venkat's question was answered at some point.
>>>>
>>>> David - I can resubmit the patch with the proper signed-off-by and
>>>> formatting if you are willing to apply it unless John wants to try
>>>> again. I think it's time this got upstream.
>>>>
>>>> --
>>>> Josh
>>> Ugh.. hopefully resending with all the html crap removed...
>>>
>>> --
>>> Josh
>>>
>>> Hi Josh,
>>>
>>> No, I still haven't got an answer for how "off" could be non-zero in the rds-ping case and so hit BUG_ON(off % RDS_FRAG_SIZE),
>>> because rds-ping uses zero-byte messages to ping.
>>> If you have a test case that reproduces the kernel panic, I can try it out and see how that can happen.
>>> The Oracle internal code I checked doesn't have that patch applied.
>>>
>>> Venkat
>> No I don't have a test case. I came across this CVE while doing an
>> audit and noticed it was patched in Ubuntu's kernel and other distros,
>> but was not in the upstream kernel yet. Quick googling of lkml showed
>> that there were at least two attempts to get this patch upstream, but
>> both had issues due to not following the proper submission process:
>>
>> https://lkml.org/lkml/2012/10/22/433
>> https://lkml.org/lkml/2012/9/21/505
>>
>> From my searching it appears the initial bug was found by someone at redhat:
>> https://bugzilla.redhat.com/show_bug.cgi?id=822754
>>
>> I've added Li Honggang the reporter of this issue from Redhat to the
>> mail. Hopefully he can share his testcase.
> The test case is very simple:
> Steps to Reproduce:
> 1. yum install -y rds-tools
>
> 2. [root@...a3 ~]# ifconfig ib0 | grep 'inet addr'
> inet addr:172.31.0.3 Bcast:172.31.0.255 Mask:255.255.255.0
>
> 3. [root@...a3 ~]# /usr/bin/rds-ping 172.31.0.3 <<<< kernel panic (You may need to wait for a few seconds before the kernel panic.)
>> and possibly requires certain hardware as Jay writes in the first link above:
>> "...some Infiniband HCAs(QLogic, possibly others) the machine will panic..."
> This bug can be reproduced with Mellanox HCAs (mlx4_ib.ko and mthca.ko), QLogic HCA (ib_qib.ko). I did not test the QLogic HCA running "ib_ipath.ko".
>
> As far as I know, the upstream RDS code is broken. There are *many* RDS bugs.
>
> Best regards.
> Honggang
>> I was referring to this oracle commit:
>> https://oss.oracle.com/git/?p=redpatch.git;a=commit;h=c7b6a0a1d8d636852be130fa15fa8be10d4704e8
>>
>> I have no experience with this code. There were a few comments around
>> the reset and xmit fns about making sure the caller did certain things
>> if not they were racy, but I have no idea if that's coming into play
>> here.
>>
> Hi Honggang,
>
> I ran rds-ping over local interface for 30 minutes. I stopped it after that.
> It didn't hit any panic.
>
> # ip addr show dev ib0
> 6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 1024
> link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:21:28:00:01:cf:63:db brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> inet 10.196.4.125/30 brd 10.196.4.127 scope global ib0
> inet6 fe80::221:2800:1cf:63db/64 scope link
> valid_lft forever preferred_lft forever
> #
>
> # rds-ping 10.196.4.125
> 1: 170 usec
> 2: 171 usec
> ....
> ....
> ....
> 1860: 173 usec
> 1861: 171 usec
> 1862: 177 usec
> 1863: 168 usec
> 1864: 171 usec
> 1865: 175 usec
> ^C#
>
> I tested with Oracle UEK2 which is based on 2.6.39 kernel. Mellanox IB adaptor.
> 19:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
>
> There is something about your setup that must be causing it for you.
> Can I work with you offline if you are available ?
>
> The panic you are hitting is not making sense to me.
>
> Venkat
Hi Venkat,
It seems we are in different time zones. Please contact me by email if
you need me to do anything for this bug. Could you please try the upstream
2.6.39 kernel? I have confirmed that the bug can be reproduced with Mellanox
and QLogic HCAs when running upstream kernel 2.6.39.
[root@...a01 ~]# ifconfig mlx4_ib1
Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
mlx4_ib1 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:172.31.2.1 Bcast:172.31.2.255 Mask:255.255.255.0
inet6 addr: fe80::7ae7:d1ff:ff6b:b01/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:5 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
[root@...a01 ~]# rpm -qf /usr/bin/rds-ping
rds-tools-2.0.6-3.el6.x86_64
[root@...a01 ~]# uname -a
Linux rdma01.rhts.eng.nay.redhat.com 2.6.39 #1 SMP Thu Nov 14 20:25:45 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@...a01 ~]# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 2
Firmware version: 2.8.600
Hardware version: b0
Node GUID: 0x78e7d1ffff6b0b00
System image GUID: 0x78e7d1ffff6b0b03
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 1
LMC: 0
SM lid: 4
Capability mask: 0x02510868
Port GUID: 0x78e7d1ffff6b0b01
Link layer: InfiniBand
Port 2:
State: Down
Physical state: Polling
Rate: 70
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x02510868
Port GUID: 0x78e7d1ffff6b0b02
Link layer: InfiniBand
[root@...a01 ~]# lspci | grep Mellanox
1f:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
[root@...a01 ~]# ssh 172.31.2.2 hostname (make sure the IPoIB interface works)
rdma02.rhts.eng.nay.redhat.com
[root@...a01 ~]# ssh 172.31.2.1 hostname
rdma01.rhts.eng.nay.redhat.com
[root@...a01 ~]# /usr/bin/rds-ping 172.31.2.1 (kernel panic, please see the attachment for console log)
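On the BUG_ON(off % RDS_FRAG_SIZE) check questioned earlier in this thread: the check only stays quiet when the offset is a whole multiple of the fragment size. Below is a minimal user-space sketch of that arithmetic, not the kernel code itself. It assumes RDS_FRAG_SIZE is 4096 (an RDS_FRAG_SHIFT of 12, as I understand net/rds/rds.h to define it). The function names offset_would_trip_bug_on() and model_bug_on_offset() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the fragment size in net/rds/rds.h (4096 bytes). */
#define RDS_FRAG_SHIFT 12
#define RDS_FRAG_SIZE  (1u << RDS_FRAG_SHIFT)

/*
 * Hypothetical helper: returns true when `off` is not a whole multiple of
 * the fragment size, i.e. when BUG_ON(off % RDS_FRAG_SIZE) would fire.
 */
bool offset_would_trip_bug_on(unsigned int off)
{
    return (off % RDS_FRAG_SIZE) != 0;
}

/*
 * Hypothetical user-space stand-in for BUG_ON(off % RDS_FRAG_SIZE):
 * assert() aborts the process where the kernel would panic.
 */
void model_bug_on_offset(unsigned int off)
{
    assert(!offset_would_trip_bug_on(off));
}
```

In this model an offset of 0, which is what a zero-byte rds-ping message would be expected to produce, passes the check. If this BUG_ON is indeed the one being hit, the transmit path must have seen an offset that was not fragment-aligned. The sketch does not show where such an offset would come from.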
View attachment "upstream-kernel-2.6.39-rds-ping-panic.log" of type "text/x-log" (6545 bytes)