linux-kernel - Re: selftests: net/af_unix test_unix

Message-ID: <39ea7c85-c235-bc49-cd49-a2d7633eda4c@alu.unizg.hr>
Date:   Mon, 14 Aug 2023 10:54:56 +0200
From:   Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To:     Kuniyuki Iwashima <kuniyu@...zon.com>
Cc:     alexander@...alicyn.com, davem@...emloft.net, edumazet@...gle.com,
        fw@...len.de, kuba@...nel.org, linux-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, netdev@...r.kernel.org,
        pabeni@...hat.com, shuah@...nel.org
Subject: Re: selftests: net/af_unix test_unix_oob [FAILED]

On 8/8/23 10:53, Mirsad Todorovac wrote:
> On 8/8/23 01:09, Mirsad Todorovac wrote:
>> On 8/7/23 22:46, Kuniyuki Iwashima wrote:
>>> From: Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
>>> Date: Mon, 7 Aug 2023 21:44:41 +0200
>>>> Hi all,
>>>>
>>>> In the kernel 6.5-rc5 build on Ubuntu 22.04 LTS (jammy jellyfish) on a Ryzen 7950 assembled box,
>>>> vanilla torvalds tree kernel, the test test_unix_oob unexpectedly fails:
>>>>
>>>> # selftests: net/af_unix: test_unix_oob
>>>> # Test 2 failed, sigurg 23 len 63 OOB %
>>>>
>>>> It is this code:
>>>>
>>>>           /* Test 2:
>>>>            * Verify that the first OOB is over written by
>>>>            * the 2nd one and the first OOB is returned as
>>>>            * part of the read, and sigurg is received.
>>>>            */
>>>>           wait_for_data(pfd, POLLIN | POLLPRI);
>>>>           len = 0;
>>>>           while (len < 70)
>>>>                   len = recv(pfd, buf, 1024, MSG_PEEK);
>>>>           len = read_data(pfd, buf, 1024);
>>>>           read_oob(pfd, &oob);
>>>>           if (!signal_recvd || len != 127 || oob != '#') {
>>>>                   fprintf(stderr, "Test 2 failed, sigurg %d len %d OOB %c\n",
>>>>                   signal_recvd, len, oob);
>>>>                   die(1);
>>>>           }
>>>>
>>>> In 6.5-rc4, this test was OK, so it might mean we have a regression?
>>>
>>> Thanks for reporting.
>>>
>>> I confirmed the test doesn't fail on net-next at least, but it's based
>>> on v6.5-rc4.
>>>
>>>    ---8<---
>>>    [root@...alhost ~]# ./test_unix_oob
>>>    [root@...alhost ~]# echo $?
>>>    0
>>>    [root@...alhost ~]# uname -r
>>>    6.5.0-rc4-01192-g66244337512f
>>>    ---8<---
>>>
>>> I'll check 6.5-rc5 later.
>>
>> Hi, Kuniyuki,
>>
>> It seems that there is a new development. I could reproduce the error with the failed test 2
>> as early as 6.0-rc1. However, the gotcha is that the error appears to be sporadically manifested
>> (possibly a race)?
>>
>> I am currently attempting a bisect.
> 
> The bisect showed that the condition already existed in the 5.11 torvalds tree.
> 
> It has to do with the configs chosen (I used the configs from selftests/*/config merged), but it
> is also present in the Ubuntu production build:
> 
> marvin@...iant:~$ cd linux/kernel/linux_torvalds
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> Test 2 failed, sigurg 23 len 63 OOB %
> marvin@...iant:~/linux/kernel/linux_torvalds$ uname -rms
> Linux 6.4.8-060408-generic x86_64
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> Test 1 failed sigurg 0 len 63
> marvin@...iant:~/linux/kernel/linux_torvalds$
> 
> It happens on rare occasions, so it seems to be a hard-to-spot race.
> 
> A normal test run executes test_unix_oob only once, so it never caught this except by accident, which is how the problem came to attention.
> 
> However, the problem seems to be config-driven rather than kernel-version-driven.
> 
> marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..100000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> Test 3.1 Inline failed, len 1 oob % atmark 0
> Test 1 Inline failed, sigurg 0 len 63
> Test 1 Inline failed, sigurg 0 len 63
> Test 1 Inline failed, sigurg 0 len 63
> Test 2 Inline failed, len 63 atmark 1
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 2 Inline failed, len 63 atmark 1
> Test 3.1 Inline failed, len 1 oob % atmark 0
> Test 2 failed, sigurg 23 len 63 OOB %
> marvin@...iant:~/linux/kernel/linux_torvalds$ uname -rms
> Linux 6.5.0-060500rc4-generic x86_64
> marvin@...iant:~/linux/kernel/linux_torvalds$
> 
> At moments, I was able to reproduce with certain configs, but now something odd happens.
> 
> I will keep investigating.

Please note that the bug persists in 6.5-rc6:

marvin@...iant:~/linux/kernel/linux_torvalds$ for a in {0..100000}; do !!; done
for a in {0..100000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 1 Inline failed, sigurg 0 len 63
Test 1 Inline failed, sigurg 0 len 63
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 2 failed, sigurg 23 len 63 OOB %
Test 1 Inline failed, sigurg 0 len 63
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
marvin@...iant:~/linux/kernel/linux_torvalds$
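
For anyone trying to reproduce this, the raw loop output above can be condensed
into per-message counts. The following is only a minimal sketch, assuming bash
and the usual sort/uniq tools; tally_failures is a hypothetical helper name and
is not part of the selftests.

```shell
#!/usr/bin/env bash
# tally_failures (hypothetical helper): run a command N times and print
# each distinct output line with the number of times it appeared,
# most frequent first.
tally_failures() {
    local runs=$1
    shift
    local i
    for ((i = 0; i < runs; i++)); do
        # The test reports failures on stderr, so merge it into stdout.
        "$@" 2>&1
    done | sort | uniq -c | sort -rn
}

# Example invocation, using the path from the runs above:
# tally_failures 100000 tools/testing/selftests/net/af_unix/test_unix_oob
```

A run that never fails prints nothing, so an empty result over a large number
of iterations would suggest the failure is not reproducible on that setup.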

The bug can be triggered by a non-privileged user, but it is not clear whether it is exploitable to elevate privileges.

Best regards,
Mirsad Todorovac