Message-ID: <b76b1bf0-c197-1a12-fc88-3d6f22079728@lab.ntt.co.jp>
Date:   Mon, 28 May 2018 13:16:24 +0900
From:   Prashant Bhole <bhole_prashant_q7@....ntt.co.jp>
To:     John Fastabend <john.fastabend@...il.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Shuah Khan <shuah@...nel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 0/5] fix test_sockmap

On 5/25/2018 11:01 PM, John Fastabend wrote:
> On 05/25/2018 01:28 AM, Prashant Bhole wrote:
>>
>>
>> On 5/24/2018 1:58 PM, John Fastabend wrote:
>>> On 05/23/2018 09:47 PM, Prashant Bhole wrote:
>>>>
>>>>
>>>> On 5/23/2018 6:44 PM, Prashant Bhole wrote:
>>>>>
>>>>>
>>>>> On 5/22/2018 2:08 AM, John Fastabend wrote:
>>>>>> On 05/20/2018 10:13 PM, Prashant Bhole wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/19/2018 1:42 AM, John Fastabend wrote:
>>>>>>>> On 05/18/2018 12:17 AM, Prashant Bhole wrote:
>>>>>>>>> This series fixes bugs in the test_sockmap code. They weren't caught
>>>>>>>>> previously because failures in the RX/TX threads were not reported to
>>>>>>>>> the main thread.
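To illustrate the class of bug described here, the following is a minimal sketch of propagating a worker's failure to the main process through its exit status. It assumes the worker runs in a forked child. `run_worker`, `worker_fn`, `worker_ok` and `worker_fail` are hypothetical names, and this is not the code from the series.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: returns 0 on success, non-zero on failure
 * (for example an RX timeout or a data-verification mismatch). */
typedef int (*worker_fn)(void *arg);

/* Example workers used to exercise run_worker() (hypothetical). */
static int worker_ok(void *arg)   { (void)arg; return 0; }
static int worker_fail(void *arg) { (void)arg; return 1; }

/* Run a worker in a child process and report its result to the caller.
 * Returns 0 only if the child exited normally with status 0, else -1. */
static int run_worker(worker_fn fn, void *arg)
{
	assert(fn != NULL);

	pid_t pid = fork();
	if (pid < 0) {
		perror("fork");
		return -1;
	}
	if (pid == 0)
		_exit(fn(arg) ? 1 : 0);	/* child: encode result in exit code */

	int status;
	if (waitpid(pid, &status, 0) != pid) {
		perror("waitpid");
		return -1;
	}
	/* A signal or a non-zero exit code both count as failure. */
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return 0;
}
```

The key point is that the parent checks the child's status instead of ignoring it. Otherwise a failing RX or TX side still looks like a passing test.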
>>>>>>>>>
>>>>>>>>> Also fixed the data verification logic and slightly improved the test
>>>>>>>>> output so that the parameter values (cork, apply, start, end) of a
>>>>>>>>> failed test can be easily seen.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Great, this was on my list so thanks for taking care of it.
>>>>>>>>
>>>>>>>>> Note: Even after fixing the above problems, there are issues with tests
>>>>>>>>> that set the cork parameter. Tests fail (RX thread timeout) when the
>>>>>>>>> cork value is non-zero and the total data sent by the TX thread isn't a
>>>>>>>>> multiple of the cork value.
>>>>>>>>
>>>>>>>>
>>>>>>>> This is expected. When 'cork' is set, the sender should only xmit
>>>>>>>> the data when 'cork' bytes are available. If the user doesn't
>>>>>>>> provide the N bytes, the data is cork'ed waiting for the bytes, and
>>>>>>>> if the socket is closed the state is cleaned up. What these tests
>>>>>>>> are testing is the cleanup path when a user doesn't provide the
>>>>>>>> N bytes. In practice this is used to validate headers and prevent
>>>>>>>> users from sending partial headers. We want to keep these tests
>>>>>>>> because they verify a tear-down path in the code.
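The cork behaviour John describes can be modelled roughly as follows. This is a simplified, hypothetical sketch and not the kernel implementation. It assumes that once at least `cork` bytes are queued, everything queued so far is released. Data that never reaches the threshold stays corked until close discards it, which is why the receiver in such a test waits until it times out. `struct cork_state`, `cork_push` and `cork_close` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of sender-side corking (not kernel code). */
struct cork_state {
	size_t cork;     /* bytes required before data is released */
	size_t pending;  /* bytes currently held back (corked) */
	size_t sent;     /* bytes released to the receiver so far */
};

/* Queue 'len' bytes; release everything once the threshold is reached
 * (an assumption of this model). */
static void cork_push(struct cork_state *s, size_t len)
{
	assert(s->cork > 0);
	s->pending += len;
	if (s->pending >= s->cork) {
		s->sent += s->pending;
		s->pending = 0;
	}
}

/* Socket close: corked data is never sent, only cleaned up.
 * Returns the number of bytes that were discarded. */
static size_t cork_close(struct cork_state *s)
{
	size_t dropped = s->pending;

	s->pending = 0;
	return dropped;
}
```

For example, sixteen 30-byte sends with a cork of 1024 leave 480 bytes pending and nothing sent. That is the partial-data case exercised by the tear-down tests.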
>>>>>>>
>>>>>>> Ok.
>>>>>>>
>>>>>>>>
>>>>>>>> After your changes, do these get reported as failures? If so, we
>>>>>>>> need to account for the above in the calculations.
>>>>>>>
>>>>>>> Yes, cork-related tests are reported as failures because of the RX
>>>>>>> thread timeout.
>>>>>>>
>>>>>>> So, given your description above, I think we need to differentiate
>>>>>>> cork tests with partial data from those with full data. For the
>>>>>>> partial-data tests we could have something like a "timeout_expected"
>>>>>>> flag. Any other way to fix it?
>>>>>>>
>>>>>>
>>>>>> Adding a flag seems reasonable to me. Let's do this for now. I also
>>>>>> plan to add more negative tests, so we can use either the same
>>>>>> flag or a new one for those cases as well.
>>>>>>
>>>>>
>>>>> John,
>>>>> I worked on this for some time and noticed that the RX timeout of
>>>>> tests with the cork parameter depends on various parameters. So we
>>>>> cannot set a flag the way the 'drop_expected' flag is set before
>>>>> executing the test.
>>>>>
>>>>> So I decided to write a function that judges all parameters before
>>>>> each test and decides whether a test with the cork parameter will
>>>>> time out or not. Then the conditions in the function became
>>>>> complicated. For example, some tests fail if opt->rate < 17 (with
>>>>> some other conditions). Here, 17 is related to FRAGS_PER_SKB.
>>>>> Consider the following two examples.
>>>> I'm sorry. Correction: s/FRAGS_PER_SKB/MAX_SKB_FRAGS/
>>>>
>>>>>
>>>>> ./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage \
>>>>>     --txmsg --txmsg_cork 1024   # RX timeout occurs
>>>>>
>>>>> ./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage \
>>>>>     --txmsg --txmsg_cork 1024   # Success!
>>>>>
>>>
>>> Ah yes this hits the buffer limit and flushes the queue. The kernel
>>> side doesn't know how to merge those specific sendpage requests so
>>> it gives each request its own buffer and when the limit is reached
>>> we flush it.
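The two invocations above can be reproduced with a toy model of the flush rule John describes. This is a hypothetical sketch under several assumptions. Each unmergeable sendpage call is assumed to take one buffer entry. The limit is assumed to be 17 entries, the value of MAX_SKB_FRAGS with 4 KiB pages in kernels of that era. `-r` is assumed to map to the number of sendpage calls. Corked data is released when either `cork` bytes or the entry limit is reached. `corked_data_flushed` and `ASSUMED_MAX_ENTRIES` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed entry limit (MAX_SKB_FRAGS with 4 KiB pages). */
#define ASSUMED_MAX_ENTRIES 17

/* Toy model, not kernel code: 'sends' sendpage calls of 'len' bytes
 * each, none of which can be merged. Returns true if the corked data
 * would be flushed, false if it stays corked (RX side times out). */
static bool corked_data_flushed(size_t sends, size_t len, size_t cork)
{
	size_t bytes = 0, entries = 0;

	assert(cork > 0);
	for (size_t i = 0; i < sends; i++) {
		bytes += len;
		entries++;	/* each request gets its own buffer entry */
		if (bytes >= cork || entries >= ASSUMED_MAX_ENTRIES)
			return true;
	}
	return false;
}
```

Under these assumptions, 16 sends of 30 bytes (480 bytes, 16 entries) never reach either threshold, while the 17th send hits the entry limit and forces a flush. This matches the timeout and success cases shown above.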
>>>
>>>>> Do we need to keep such tests? If yes, I will continue adding
>>>>> such conditions to the function.
>>>>>
>>>
>>> Yes, these tests are needed because they are testing the edge cases.
>>> These are probably the most important tests: my normal usage will
>>> catch any issues in the "good" cases, but it's these types of things
>>> that can go unnoticed (at least for a short while) if we don't have
>>> specific tests for them.
>>
>> I tried, but it is difficult to come up with the right set of conditions
>> that lead to test failure.
>>
> 
> Agreed, yes, it can be. How about adding your logic for all tests except
> the "cork" cases? If there is a flag to set when a timeout is expected, we
> can always set it manually in the test invocation. It might not be as
> nice as automatically learning the expected results, but it's possibly
> easier than building some complicated logic to figure it out.
> 
> Would you mind submitting your series again without the "cork" tests
> being tracked? And if you want, add a bit that tells whether the "cork"
> tests are going to time out, set manually per test. But I think
> your series can just omit the cork tests for now and still be useful.

Ok. I will submit the series again, without any change to the actual
patches but with the cover letter reorganized. Thanks.

-Prashant