netdev - Re: selftest/net: so_txtime.sh fails intermittently

Message-ID: <CA+FuTSdYOnJCsGuj43xwV1jxvYsaoa_LzHQF9qMyhrkLrivxKw@mail.gmail.com>
Date:   Wed, 20 Nov 2019 19:34:08 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Naresh Kamboju <naresh.kamboju@...aro.org>
Cc:     "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>, Netdev <netdev@...r.kernel.org>,
        Shuah Khan <shuah@...nel.org>,
        Anders Roxell <anders.roxell@...aro.org>,
        lkft-triage@...ts.linaro.org,
        "David S. Miller" <davem@...emloft.net>,
        jesus.sanchez-palencia@...el.com
Subject: Re: selftest/net: so_txtime.sh fails intermittently - read Resource
 temporarily unavailable

On Wed, Nov 20, 2019 at 1:33 AM Naresh Kamboju
<naresh.kamboju@...aro.org> wrote:
>
> On Fri, 15 Nov 2019 at 21:52, Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > On Thu, Nov 14, 2019 at 3:47 AM Naresh Kamboju
>
> > This appears to have been flaky from the start, particularly on qemu_arm.
>
> This is because we emulate only 2 CPUs.
> I am going to change qemu_arm to emulate 4 CPUs.
>
> >
> > Looking at a few runs...
> >
> > failing runs that exceed bounds:
> > https://lkft.validation.linaro.org/scheduler/job/1006586
> ...
> > delay29722: expected20000_(us) #
> > # ./so_txtime exceeds variance (2000 us)
> > "
> > These are easy to suppress by increasing cfg_variance_us and,
> > optionally, the delivery time scale.
>
> Alright!
> The variance is 2000.
> static int cfg_variance_us = 2000;
>
> > Naresh, when you mention "multiple boards" are there specific
> > microarchitectural details of the hosts that I should take into
> > account aside from the qemu-arm virtualized environment itself?
>
> The easiest way to reproduce is to run a 32-bit kernel and rootfs on
> an x86_64 machine.

Thanks. As soon as I disabled KVM acceleration, it also proved easy to
reproduce on an x86_64 guest inside an x86_64 host.

> # ./so_txtime read Resource temporarily unavailable
> read: Resource_temporarily #

This occurs because sch_etf drops the packet on dequeue in
etf_dequeue_timesortedlist when the dequeue time is after the
scheduled delivery time.
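A simplified user-space model of that check may help. This is a sketch, not kernel code: the names etf_action and etf_model_dequeue are hypothetical, only the head-of-queue packet is considered, and ETF's deadline mode is ignored. The two comparisons follow etf_dequeue_timesortedlist: a packet whose txtime is already in the past is dropped; otherwise it is released only once the current time has passed txtime minus q->delta.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: a simplified model of the decision that
 * etf_dequeue_timesortedlist() makes for the head-of-queue packet.
 * All times are in nanoseconds on the qdisc's clock. */
enum etf_action {
	ETF_WAIT,    /* too early: re-arm the watchdog, return nothing */
	ETF_DEQUEUE, /* now is within (txtime - delta, txtime]: send it */
	ETF_DROP,    /* now is past txtime: the packet has expired */
};

static enum etf_action etf_model_dequeue(int64_t now, int64_t txtime,
					 int64_t delta)
{
	assert(delta >= 0);

	/* Drop if the packet expired while waiting in the queue. */
	if (txtime < now)
		return ETF_DROP;

	/* Release only once now has passed txtime - delta. */
	if (now > txtime - delta)
		return ETF_DEQUEUE;

	return ETF_WAIT;
}
```

With delta = 200000 ns and txtime = 1000000 ns, a dequeue at 850000 ns sends the packet, while a dequeue at 1000001 ns drops it.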

There is some inevitable delay and jitter in scheduling the dequeue
timer. The q->delta argument to ETF enables scheduling ahead of the
deadline. Unfortunately, in this virtualized environment even the
current setting of 200 us in so_txtime.sh proves too short. It
already seemed high to me at the time.
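Why delta matters can be sketched with a toy model, assuming the watchdog is armed for txtime - delta and fires after some timer latency. Under that assumption the dequeue lands after txtime, and the packet is dropped, exactly when the latency exceeds delta. The function names are hypothetical and the model ignores everything else the qdisc does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (an assumption, not kernel code): the ETF watchdog is armed
 * for txtime - delta and actually fires `latency` ns late. The packet
 * misses its deadline if that dequeue happens after txtime. */
static int misses_deadline(int64_t txtime, int64_t delta, int64_t latency)
{
	int64_t fire = txtime - delta + latency;

	return fire > txtime;
}

/* Count how many of n timer latencies (ns) would cause a drop for a
 * given delta (ns). */
static size_t count_drops(int64_t delta, const int64_t *latencies, size_t n)
{
	/* Arbitrary: the outcome depends only on latency - delta. */
	const int64_t txtime = 1000000000;
	size_t drops = 0;

	assert(delta >= 0);
	for (size_t i = 0; i < n; i++)
		if (misses_deadline(txtime, delta, latencies[i]))
			drops++;
	return drops;
}
```

For example, with made-up timer latencies of 50, 150, 250 and 350 us (passed in ns), count_drops() reports 2 drops at delta = 200 us and none at 400 us. These are illustrative values, not measurements from the runs above.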

Doubling delta to 400 us and cfg_variance_us to 4000 greatly
reduces, if not fully eliminates, the failure rate for me.
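For reference, the bound being relaxed is an absolute-difference test. The sketch below models it; the function name is hypothetical, and it assumes so_txtime compares the observed delay against the expected delay in microseconds, as the "exceeds variance (2000 us)" log line suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper modeled on the so_txtime bound check: a sample
 * fails if its observed delay deviates from the expected delay by more
 * than variance_us (the role played by cfg_variance_us). */
static bool exceeds_variance(long long delay_us, long long expected_us,
			     int variance_us)
{
	assert(variance_us >= 0);
	return llabs(delay_us - expected_us) > variance_us;
}
```

A 3000 us deviation fails at 2000 but passes at 4000. The failing sample quoted earlier (delay 29722 us against 20000 us expected, 9722 us off) would still exceed a 4000 us bound.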

This type of drop is also reported through the socket error queue.
Rather than widening the time bounds until they become meaningless, we
can retry on these known failures, as long as the failure rate is
already low.
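A retry along those lines could look roughly like the sketch below, assuming Linux UAPI headers. is_txtime_missed() checks a struct sock_extended_err read from the error queue for ee_origin SO_EE_ORIGIN_TXTIME, ee_code SO_EE_CODE_TXTIME_MISSED and ee_errno ECANCELED, which is how sch_etf reports this drop. The attempt_result enum, run_with_retries() and the scripted stand-in for a real send/receive round are hypothetical and not part of so_txtime.c. Only the known drop is retried; any other failure still fails immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>            /* struct timespec, needed by errqueue.h */
#include <linux/errqueue.h>  /* struct sock_extended_err */

/* Fallbacks for older UAPI headers; values match linux/errqueue.h. */
#ifndef SO_EE_ORIGIN_TXTIME
#define SO_EE_ORIGIN_TXTIME 6
#endif
#ifndef SO_EE_CODE_TXTIME_MISSED
#define SO_EE_CODE_TXTIME_MISSED 2
#endif

/* True if an error-queue entry says ETF dropped the packet because it
 * missed its txtime. */
static bool is_txtime_missed(const struct sock_extended_err *ee)
{
	return ee->ee_origin == SO_EE_ORIGIN_TXTIME &&
	       ee->ee_errno == ECANCELED &&
	       ee->ee_code == SO_EE_CODE_TXTIME_MISSED;
}

/* Hypothetical outcome of one test round. */
enum attempt_result { ATTEMPT_OK, ATTEMPT_TXTIME_MISSED, ATTEMPT_FAILED };

/* Retry only the known TXTIME_MISSED drop; any other failure is final.
 * Returns true if some attempt succeeded within max_attempts. */
static bool run_with_retries(enum attempt_result (*attempt)(void *),
			     void *ctx, int max_attempts)
{
	assert(max_attempts > 0);
	for (int i = 0; i < max_attempts; i++) {
		enum attempt_result r = attempt(ctx);

		if (r == ATTEMPT_OK)
			return true;
		if (r != ATTEMPT_TXTIME_MISSED)
			return false;
	}
	return false;
}

/* Scripted stand-in for one send/receive round (illustration only):
 * returns the next pre-recorded result on each call. */
struct script {
	const enum attempt_result *results;
	int len;
	int next;
};

static enum attempt_result scripted_attempt(void *ctx)
{
	struct script *s = ctx;

	assert(s->next < s->len);
	return s->results[s->next++];
}
```

In a real test, each attempt would send the timed packets, read MSG_ERRQUEUE on timeout, and map is_txtime_missed() to ATTEMPT_TXTIME_MISSED. Keeping max_attempts small preserves the point of the time bounds.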