Message-ID: <20180210013610.hkeblyfx76r2hgzt@ast-mbp>
Date:   Fri, 9 Feb 2018 17:36:11 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Li Zhijian <zhijianx.li@...el.com>,
        "ast@...nel.org" <ast@...nel.org>,
        "shuah@...nel.org" <shuah@...nel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Li, Philip" <philip.li@...el.com>, netdev@...r.kernel.org
Subject: Re: [Resend] Question: kselftests: bpf/test_maps failed

On Fri, Feb 09, 2018 at 03:01:57PM +0100, Daniel Borkmann wrote:
> On 02/09/2018 06:14 AM, Li Zhijian wrote:
> > Hi
> > 
> > Intel 0-Day noticed that bpf/test_maps gives different results on different platforms.
> > When it fails, the output looks like this:
> 
> Sorry for the late reply and thanks for reporting! More below:
> 
> > ------------------
> >   880 Failed to create hashmap key=16 value=131072 'Cannot allocate memory'
> >   881 Failed to create hashmap key=8 value=32768 'Cannot allocate memory'
> >   882 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> >   883 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> >   884 Failed to create hashmap key=8 value=16384 'Cannot allocate memory'
> >   885 Failed to create hashmap key=16 value=16384 'Cannot allocate memory'
> >   886 Failed to create hashmap key=8 value=65536 'Cannot allocate memory'
> >   887 Failed to create hashmap key=16 value=131072 'Cannot allocate memory'
> >   888 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> >   889 Failed to create hashmap key=16 value=65536 'Cannot allocate memory'
> >   890 Failed to create hashmap key=8 value=65536 'Cannot allocate memory'
> >   891 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> >   892 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> >   893 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> >   894 Failed to create hashmap key=8 value=16384 'Cannot allocate memory'
> >   895 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> >   896 Failed to create hashmap key=16 value=8192 'Cannot allocate memory'
> >   897 Failed to create hashmap key=8 value=32768 'Cannot allocate memory'
> >   898 Failed to create hashmap key=16 value=8192 'Cannot allocate memory'
> >   899 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   900 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   901 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   902 Failed to create hashmap key=16 value=262144 'Cannot allocate memory'
> >   903 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   904 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   905 test_maps: test_maps.c:955: run_parallel: Assertion `status == 0' failed.
> >   906 Aborted
> >   907 not ok 1..3 selftests:  test_maps [FAIL]
> > ------------------
> > 
> > After a quick look at the code, it looks related to the number of CPUs and the amount of system memory.
> > 
> > Below are the results on different platforms:
> > 1. Good
> > model: Sandy Bridge
> > nr_node: 1
> > nr_cpu: 4
> > memory: 6G
> > 
> > 2. Good
> > model: qemu-system-x86_64 -enable-kvm
> > nr_cpu: 2
> > memory: 4G
> > 
> > 3. Bad
> > model: Ivytown Ivy Bridge-EP
> > nr_cpu: 48
> > memory: 64G
> > 
> > 4. Bad
> > model: Skylake
> > nr_cpu: 104
> > memory: 64G
> > 
> > I tried changing the process count from 100 to 10, and with that it passes on the Skylake machine (4) above.
> > ------------
> > lizhijian@...well-OptiPlex-9020:~/lkp/linux/tools/testing/selftests/bpf$ git diff
> > diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> > index 040356e..b788ca1 100644
> > --- a/tools/testing/selftests/bpf/test_maps.c
> > +++ b/tools/testing/selftests/bpf/test_maps.c
> > @@ -960,7 +960,7 @@ static void test_map_stress(void)
> >  {
> >         run_parallel(100, test_hashmap, NULL);
> >         run_parallel(100, test_hashmap_percpu, NULL);
> > -       run_parallel(100, test_hashmap_sizes, NULL);
> > +       run_parallel(10, test_hashmap_sizes, NULL);
> >         run_parallel(100, test_hashmap_walk, NULL);
> >  
> >         run_parallel(100, test_arraymap, NULL);
> 
> Unless Alexei has a better idea, I think that if the bpf_create_map() error in
> the stress test is ENOMEM, we shouldn't fail hard via exit(); for all other
> errors, however, we should. So it probably makes sense to just check for
> errno == ENOMEM when fd < 0 in test_hashmap_sizes() and then continue, to
> keep trying under stress. Feel free to send a patch, Li.

That's probably a good path for now.
I also see test_maps fail with this assert on a freshly booted kernel,
but restarting test_maps then works, and repeated runs succeed too.
I suspect there is a deeper issue here related to memory allocation.
Either the slab or the percpu allocator is behaving funky.
It needs to be debugged further.
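
A minimal sketch of the ENOMEM handling suggested above, for illustration only
and not the actual patch. Assumptions: the map creator is passed in as a
callback (in test_maps.c that role is played by bpf_create_map()), and
try_hashmap_sizes(), struct sizes_result and the two fake creators are
hypothetical names. The key/value ranges just mirror the sizes seen in the
failure log, not the real loop bounds.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical creator callback: returns fd >= 0, or -1 with errno set. */
typedef int (*create_fn)(int key_size, int value_size);

/* Hypothetical bookkeeping so the outcome can be inspected. */
struct sizes_result {
	int created;
	int skipped_enomem;
};

/*
 * Try a range of key/value sizes. ENOMEM is treated as "expected under
 * stress" and skipped; any other error is reported and returned as a
 * hard failure (-1), where the real test would call exit(1).
 */
static int try_hashmap_sizes(create_fn create, struct sizes_result *res)
{
	int key, value, fd;

	res->created = 0;
	res->skipped_enomem = 0;

	for (key = 8; key <= 16; key <<= 1) {
		for (value = 8192; value <= 262144; value <<= 1) {
			fd = create(key, value);
			if (fd >= 0) {
				/* Real code would close(fd) here. */
				res->created++;
				continue;
			}
			if (errno == ENOMEM) {
				/* Out of memory under stress: keep trying. */
				res->skipped_enomem++;
				continue;
			}
			fprintf(stderr,
				"Failed to create hashmap key=%d value=%d '%s'\n",
				key, value, strerror(errno));
			return -1;
		}
	}
	return 0;
}

/* Fake creator: pretends large values cannot be allocated. */
static int fake_create_enomem_large(int key_size, int value_size)
{
	(void)key_size;
	if (value_size >= 65536) {
		errno = ENOMEM;
		return -1;
	}
	return 3; /* arbitrary fake fd */
}

/* Fake creator: always fails with a non-ENOMEM error. */
static int fake_create_einval(int key_size, int value_size)
{
	(void)key_size;
	(void)value_size;
	errno = EINVAL;
	return -1;
}
```

With fake_create_enomem_large the walk finishes and returns 0, having skipped
the large sizes; with fake_create_einval it stops at the first failure and
returns -1.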
