Message-ID: <3684be7d-aae9-9478-6826-a8c92f01a2cd@netronome.com>
Date: Thu, 15 Aug 2019 15:02:02 +0100
From: Quentin Monnet <quentin.monnet@...ronome.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Edward Cree <ecree@...arflare.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
bpf <bpf@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
oss-drivers@...ronome.com
Subject: Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map
entries
2019-08-14 13:18 UTC-0700 ~ Andrii Nakryiko <andrii.nakryiko@...il.com>
> On Wed, Aug 14, 2019 at 10:12 AM Quentin Monnet
> <quentin.monnet@...ronome.com> wrote:
>>
>> 2019-08-14 09:58 UTC-0700 ~ Alexei Starovoitov
>> <alexei.starovoitov@...il.com>
>>> On Wed, Aug 14, 2019 at 9:45 AM Edward Cree <ecree@...arflare.com> wrote:
>>>>
>>>> On 14/08/2019 10:42, Quentin Monnet wrote:
>>>>> 2019-08-13 18:51 UTC-0700 ~ Alexei Starovoitov
>>>>> <alexei.starovoitov@...il.com>
>>>>>> The same can be achieved by 'bpftool map dump|grep key|wc -l', no?
>>>>> To some extent (with subtleties for some other map types), and we use a
>>>>> similar command line as a workaround for now. But because of the rate of
>>>>> inserts/deletes in the map, the process often reports a number higher
>>>>> than the max number of entries (we observed up to ~750k when max_entries
>>>>> is 500k), even if the map is only half-full on average during the count.
>>>>> In the worst case (though not frequent), an entry is deleted just before
>>>>> we get the next key from it, and iteration starts all over again. This
>>>>> is not reliable for determining how much space is left in the map.
>>>>>
>>>>> I cannot see a solution that would provide a more accurate count from
>>>>> user space when the map is under pressure.
>>>> This might be a really dumb suggestion, but: you're wanting to collect a
>>>> summary statistic over an in-kernel data structure in a single syscall,
>>>> because making a series of syscalls to examine every entry is slow and
>>>> racy. Isn't that exactly a job for an in-kernel virtual machine, and
>>>> could you not supply an eBPF program which the kernel runs on each entry
>>>> in the map, thus supporting people who want to calculate something else
>>>> (mean, min and max, whatever) instead of count?
>>>
>>> Pretty much my suggestion as well :)
>
> I also support the suggestion to count it from the BPF side. It's a
> flexible and powerful approach, and it doesn't require adding more and
> more nuanced sub-APIs to the kernel to support a subset of bulk
> operations on maps (a subset, because we'll expose count, but what
> about, e.g., p50? There will always be something more that someone
> will want, and it just doesn't scale).
Hi Andrii,
Yes, that makes sense.
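
For reference, here is roughly what I understand counting from the BPF
side would look like (an untested sketch; the map, variable, and
function names below are made up for the example, with max_entries
taken from our 500k case): the program maintains its own element count
next to each insert/delete on the main map, so user space can read the
count with a single lookup instead of iterating.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 500000);
	__type(key, __u32);
	__type(value, __u64);
} flows SEC(".maps");

/* Global variable: libbpf places it in a single-entry internal map
 * that bpftool can dump, as discussed below. */
__u64 flow_count = 0;

static __always_inline int add_flow(__u32 *key, __u64 *value)
{
	int err = bpf_map_update_elem(&flows, key, value, BPF_NOEXIST);

	/* Only count successful inserts of new entries. */
	if (!err)
		__sync_fetch_and_add(&flow_count, 1);
	return err;
}

static __always_inline int del_flow(__u32 *key)
{
	int err = bpf_map_delete_elem(&flows, key);

	if (!err)
		__sync_fetch_and_add(&flow_count, -1);
	return err;
}
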
>
>>>
>>> It seems the better fix for your NAT threshold is to keep a count of
>>> the elements in the map in a separate global variable that the
>>> bpf program manually increments and decrements.
>>> bpftool will dump it just like a regular single-element map
>>> (I believe it doesn't recognize global variables properly yet),
>>> and BTF will be there to pick out exactly that 'count' variable.
>>>
>>
>> With an offloaded map it would have to be a separate map rather than
>> a global variable; but yes, I suppose we could keep track of the
>> numbers that way. We'll have a look into this.
>
> See if you can use a global variable; that way you completely
> eliminate any overhead on the BPF side of things, except for the
> atomic increment.
Offloaded maps do not implement the map_direct_value_addr() operation,
so global variables are not supported at the moment. I need to dive
deeper into this and see what is required to add that support.
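
In the meantime, we can probably get the same effect with an explicit
single-entry array map instead of a global variable, so that nothing
relies on map_direct_value_addr(). Something like the following
(again an untested sketch with made-up names, and assuming atomic
adds are usable on the offload target):

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} flow_count_map SEC(".maps");

/* Bump the counter by delta (+1 on insert, -1 on delete). */
static __always_inline void count_add(__s64 delta)
{
	__u32 zero = 0;
	__u64 *count = bpf_map_lookup_elem(&flow_count_map, &zero);

	if (count)
		__sync_fetch_and_add(count, delta);
}
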
Thanks for your advice!
Quentin