Message-ID: <CAEf4Bza2eD4de6m2e_vmbB9pDsCYr+jsWfMe+u2wWrfRaxXZdw@mail.gmail.com>
Date: Thu, 14 May 2020 16:06:23 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Jakub Kicinski <kuba@...nel.org>,
Andrii Nakryiko <andriin@...com>, linux-arch@...r.kernel.org,
bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>,
"Paul E . McKenney" <paulmck@...nel.org>,
Jonathan Lemon <jonathan.lemon@...il.com>
Subject: Re: [PATCH bpf-next 1/6] bpf: implement BPF ring buffer and verifier
support for it
On Thu, May 14, 2020 at 3:56 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Thu, May 14, 2020 at 02:30:11PM -0700, Andrii Nakryiko wrote:
> > On Thu, May 14, 2020 at 1:39 PM Thomas Gleixner <tglx@...utronix.de> wrote:
> > >
> > > Jakub Kicinski <kuba@...nel.org> writes:
> > >
> > > > On Wed, 13 May 2020 12:25:27 -0700 Andrii Nakryiko wrote:
> > > >> One interesting implementation bit, which significantly simplifies (and thus
> > > >> speeds up) the implementation of both producers and consumers, is how the data
> > > >> area is mapped twice, contiguously and back-to-back, in virtual memory. This
> > > >> makes it unnecessary to take any special measures for samples that have to wrap
> > > >> around at the end of the circular buffer data area, because the next page after
> > > >> the last data page is the first data page again, and thus the sample still
> > > >> appears completely contiguous in virtual memory. See the comment and a simple
> > > >> ASCII diagram showing this visually in bpf_ringbuf_area_alloc().
> > > >
> > > > Out of curiosity - is this 100% okay to do in the kernel and user space
> > > > these days? Is this bit part of the uAPI in case we need to back out of
> > > > it?
> > > >
> > > > In the olden days virtually mapped/tagged caches could get confused
> > > > seeing the same physical memory have two active virtual mappings, or
> > > > at least that's what I've been told in school :)
> > >
> > > Yes, caching the same thing twice causes coherency problems.
> > >
> > > VIVT can be found in ARMv5, MIPS, NDS32 and Unicore32.
> > >
> > > > Checking with Paul - he says that could have been the case for Itanium
> > > > and PA-RISC CPUs.
> > >
> > > Itanium: PIPT L1/L2.
> > > PA-RISC: VIPT L1 and PIPT L2
> > >
> > > Thanks,
> > >
> >
> > Jakub, thanks for bringing this up.
> >
> > Thomas, Paul, what kind of problems are we talking about here? What
> > are the possible problems in practice?
>
> VIVT cpus will have issues with the coherency protocol between cpus.
> I don't think that applies to this case.
> Here, on all cpus, the same phys page is seen in two virtual pages.
> That mapping is the same across all cpus.
> But any given range of virtual addresses in these two pages will
> be accessed by only one cpu at a time.
> At least that's my understanding of Andrii's algorithm.
> We probably need to whiteboard the overlapping case a bit more.
> Worst case, I think it's fine to disallow this new ring buffer
> on such architectures. The usability from the bpf program side
> is too great to give up.
From what Paul described, I think this will work in any case. Each
byte of a reserved/committed record is going to be both written and
consumed using exactly the same virtual mapping, and only that one.
E.g., consider a sample starting at the end of the ringbuf and ending
at the beginning: the header and first part will be written and read
using the first set of mapped pages, while the second part will be
written and read using the second set of pages (never the first set).
So it seems like everything should be fine even on VIVT architectures?
More visually, copying the diagram from the code:
------------------------------------------------------
| meta pages |     mapping 1     |     mapping 2     |
------------------------------------------------------
|            | 1 2 3 4 5 6 7 8 9 | 1 2 3 4 5 6 7 8 9 |
------------------------------------------------------
|            | TA             DA | TA             DA |
------------------------------------------------------
                              ^^^^^^^
DA is always written/read using "mapping 1", while TA is always
written/read through "mapping 2". DA is never accessed through
"mapping 2", nor is TA accessed through "mapping 1".