Message-ID: <CAKgT0Ucz4R=xOCWgauDO_i6PX7=hgiohXngo2Mea5R8GC_s2qQ@mail.gmail.com>
Date: Mon, 19 Aug 2024 08:52:58 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, 
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org
Subject: Re: [PATCH net-next v13 11/14] mm: page_frag: introduce
 prepare/probe/commit API

On Fri, Aug 16, 2024 at 5:01 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> On 2024/8/15 23:25, Alexander Duyck wrote:
> > On Wed, Aug 14, 2024 at 8:05 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> >>
> >> On 2024/8/15 5:00, Alexander H Duyck wrote:
> >
> > ...
> >
> >>>> +static inline void page_frag_alloc_abort(struct page_frag_cache *nc,
> >>>> +                                     unsigned int fragsz)
> >>>> +{
> >>>> +    nc->pagecnt_bias++;
> >>>> +    nc->remaining += fragsz;
> >>>> +}
> >>>> +
> >>>
> >>> This doesn't add up. Why would you need abort if you have commit? Isn't
> >>> this more of a revert? I wouldn't think that would be valid as it is
> >>> possible you took some sort of action that might have resulted in this
> >>> memory already being shared. We shouldn't allow rewinding the offset
> >>> pointer without knowing that there are no other entities sharing the
> >>> page.
> >>
> >> This is used for __tun_build_skb() in drivers/net/tun.c as below, mainly
> >> used to avoid a performance penalty for the XDP drop case:
> >
> > Yeah, I reviewed that patch. As I said there, rewinding the offset
> > should be avoided unless you can verify you are the only owner of the
> > page, as you have no guarantee that somebody else didn't take a
> > reference to the page/data to send it off somewhere else. Once you
> > expose the page to any other entity it should be written off, or
> > committed in your case, and you should move on to the next block.
>
> Yes, the expectation is that nobody else takes a reference to the
> page/data to send it off somewhere else between page_frag_alloc_va()
> and page_frag_alloc_abort(). Did you see that expectation being broken
> in that patch? If yes, we should fix that by using the
> page_frag_free_va() related API instead of page_frag_alloc_abort().

The problem is that when you expose it to XDP there are a number of
different paths it can take. As such, you shouldn't expect XDP not to
do something like that. Basically, you have to check the reference
count before you can rewind the page.
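
A minimal sketch of that kind of check (hypothetical helper, not part of
the posted series; it assumes the usual page_frag convention that sole
ownership means page_ref_count() equals pagecnt_bias, and reuses the
page_frag_free_va() name from the series):

static inline void page_frag_alloc_abort_safe(struct page_frag_cache *nc,
					      void *va, unsigned int fragsz)
{
	struct page *page = virt_to_page(va);

	if (page_ref_count(page) == nc->pagecnt_bias) {
		/* sole owner: safe to rewind the offset and reclaim the space */
		nc->pagecnt_bias++;
		nc->remaining += fragsz;
	} else {
		/* page may be shared: just drop our reference instead */
		page_frag_free_va(va);
	}
}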

> >
> >
> >>
> >>>> +static struct page *__page_frag_cache_reload(struct page_frag_cache *nc,
> >>>> +                                         gfp_t gfp_mask)
> >>>>  {
> >>>> +    struct page *page;
> >>>> +
> >>>>      if (likely(nc->encoded_va)) {
> >>>> -            if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias))
> >>>> +            page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias);
> >>>> +            if (page)
> >>>>                      goto out;
> >>>>      }
> >>>>
> >>>> -    if (unlikely(!__page_frag_cache_refill(nc, gfp_mask)))
> >>>> -            return false;
> >>>> +    page = __page_frag_cache_refill(nc, gfp_mask);
> >>>> +    if (unlikely(!page))
> >>>> +            return NULL;
> >>>>
> >>>>  out:
> >>>>      /* reset page count bias and remaining to start of new frag */
> >>>>      nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> >>>>      nc->remaining = page_frag_cache_page_size(nc->encoded_va);
> >>>> -    return true;
> >>>> +    return page;
> >>>> +}
> >>>> +
> >>>
> >>> None of the functions above need to be returning page.
> >>
> >> Are you still suggesting to always use virt_to_page() even when it is
> >> not really necessary? Why not return the page here to avoid the
> >> virt_to_page() call?
> >
> > Yes. The likelihood of you needing to pass this out as a page should
> > be low, as most cases will just be using the virtual address anyway.
> > You are essentially trading off branching for not having to use
> > virt_to_page(). It is an unnecessary optimization.
>
> To my understanding, I am not trading off branching for not having to
> use virt_to_page(); the branching is already needed whether or not we
> utilize it to avoid calling virt_to_page(). Please be more specific about
> which branching is traded off for not having to use virt_to_page() here.

The virt_to_page overhead isn't that high. It would be better to just
use a consistent path rather than try to optimize for an unlikely
branch in your datapath.
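
As a rough illustration of the consistent path being suggested (the
page_frag_alloc_va() name is taken from this series; the caller-side
code around it is assumed):

	void *va = page_frag_alloc_va(nc, fragsz, GFP_ATOMIC);

	if (likely(va)) {
		/* derive the page only where it is actually needed */
		struct page *page = virt_to_page(va);

		/* ... hand the page (and offset within it) off to the
		 * skb frag / bio_vec ...
		 */
	}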

> >
> >
> >>
> >>>> +struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
> >>>> +                            unsigned int *offset, unsigned int fragsz,
> >>>> +                            gfp_t gfp)
> >>>> +{
> >>>> +    unsigned int remaining = nc->remaining;
> >>>> +    struct page *page;
> >>>> +
> >>>> +    VM_BUG_ON(!fragsz);
> >>>> +    if (likely(remaining >= fragsz)) {
> >>>> +            unsigned long encoded_va = nc->encoded_va;
> >>>> +
> >>>> +            *offset = page_frag_cache_page_size(encoded_va) -
> >>>> +                            remaining;
> >>>> +
> >>>> +            return virt_to_page((void *)encoded_va);
> >>>> +    }
> >>>> +
> >>>> +    if (unlikely(fragsz > PAGE_SIZE))
> >>>> +            return NULL;
> >>>> +
> >>>> +    page = __page_frag_cache_reload(nc, gfp);
> >>>> +    if (unlikely(!page))
> >>>> +            return NULL;
> >>>> +
> >>>> +    *offset = 0;
> >>>> +    nc->remaining = remaining - fragsz;
> >>>> +    nc->pagecnt_bias--;
> >>>> +
> >>>> +    return page;
> >>>>  }
> >>>> +EXPORT_SYMBOL(page_frag_alloc_pg);
> >>>
> >>> Again, this isn't returning a page. It is essentially returning a
> >>> bio_vec without calling it as such. You might as well pass the bio_vec
> >>> pointer as an argument and just have it populate it directly.
> >>
> >> I really don't think your bio_vec suggestion makes much sense for now, for
> >> the reason mentioned below:
> >>
> >> "Through a quick look, there seems to be at least three structs which have
> >> similar values: struct bio_vec & struct skb_frag & struct page_frag.
> >>
> >> As per your above argument about using bio_vec, it seems it is OK to use
> >> any one of them, as each one of them seems to have almost all the values
> >> we are using?
> >>
> >> Personally, my preference over them: 'struct page_frag' > 'struct skb_frag' >
> >> 'struct bio_vec', as the naming of 'struct page_frag' seems to best match
> >> the page_frag API, 'struct skb_frag' is the second preference because we
> >> mostly need to fill an skb frag anyway, and 'struct bio_vec' is the last
> >> preference because it just happens to have almost all the values needed.
> >
> > That is why I said I would be okay with us passing page_frag in patch
> > 12 after looking closer at the code. The fact is it should make the
> > review of that patch set much easier if you essentially just pass the
> > page_frag back out of the call. Then it could be used in exactly the
> > same way it was before and should reduce the total number of lines of
> > code that need to be changed.
>
> So your suggestion changed to something like below?
>
> int page_frag_alloc_pfrag(struct page_frag_cache *nc, struct page_frag *pfrag);
>
> The API naming of 'page_frag_alloc_pfrag' seems a little odd to me; any better
> one in your mind?

Well, at this point we are populating/getting/pulling a page frag from
the page frag cache. Maybe look for a word other than alloc, such as
populate. Essentially what you are doing is populating the pfrag from
the frag cache, although I thought there was a size value you passed
for that, isn't there?
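
To make the shape of that concrete, one hypothetical form of the API
along the lines being discussed (the name and exact parameters are
assumptions, nothing settled in this thread) might be:

	int page_frag_populate_pfrag(struct page_frag_cache *nc,
				     struct page_frag *pfrag,
				     unsigned int fragsz, gfp_t gfp);

On success the cache would fill the caller's page_frag directly, so the
caller just consumes pfrag->page, pfrag->offset and pfrag->size instead
of reassembling them from a returned page plus an offset out-parameter.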
