Message-ID: <53B3F9BF.603@canonical.com>
Date: Wed, 02 Jul 2014 14:23:27 +0200
From: Stefan Bader <stefan.bader@...onical.com>
To: Zoltan Kiss <zoltan.kiss@...rix.com>,
Wei Liu <wei.liu2@...rix.com>,
Eric Dumazet <eric.dumazet@...il.com>
CC: netdev@...r.kernel.org, xen-devel@...ts.xen.org,
David Vrabel <david.vrabel@...rix.com>,
Konrad Wilk <konrad.wilk@...cle.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: [PATCH net-next] xen-netfront: try linearizing SKB if it occupies
too many slots
On 30.05.2014 14:07, Zoltan Kiss wrote:
> On 30/05/14 09:06, Stefan Bader wrote:
>> On 16.05.2014 18:29, Zoltan Kiss wrote:
>>> On 16/05/14 16:34, Wei Liu wrote:
>>>> On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
>>>>> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
>>>>>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
>>>>>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
>>>>>>>
>>>>>>>> It's not that common to trigger this; I only saw a few reports. In fact,
>>>>>>>> Stefan's report is the first one that comes with a method to reproduce
>>>>>>>> it.
>>>>>>>>
>>>>>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
>>>>>>>> few "failed to linearize" messages, never a single one with a 1GB guest.
>>>>>>> Well, I am just saying. This is asking for order-5 allocations, and yes,
>>>>>>> this is going to fail after a few days of uptime, no matter what you try.
>>>>>>>
>>>>>> Hmm... I see what you mean -- memory fragmentation leads to allocation
>>>>>> failure. Thanks.
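
For context on the order-5 remark: linearizing a full 64KB GSO skb needs one
contiguous buffer for 16 pages of data plus the skb_shared_info tail, which the
allocator rounds up to the next power of two, i.e. a 128KB (order-5) allocation.
A rough back-of-the-envelope check (the 320-byte shared-info estimate is an
assumption for illustration only):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Smallest order such that (PAGE_SIZE << order) holds 'bytes'. */
static unsigned int order_for(unsigned long bytes)
{
	unsigned int order = 0;

	while ((PAGE_SIZE << order) < bytes)
		order++;
	return order;
}

int main(void)
{
	/* 64KB of GSO payload plus roughly 320 bytes of skb_shared_info. */
	printf("order = %u\n", order_for(65536 + 320));	/* prints 5 */
	return 0;
}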
>>>>> In the meantime, have you tried to lower gso_max_size?
>>>>>
>>>>> Setting it with netif_set_gso_max_size() to something like 56000 might
>>>>> avoid the problem.
>>>>>
>>>>> (Not sure if it is applicable in your case)
>>>>>
>>>> It works, at least in this Redis testcase. Could you explain a bit where
>>>> this 56000 magic number comes from? :-)
>>>>
>>>> Presumably I can derive it from some constant in core network code?
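
For illustration, this is roughly how a driver could apply that suggestion from
its setup path; netif_set_gso_max_size() simply caps dev->gso_max_size, and both
the 56000 value and the helper name example_cap_gso() are assumptions taken from
this thread rather than derived from core networking constants:

#include <linux/netdevice.h>

/* Sketch only: cap the GSO size so a worst-case skb needs fewer 4K slots. */
static void example_cap_gso(struct net_device *dev)
{
	netif_set_gso_max_size(dev, 56000);
}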
>>> I guess it just makes it less likely to have packets with a problematic layout.
>>> But the following packet would still fail:
>>> linear buffer: 80 bytes, spanning 2 pages
>>> 17 frags, 80 bytes each, each spanning a page boundary.
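
Counting each buffer as DIV_ROUND_UP(offset_in_page + len, PAGE_SIZE) slots, the
same rule xennet_start_xmit uses below, that layout comes out well above the
18-slot limit. A hypothetical standalone check (the 40-byte starting offset is
only an assumption chosen so that every buffer straddles a page boundary):

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define MAX_SKB_FRAGS	17
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	/* 80-byte linear buffer starting 40 bytes before a page boundary,
	 * plus 17 frags of 80 bytes at the same straddling offset.
	 */
	unsigned long offset = PAGE_SIZE - 40;
	unsigned long slots = DIV_ROUND_UP(offset + 80, PAGE_SIZE);
	int i;

	for (i = 0; i < 17; i++)
		slots += DIV_ROUND_UP(offset + 80, PAGE_SIZE);

	printf("slots=%lu limit=%d\n", slots, MAX_SKB_FRAGS + 1);	/* 36 vs 18 */
	return 0;
}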
>>>
>>> I just had an idea: a modified version of the xenvif_handle_frag_list function
>>> from netback would be useful for us here. It recreates the frags array on
>>> fully utilized 4k pages. Plus, we can use pskb_expand_head to reduce the number
>>> of pages used by the linear buffer (although it might not work, see my comment
>>> in the patch).
>>> The worst case linear buffer then spans N+1 pages, and has N*PAGE_SIZE+1
>>> bytes. Then the frags after this coalescing should have 16*PAGE_SIZE -
>>> (N*PAGE_SIZE+2) bytes at most, which is 16-N pages. Altogether that's 16+1
>>> pages, which should definitely fit!
>>> This is what I mean:
>>>
>> I had been idly wondering about this since then. While trying to understand the
>> whole skb handling environment, I tried to come up with an idea as well. It may
>> be totally stupid and based on the wrong assumptions, but it seems to work in
>> the sense that things did not blow up in my face immediately, and I did not see
>> packets dropped due to the number of slots either.
>> But again, I am not sure I am doing the right thing. The idea was simply to try
>> to get rid of the compound pages (which I believe are the only ones that can
>> have an offset big enough to allow some alignment savings)...
> Hi,
>
> This probably helps in a lot of scenarios, but not for everything. E.g. if the
> skb has 15 frags, each PAGE_SIZE+1 bytes, it'll still consume 30 slots for the
> frags, and this function won't be able to help that.
> It's hard to come up with an algorithm which handles all the scenarios we can
> have here and does so with the least possible amount of copying. I'll keep
> looking into this over the next weeks, but don't hesitate if you have an idea!
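
A short sketch of that arithmetic (same per-buffer slot rule as above) shows why
re-aligning to offset 0 cannot help once a frag is longer than one page:

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Pages (grant slots) one frag occupies, given its in-page offset. */
static unsigned long frag_slots(unsigned long offset, unsigned long len)
{
	return DIV_ROUND_UP(offset + len, PAGE_SIZE);
}

int main(void)
{
	unsigned long len = PAGE_SIZE + 1;

	/* Even at offset 0 such a frag needs two pages, so 15 of them
	 * still need 30 slots, far above MAX_SKB_FRAGS + 1 = 18.
	 */
	printf("one frag: %lu, 15 frags: %lu slots\n",
	       frag_slots(0, len), 15 * frag_slots(0, len));
	return 0;
}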
Hi Zoltan,
I lost track of this a bit. I think I saw a summary from you (not sure) about
brainstorming something similar, but for the backend, I believe during the last
Xen Hackathon. I cannot remember whether this ever led to a solution on the
netfront side. Am I missing something?
-Stefan
>
> Regards,
>
> Zoltan
>>
>> -Stefan
>>
>>
>> From 8571b106643b32296e58526e2fbe97c330877ac8 Mon Sep 17 00:00:00 2001
>> From: Stefan Bader <stefan.bader@...onical.com>
>> Date: Thu, 29 May 2014 12:18:01 +0200
>> Subject: [PATCH] xen-netfront: Align frags to fit max slots
>>
>> In cases where the frags in an skb require more than MAX_SKB_FRAGS + 1
>> (= 18) 4K grant pages, try to reduce the footprint by moving the data to
>> newly allocated pages, aligned to the beginning of the page.
>> Then replace the page in the frag and release the old one. This is
>> certainly more expensive in compute, but it should not happen too often
>> and is better than simply dropping the packet in that case.
>>
>> Signed-off-by: Stefan Bader <stefan.bader@...onical.com>
>> ---
>> drivers/net/xen-netfront.c | 65 +++++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 158b5e6..ad71e5c 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -544,6 +544,61 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>> return pages;
>> }
>>
>> +/*
>> + * Align data to new pages in order to save slots required to
>> + * transmit this buffer.
>> + * @skb - socket buffer
>> + * @target - number of pages to save
>> + * returns the number of pages by which the fragments have been reduced
>> + */
>> +static int xennet_align_frags(struct sk_buff *skb, int target)
>> +{
>> + int i, frags = skb_shinfo(skb)->nr_frags;
>> + int reduced = 0;
>> +
>> + for (i = 0; i < frags; i++) {
>> + skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>> + struct page *fpage = skb_frag_page(frag);
>> + struct page *npage;
>> + unsigned long size;
>> + unsigned long offset;
>> + gfp_t gfp;
>> + int order;
>> +
>> + if (!PageCompound(fpage))
>> + continue;
>> +
>> + size = skb_frag_size(frag);
>> + offset = frag->page_offset & ~PAGE_MASK;
>> +
>> + /*
>> + * If the length of data in the last subpage of a compound
>> + * page is smaller than the offset into the first data sub-
>> + * page, we can save a subpage by copying data around.
>> + */
>> + if ( ((offset + size) & ~PAGE_MASK) > offset )
>> + continue;
>> +
>> + gfp = GFP_ATOMIC | __GFP_COLD;
>> + order = get_order(size);
>> + if (order)
>> + gfp |= __GFP_COMP | __GFP_NOWARN;
>> +
>> + npage = alloc_pages(gfp, order);
>> + if (!npage)
>> + break;
>> + memcpy(page_address(npage), skb_frag_address(frag), size);
>> + frag->page.p = npage;
>> + frag->page_offset = 0;
>> + put_page(fpage);
>> +
>> + if (++reduced >= target)
>> + break;
>> + }
>> +
>> + return reduced;
>> +}
>> +
>> static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> {
>> unsigned short id;
>> @@ -573,9 +628,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>> xennet_count_skb_frag_slots(skb);
>> if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> - net_alert_ratelimited(
>> - "xennet: skb rides the rocket: %d slots\n", slots);
>> - goto drop;
>> + slots -= xennet_align_frags(skb, slots - (MAX_SKB_FRAGS + 1));
>> + if (slots > MAX_SKB_FRAGS + 1) {
>> + net_alert_ratelimited(
>> + "xennet: skb rides the rocket: %d slots\n",
>> + slots);
>> + goto drop;
>> + }
>> }
>>
>> spin_lock_irqsave(&np->tx_lock, flags);
>