linux-kernel - Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1972209744.20140326121116@eikelenboom.it>
Date:	Wed, 26 Mar 2014 12:11:16 +0100
From:	Sander Eikelenboom <linux@...elenboom.it>
To:	Paul Durrant <Paul.Durrant@...rix.com>
CC:	Wei Liu <wei.liu2@...rix.com>, annie li <annie.li@...cle.com>,
	Zoltan Kiss <zoltan.kiss@...rix.com>,
	<xen-devel@...ts.xen.org>, Ian Campbell <Ian.Campbell@...rix.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	<netdev@...r.kernel.org>
Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

Paul,

You have been awfully silent for this whole thread while this is a regression caused by a patch of you
(ca2f09f2b2c6c25047cfc545d057c4edfcfe561c as clearly stated much earlier in this thread).

The commit messages states:
    "net_rx_action() is the place where we could do with an accurate predicition but,
    since that has proven tricky to calculate, a cheap worse-case (but not too bad)
    estimate is all we really need since the only thing we *must* prevent is xenvif_gop_skb()
    consuming more slots than are available."

Your "worst-case" calculation stated in the commit message is clearly not the worst case,
since it doesn't take calls to "get_next_rx_buffer" into account.

Problem is that a worst case calculation would probably be reverting to the old calculation,
and the problems this patch was trying to solve would reappear, but introducing new regressions
isn't very useful either. And since it seems such a tricky and fragile thing to determine, it would
probably be best to be split into a distinct function with a comment to explain the rationale used.

Since this doesn't seem to progress very fast .. CC'ed some more folks .. you never know ..

--
Sander


Tuesday, March 25, 2014, 4:29:42 PM, you wrote:


> Tuesday, March 25, 2014, 4:15:39 PM, you wrote:

>> On Sat, Mar 22, 2014 at 07:28:34PM +0100, Sander Eikelenboom wrote:
>> [...]
>>> > Yes there is only one frag .. but it seems to be much larger than PAGE_SIZE .. and xenvif_gop_frag_copy brakes that frag down into smaller bits .. hence the calculation in xenvif_rx_action determining the slots needed by doing:
>>> 
>>> >                 for (i = 0; i < nr_frags; i++) {
>>> >                         unsigned int size;
>>> >                         size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
>>> >                         max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
>>> >                 }
>>> 
>>> > But the code in xenvif_gop_frag_copy .. seems to be needing one more slot (from the emperical test) .. and calling "get_next_rx_buffer" seems involved in that ..
>>> 
>>> Hmm looked again .. and it seems this is it .. when your frags are large enough you have the chance of running into this.
>>> 

>> get_next_rx_buffer is guarded by start_new_rx_buffer. Do you see any
>> problem with that implementation?
> In general no, but "get_next_rx_buffer" up's cons .. and the calculations done in "xenvif_rx_action" for max_slots_needed to prevent the overrun
> don't count in this possibility. So it's not the guarding of "start_new_rx_buffer" that is at fault. It's the ones early in "xenvif_rx_action".
> The ones that were changed by Paul's patch from MAX_SKB_FRAGS to a calculated value that should be a "slim fit".

> The problem is in determining upfront in "xenvif_rx_action" when and how often the "get_next_rx_buffer" path will be taken.
> Unless there are other non direct restrictions (from a size point of view) it can be called multiple times per frag per skb.

>>> Problem is .. i don't see an easy fix, the "one more slot" of the empirical test doesn't seem to be the worst case either (i think):
>>> 
>>> - In my case the packets that hit this only have 1 frag, but i could have had more frags.
>>>   I also think you can't rule out the possibility of doing the "get_next_rx_buffer" for multiple subsequent frags from one packet,
>>>   so in the worst (and perhaps even from a single frag since it's looped over a split of it in what seems PAGE_SIZE pieces.)
>>>   - So an exact calculation of how much slots we are going to need for hitting this "get_next_rx_buffer"  upfront in "xenvif_rx_action" seems unfeasible.
>>>   - A worst case gamble seems impossible either .. if you take multiple frags * multiple times the "get_next_rx_buffer" ... you would probably be back at just
>>>     setting the needed_slots to MAX_SKB_FRAGS.
>>> 
>>> - Other thing would be checking for the available slots before doing the  "get_next_rx_buffer" .. how ever .. we don't account for how many slots we still need to
>>>   just process the remaining frags.
>>> 

>> We've done a worst case estimation for whole SKB (linear area + all
>> frags) already, at the very first beginning. That's what
>> max_slots_needed is for.

>>> - Just remove the whole "get_next_rx_buffer".. just tested it but it seems the "get_next_rx_buffer" is necessary ..  when i make start_new_rx_buffer always return false
>>>   i hit the bug below :S
>>> 
>>> So the questions are ... is the "get_next_rx_buffer" part really necessary ?
>>>    - if not, what is the benefit of the "get_next_rx_buffer" and does that outweigh the additional cost of accounting
>>>      of needed_slots for the frags that have yet to be processed ?
>>>    - if yes, erhmmm what would be the best acceptable solution .. returning to using MAX_SKB_FRAGS ?
>>> 

>> I think you need to answer several questions first:
>> 1. is max_slots_needed actually large enough to cover whole SKB?
>         No it's not if, you end up calling "get_next_rx_buffer" one or multiple times when processing the SKB
>         you have the chance of overrunning (or be saved because prod gets upped before you overrun).

>> 2. is the test of ring slot availability acurate?
>         Seems to be.

>> 3. is the number of ring slots consumed larger than max_slots_needed? (I
>>    guess the answer is yes)
>         Yes that was the whole point.

>> 4. which step in the break-down procedure causes backend to overrun the
>>    ring?
>         The "get_next_rx_buffer" call .. that is not accounted for (which can be called
>         multiple times per frag (unless some other none obvious size restriction limits this
>         to one time per frag or one time per SKB).
>         In my errorneous case it is called one time, but it would be nice if there would be some theoretical proof
>         instead of just some emperical test.


>> It doesn't matter how many frags your SKB has and how big a frag is. If
>> every step is correct then you're fine. The code you're looking at
>> (start_new_rx_buffer / get_next_rx_buffer and friend) needs to be
>> carefully examined.

> Well it seems it only calls "get_next_rx_buffer" in some special cases .. (and that also what i'm seeing because it doesn't happen
> continously.

> Question is shouldn't this max_needed_slots calc be reverted to what it was before 3.14 and take the time in 3.15 to figure out a
> the theoretical max (if it can be calculated upfront) .. or another way to guarantee the code is able to process the whole SKB  ?

>> Wei.





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/