[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKgT0Uf9=mCoa4hcZs-SAtsdj7ktOnNwD3N8yo7gZsRZVQQjsw@mail.gmail.com>
Date: Fri, 4 Nov 2016 12:13:06 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Jason Baron <jbaron@...mai.com>
Cc: David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
Andy Whitcroft <apw@...onical.com>,
Alexander Duyck <alexander.h.duyck@...el.com>
Subject: Re: [PATCH net] fib_trie: correct /proc/net/route for large read buffer
On Fri, Nov 4, 2016 at 12:07 PM, Jason Baron <jbaron@...mai.com> wrote:
>
>
> On 11/04/2016 02:43 PM, Alexander Duyck wrote:
>>
>> On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron <jbaron@...mai.com> wrote:
>>>
>>> From: Jason Baron <jbaron@...mai.com>
>>>
>>> When read() is called on /proc/net/route requesting a size that is one
>>> entry size (128 bytes) less than m->size or greater, the resulting output
>>> has missing and/or duplicate entries. Since m->size is typically
>>> PAGE_SIZE,
>>> for a PAGE_SIZE of 4,096 this means that reads requesting more than 3,968
>>> bytes will see bogus output.
>>>
>>> For example:
>>>
>>> for i in {100..200}; do
>>> ip route add 192.168.1.$i dev eth0
>>> done
>>> dd if=/proc/net/route of=/tmp/good bs=1024
>>> dd if=/proc/net/route of=/tmp/bad bs=4096
>>>
>>> # diff -q /tmp/good /tmp/bad
>>> Files /tmp/good and /tmp/bad differ
>>>
>>> I think this has gone unnoticed, since the output of 'netstat -r' and
>>> 'route' is generated by reading in 1,024 byte increments and thus not
>>> corrupted. Further, the number of entries in the route table needs to be
>>> sufficiently large in order to trigger the problematic case.
>>>
>>> The issue arises because fib_route_get_idx() does not properly handle
>>> the case where pos equals iter->pos. This case only arises when we have
>>> a large read buffer size because we end up re-requesting the last entry
>>> that overflowed m->buf. In the case of a smaller read buffer size,
>>> we don't exceed the size of m->buf, and thus fib_route_get_idx() is
>>> called
>>> with pos greater than iter->pos.
>>>
>>> Fix by properly handling the iter->pos == pos case.
>>>
>>> Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in
>>> /proc/net/route")
>>> Cc: Andy Whitcroft <apw@...onical.com>
>>> Cc: Alexander Duyck <alexander.h.duyck@...el.com>
>>> Signed-off-by: Jason Baron <jbaron@...mai.com>
>>> ---
>>> net/ipv4/fib_trie.c | 12 ++++++++++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>>> index 31cef3602585..1017533fc75c 100644
>>> --- a/net/ipv4/fib_trie.c
>>> +++ b/net/ipv4/fib_trie.c
>>> @@ -2411,12 +2411,17 @@ static struct key_vector
>>> *fib_route_get_idx(struct fib_route_iter *iter,
>>> loff_t pos)
>>> {
>>> struct key_vector *l, **tp = &iter->tnode;
>>> + loff_t saved_pos = 0;
>>> t_key key;
>>>
>>> /* use cache location of next-to-find key */
>>> if (iter->pos > 0 && pos >= iter->pos) {
>>> pos -= iter->pos;
>>> key = iter->key;
>>> + if (pos == 0) {
>>> + saved_pos = iter->pos;
>>> + key--;
>>> + }
>>> } else {
>>> iter->pos = 0;
>>> key = 0;
>>> @@ -2436,10 +2441,13 @@ static struct key_vector
>>> *fib_route_get_idx(struct fib_route_iter *iter,
>>> break;
>>> }
>>>
>>> - if (l)
>>> + if (l) {
>>> iter->key = key; /* remember it */
>>> - else
>>> + if (saved_pos)
>>> + iter->pos = saved_pos;
>>> + } else {
>>> iter->pos = 0; /* forget it */
>>> + }
>>>
>>> return l;
>>> }
>>
>>
>> This doesn't seem correct to me. I will have to look through this.
>> My understanding is that the value of iter->pos is supposed to be the
>> next position for us to grab, not the last one that was retrieved. If
>> we are trying to re-request the last value then we should be falling
>> back into the else case for this since pos should be one less than
>> iter->pos. The problem is the table could change out from under us
>> which is one of the reasons why we don't want to try and rewind the
>> key like you are doing here.
>>
>> - Alex
>>
>
> Hi Alex,
>
> In this case, seq_read() has called m->op->next(), which sets iter->pos
> equal to pos and iter->key to key + 1. However, when we then go to output
> the item associated with key, the 'm->op->next()' call overflows. Thus, we
> have a situation where iter->pos equals pos, iter->key = key + 1, but we
> have not displayed the item at position 'key' (thus the bug is that we miss
> the item at key).
>
> The change I proposed was simply to restart the search from 'key' in this
> case. If that item has disappeared, we will output the next one, or if its
> been replaced we will display its replacement. I think that is
> ok?
>
> The bug could also be fixed by changing:
>
> if (iter->pos > 0 && pos >= iter->pos) {
>
> to say:
>
> if (iter->pos > 0 && pos > iter->pos) {
>
> But that restarts the search on every overflow, which could mean every page
> size, and that seems suboptimal to me. Like-wise, if we make pos 1 less than
> iter->pos that restarts the search. The idea with this patch is to not force
> us to redo the entire search on each overflow.
>
> Thanks,
>
> -Jason
Actually I think the underlying issue is that we still have an
unresolved off by one error. Specifically offset 0 actually
represents two values, the header for the file and entry 0. I think
the fix is for us to look at pushing pos to 1 for the start of the
data, and we reserve POS 0 for SEQ_START_TOKEN. Doing that we should
start at the correct offset for each section following the first page.
- Alex
Powered by blists - more mailing lists