netdev - Re: [PATCH net] fib_trie: correct /proc/net/route for large read buffer

Message-ID: <0e1e6651-3de6-6a71-cb37-be2942fb70b2@akamai.com>
Date:   Fri, 4 Nov 2016 15:07:08 -0400
From:   Jason Baron <jbaron@...mai.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
CC:     David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>,
        Andy Whitcroft <apw@...onical.com>,
        Alexander Duyck <alexander.h.duyck@...el.com>
Subject: Re: [PATCH net] fib_trie: correct /proc/net/route for large read
 buffer



On 11/04/2016 02:43 PM, Alexander Duyck wrote:
> On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron <jbaron@...mai.com> wrote:
>> From: Jason Baron <jbaron@...mai.com>
>>
>> When read() is called on /proc/net/route requesting a size that is one
>> entry size (128 bytes) less than m->size or greater, the resulting output
>> has missing and/or duplicate entries. Since m->size is typically PAGE_SIZE,
>> for a PAGE_SIZE of 4,096 this means that reads requesting more than 3,968
>> bytes will see bogus output.
>>
>> For example:
>>
>> for i in {100..200}; do
>>         ip route add 192.168.1.$i dev eth0
>> done
>> dd if=/proc/net/route of=/tmp/good bs=1024
>> dd if=/proc/net/route of=/tmp/bad bs=4096
>>
>> # diff -q /tmp/good /tmp/bad
>> Files /tmp/good and /tmp/bad differ
>>
>> I think this has gone unnoticed, since the output of 'netstat -r' and
>> 'route' is generated by reading in 1,024 byte increments and thus not
>> corrupted. Further, the number of entries in the route table needs to be
>> sufficiently large in order to trigger the problematic case.
>>
>> The issue arises because fib_route_get_idx() does not properly handle
>> the case where pos equals iter->pos. This case only arises when we have
>> a large read buffer size because we end up re-requesting the last entry
>> that overflowed m->buf. In the case of a smaller read buffer size,
>> we don't exceed the size of m->buf, and thus fib_route_get_idx() is called
>> with pos greater than iter->pos.
>>
>> Fix by properly handling the iter->pos == pos case.
>>
>> Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
>> Cc: Andy Whitcroft <apw@...onical.com>
>> Cc: Alexander Duyck <alexander.h.duyck@...el.com>
>> Signed-off-by: Jason Baron <jbaron@...mai.com>
>> ---
>>  net/ipv4/fib_trie.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>> index 31cef3602585..1017533fc75c 100644
>> --- a/net/ipv4/fib_trie.c
>> +++ b/net/ipv4/fib_trie.c
>> @@ -2411,12 +2411,17 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
>>                                             loff_t pos)
>>  {
>>         struct key_vector *l, **tp = &iter->tnode;
>> +       loff_t saved_pos = 0;
>>         t_key key;
>>
>>         /* use cache location of next-to-find key */
>>         if (iter->pos > 0 && pos >= iter->pos) {
>>                 pos -= iter->pos;
>>                 key = iter->key;
>> +               if (pos == 0) {
>> +                       saved_pos = iter->pos;
>> +                       key--;
>> +               }
>>         } else {
>>                 iter->pos = 0;
>>                 key = 0;
>> @@ -2436,10 +2441,13 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
>>                         break;
>>         }
>>
>> -       if (l)
>> +       if (l) {
>>                 iter->key = key;        /* remember it */
>> -       else
>> +               if (saved_pos)
>> +                       iter->pos = saved_pos;
>> +       } else {
>>                 iter->pos = 0;          /* forget it */
>> +       }
>>
>>         return l;
>>  }
>
> This doesn't seem correct to me.  I will have to look through this.
> My understanding is that the value of iter->pos is supposed to be the
> next position for us to grab, not the last one that was retrieved.  If
> we are trying to re-request the last value then we should be falling
> back into the else case for this since pos should be one less than
> iter->pos.  The problem is the table could change out from under us
> which is one of the reasons why we don't want to try and rewind the
> key like you are doing here.
>
> - Alex
>

Hi Alex,

In this case, seq_read() has called m->op->next(), which sets iter->pos
equal to pos and iter->key to key + 1. However, when we then go to
output the item associated with key, the m->op->show() call overflows
m->buf. Thus, we have a situation where iter->pos equals pos and
iter->key equals key + 1, but we have not displayed the item at 'key'
(so the bug is that we miss the item at key).

The change I proposed simply restarts the search from 'key' in
this case. If that item has disappeared, we will output the next one, or
if it has been replaced, we will display its replacement. I think that is
ok?
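
The scenario above can be reproduced in a toy user-space model. This is
only a sketch, not kernel code: the sorted array stands in for the trie,
every toy_* name is hypothetical, and the loop body and the next() step
are assumptions about the surrounding code that the quoted diff does not
show. Positions are 1-based and the header line is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the trie: a sorted array of keys (hypothetical data). */
static const unsigned toy_table[] = {10, 20, 30, 40, 50};
#define TOY_NKEYS (sizeof(toy_table) / sizeof(toy_table[0]))

struct toy_iter {
    long pos;      /* position of the next entry to fetch, 0 = no cache */
    unsigned key;  /* key to resume the search from */
};

/* Index of the first entry whose key is >= k, or -1 if there is none. */
static int toy_leaf_walk(unsigned k)
{
    for (size_t i = 0; i < TOY_NKEYS; i++)
        if (toy_table[i] >= k)
            return (int)i;
    return -1;
}

/* Model of the lookup by position; 'rewind' enables the proposed fix.
 * Returns the key found, or 0 if the position is past the end. */
static unsigned toy_get_idx(struct toy_iter *it, long pos, int rewind)
{
    long saved_pos = 0;
    unsigned key;
    int i, found = -1;

    if (it->pos > 0 && pos >= it->pos) {
        pos -= it->pos;
        key = it->key;
        if (rewind && pos == 0) {   /* re-request of the cached entry */
            saved_pos = it->pos;
            key--;
        }
    } else {
        it->pos = 0;
        key = 0;
    }

    /* Assumed loop shape: walk forward until pos entries are consumed. */
    while ((i = toy_leaf_walk(key)) >= 0) {
        key = toy_table[i] + 1;
        it->pos++;
        if (--pos <= 0) {
            found = i;
            break;
        }
    }

    if (found < 0) {
        it->pos = 0;                /* forget it */
        return 0;
    }
    it->key = key;                  /* remember it */
    if (saved_pos)
        it->pos = saved_pos;
    return toy_table[found];
}

/* Assumed model of next(): advance to the entry after the cached key. */
static unsigned toy_next(struct toy_iter *it, long *pos)
{
    int i;

    ++*pos;
    i = toy_leaf_walk(it->key);
    if (i < 0) {
        it->pos = 0;
        return 0;
    }
    it->key = toy_table[i] + 1;
    it->pos++;
    return toy_table[i];
}

/* Fetch entry 1, advance to entry 2, pretend that showing entry 2
 * overflowed the buffer, then restart at the same pos.
 * Returns the key that the restart yields. */
static unsigned toy_resume_after_overflow(int rewind)
{
    struct toy_iter it = {0, 0};
    long pos = 1;
    unsigned first = toy_get_idx(&it, pos, rewind);
    unsigned second = toy_next(&it, &pos);

    assert(first == 10 && second == 20);
    return toy_get_idx(&it, pos, rewind);
}
```

In this model, toy_resume_after_overflow(0) yields 30, so the entry with
key 20 is never displayed, while toy_resume_after_overflow(1) yields 20.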

The bug could also be fixed by changing:

if (iter->pos > 0 && pos >= iter->pos) {

to say:

if (iter->pos > 0 && pos > iter->pos) {

But that restarts the search from the beginning on every overflow, which
could mean once per page-sized chunk of output, and that seems suboptimal
to me. Likewise, making pos one less than iter->pos restarts the search.
The idea with this patch is to avoid redoing the entire search on each
overflow.

Thanks,

-Jason