Message-ID: <CAKgT0Ud7BVHfzbT=CsXn--_WSSt=hofcM3uNJcxdY=kaJ5psOA@mail.gmail.com>
Date: Fri, 4 Nov 2016 11:43:27 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Jason Baron <jbaron@...mai.com>
Cc: David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
Andy Whitcroft <apw@...onical.com>,
Alexander Duyck <alexander.h.duyck@...el.com>
Subject: Re: [PATCH net] fib_trie: correct /proc/net/route for large read buffer
On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron <jbaron@...mai.com> wrote:
> From: Jason Baron <jbaron@...mai.com>
>
> When read() is called on /proc/net/route requesting a size that is one
> entry size (128 bytes) less than m->size or greater, the resulting output
> has missing and/or duplicate entries. Since m->size is typically PAGE_SIZE,
> for a PAGE_SIZE of 4,096 this means that reads requesting more than 3,968
> bytes will see bogus output.
>
> For example:
>
> for i in {100..200}; do
>     ip route add 192.168.1.$i dev eth0
> done
> dd if=/proc/net/route of=/tmp/good bs=1024
> dd if=/proc/net/route of=/tmp/bad bs=4096
>
> # diff -q /tmp/good /tmp/bad
> Files /tmp/good and /tmp/bad differ
>
> I think this has gone unnoticed, since the output of 'netstat -r' and
> 'route' is generated by reading in 1,024 byte increments and thus not
> corrupted. Further, the number of entries in the route table needs to be
> sufficiently large in order to trigger the problematic case.
>
> The issue arises because fib_route_get_idx() does not properly handle
> the case where pos equals iter->pos. This case only arises when we have
> a large read buffer size because we end up re-requesting the last entry
> that overflowed m->buf. In the case of a smaller read buffer size,
> we don't exceed the size of m->buf, and thus fib_route_get_idx() is called
> with pos greater than iter->pos.
>
> Fix by properly handling the iter->pos == pos case.
>
> Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
> Cc: Andy Whitcroft <apw@...onical.com>
> Cc: Alexander Duyck <alexander.h.duyck@...el.com>
> Signed-off-by: Jason Baron <jbaron@...mai.com>
> ---
> net/ipv4/fib_trie.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 31cef3602585..1017533fc75c 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -2411,12 +2411,17 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
>  					    loff_t pos)
>  {
>  	struct key_vector *l, **tp = &iter->tnode;
> +	loff_t saved_pos = 0;
>  	t_key key;
>
>  	/* use cache location of next-to-find key */
>  	if (iter->pos > 0 && pos >= iter->pos) {
>  		pos -= iter->pos;
>  		key = iter->key;
> +		if (pos == 0) {
> +			saved_pos = iter->pos;
> +			key--;
> +		}
>  	} else {
>  		iter->pos = 0;
>  		key = 0;
> @@ -2436,10 +2441,13 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
>  			break;
>  	}
>
> -	if (l)
> +	if (l) {
>  		iter->key = key;	/* remember it */
> -	else
> +		if (saved_pos)
> +			iter->pos = saved_pos;
> +	} else {
>  		iter->pos = 0;		/* forget it */
> +	}
>
>  	return l;
>  }
This doesn't seem correct to me. I will have to look through this.
My understanding is that the value of iter->pos is supposed to be the
next position for us to grab, not the last one that was retrieved. If
we are trying to re-request the last value, then we should fall back
into the else case, since pos should be one less than iter->pos. The
problem is that the table could change out from under us, which is one
of the reasons why we don't want to rewind the key as you are doing
here.
- Alex
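
The cache semantics described in the reply above (the cached position names the next entry to fetch, and a request for anything earlier restarts the walk) can be sketched in userspace C. This is only a toy model under stated assumptions, not the kernel code: `toy_iter`, `toy_walk` and `toy_get_idx` are hypothetical names, a sorted array of keys stands in for the trie, and key wrap-around is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the iterator state discussed in the thread. */
struct toy_iter {
	size_t pos;        /* position of the NEXT entry to fetch; 0 = no cache */
	uint32_t key;      /* lowest key that next entry may have */
	unsigned restarts; /* how often the walk restarted from the beginning */
};

/* Index of the first key >= key, or n if none (linear scan for clarity). */
static size_t toy_walk(const uint32_t *keys, size_t n, uint32_t key)
{
	size_t i = 0;

	while (i < n && keys[i] < key)
		i++;
	return i;
}

/* Return the entry at position pos in the sorted table, or NULL. */
static const uint32_t *toy_get_idx(struct toy_iter *it, const uint32_t *keys,
				   size_t n, size_t pos)
{
	size_t skip, i;

	if (it->pos > 0 && pos >= it->pos) {
		/* At or past the cached position: resume from the cached key. */
		skip = pos - it->pos;
		i = toy_walk(keys, n, it->key);
	} else {
		/* Before the cached position (e.g. a re-request): restart. */
		it->restarts++;
		skip = pos;
		i = 0;
	}

	if (skip >= n - i) {
		it->pos = 0;	/* forget the cache */
		return NULL;
	}

	i += skip;
	it->pos = pos + 1;	/* next position to hand out */
	it->key = keys[i] + 1;	/* toy model: key wrap is not handled */
	return &keys[i];
}
```

In this sketch, asking for position 1 twice in a row takes the restart branch the second time, because after the first call the cache already points at position 2. Because the resume path searches by key rather than by array index, it still lands on the right entry if entries with smaller keys are added or removed between calls, which is the property that rewinding the key would put at risk.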