netdev - Re: [PATCH] iproute2 flush: handle larger tables and deleted entries

Message-ID: <4e0db5bc0908201708s2d1a4523q2a09eac418061893@mail.gmail.com>
Date:	Thu, 20 Aug 2009 17:08:07 -0700
From:	Gautam Kachroo <gk@...stanetworks.com>
To:	Stephen Hemminger <shemminger@...tta.com>
Cc:	Patrick McHardy <kaber@...sh.net>, netdev@...r.kernel.org
Subject: Re: [PATCH] iproute2 flush: handle larger tables and deleted entries

On Wed, Jul 15, 2009 at 3:04 PM, Gautam Kachroo<gk@...stanetworks.com> wrote:
> On Wed, Jul 15, 2009 at 12:19 PM, Stephen
> Hemminger<shemminger@...tta.com> wrote:
>> On Wed, 15 Jul 2009 10:50:57 -0700
>> Gautam Kachroo <gk@...stanetworks.com> wrote:
>>
>>> On Wed, Jul 15, 2009 at 8:19 AM, Patrick McHardy<kaber@...sh.net> wrote:
>>> > Gautam Kachroo wrote:
>>> >> On Tue, Jul 14, 2009 at 2:38 AM, Patrick McHardy<kaber@...sh.net> wrote:
>>> >>> Gautam Kachroo wrote:
>>> >>>> use a new netlink socket when sending flush messages to avoid reading
>>> >>>> any pending data on the existing netlink socket.
>>> >>>>
>>> >>>> read all of the response from the netlink request -- this response can
>>> >>>> be split over multiple recv calls, pretty much one per netlink request
>>> >>>> message. ENOENT errors, which correspond to attempts to delete an
>>> >>>> already deleted entry, are ignored. Other errors are not ignored.
>>> >>>
>>> >>> In which case would there be any pending data? From what I can see,
>>> >>> this can only happen when using batching, but in that case the
>>> >>> previous command should continue reading until it has received all
>>> >>> responses (which the netlink functions appear to be doing properly).
>>> >>
>>> >> What is the "previous command"?
>>> >
>>> > The last command before the one executing when using batching.
>>>
>>> This is independent of batching (I assume you're referring to the
>>> -batch option to the ip command).
>>> It happens when running a command like "ip neigh flush to 0.0.0.0/0"
>>> if there are many neighbor entries.
>>>
>>> The implementation of flush commands, e.g. ip neigh flush, sends a
>>> dump request, e.g. RTM_GETNEIGH, and then sends requests, e.g.
>>> RTM_DELNEIGH, *while* there can be unread data from the dump request.
>>> There would be unread data if the response to the dump request was
>>> split over multiple calls to recvmsg.
>>>
>>> >> Are you referring to rtnl_dump_filter? If rtnl_send_check comes across
>>> >> a failure, rtnl_dump_filter will not continue reading.
>>> >>
>>> >> Here's the situation that I'm referring to:
>>> >>
>>> >> If rtnl_send_check detects an error, it returns -1. rtnl_send_check is
>>> >> called from flush_update. The multiple implementations of flush_update
>>> >> (e.g. in ipneigh.c, ipaddress.c) propagate this return value to their
>>> >> caller, e.g. print_neigh or print_addrinfo.
>>> >>
>>> >> print_neigh, print_addrinfo, etc. are called from rtnl_dump_filter.
>>> >> rtnl_dump_filter sits in a loop calling recvmsg on the netlink socket.
>>> >> However, it returns the error value if the filter function (e.g.
>>> >> print_neigh) returns an error. In this case, rtnl_dump_filter can
>>> >> return before it's read all the responses.
>>> >> The error return from rtnl_dump_filter causes the program to exit.
>>> >
>>> > Yes, and I agree with your patch so far. My question is why you
>>> > need another socket.
>>> >
>>> >> use a new netlink socket when sending flush messages to avoid reading
>>> >> any pending data on the existing netlink socket.
>>> >
>>> > Under what circumstances would there be pending data when
>>> > performing a new iproute operation?
>>>
>>> As above, it's not that there is pending data when performing a new
>>> iproute operation, it's that there can be pending data while
>>> performing a single iproute operation, namely ip <object> flush.
>>> The benefit of a new socket is that it won't have any data from the
>>> dump request waiting for it.
>>
>> I posted a better fix (using MSG_PEEK).
>
> Where did you post the fix? I didn't see it on netdev or in the iproute2 git...

> I had considered using MSG_PEEK in rtnl_send_check, but I don't think
> that notices errors with the requests in the "buf" argument of
> rtnl_send_check if there is already pending data -- the recv will peek
> the next chunk of the dump response. The error response will be
> waiting in the queue after the dump response.
> Of course, an error, e.g. EPERM, will eventually be noticed, just not
> as early...

I saw commit 2d8240f8d95dfdc276dcf447623129fb5ccedcd6.

Using MSG_PEEK will prevent pending data from being removed during the
check for errors, but re-using the same socket means that errors won't
be detected until all the pending data has been read.
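
To illustrate the ordering problem, here is a minimal sketch. It does not
use netlink at all: it assumes an AF_UNIX datagram socketpair is a fair
stand-in for a netlink socket's receive queue, and the function name and
payload strings are made up for the example. It shows that a MSG_PEEK recv
only sees the head of the queue (the pending dump chunk), so the error
reply queued behind it stays invisible until the dump data is consumed.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo, not iproute2 code. Returns 1 if a MSG_PEEK on a
 * queue holding [dump chunk, error reply] sees only the dump chunk and
 * leaves it queued, 0 if not, -1 if the socketpair cannot be created.
 */
static int peek_sees_only_queue_head(void)
{
	int sv[2];
	char buf[64];
	int ok;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	/* Pending dump data is queued first, the error reply behind it. */
	ok = send(sv[0], "dump-chunk", 10, 0) == 10 &&
	     send(sv[0], "error-ack", 9, 0) == 9;

	/* Peek: shows the head of the queue without removing it. */
	ok = ok &&
	     recv(sv[1], buf, sizeof(buf), MSG_PEEK | MSG_DONTWAIT) == 10 &&
	     memcmp(buf, "dump-chunk", 10) == 0;

	/* A real read still returns the same dump chunk... */
	ok = ok &&
	     recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT) == 10 &&
	     memcmp(buf, "dump-chunk", 10) == 0;

	/* ...and only then does the error reply become visible. */
	ok = ok &&
	     recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT) == 9 &&
	     memcmp(buf, "error-ack", 9) == 0;

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

A separate socket for the delete requests would have an empty queue, so
the first thing a peek sees there is the reply to the request itself.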

rtnl_send_check still treats ENOENT as an error. It seems better for
flush to ignore ENOENT. That way a flush is not disrupted when an entry
has already been removed by the time its delete request is processed,
which is not really an error for a flush operation.
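
Roughly what I have in mind, as a sketch only. flush_check_acks is a
hypothetical helper name, not an existing iproute2 function, and the
choice of EBADMSG for a truncated reply is my assumption. It walks one
receive buffer of netlink replies and treats ENOENT like a plain ack,
while any other error is still reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper, not part of iproute2: scan one buffer of netlink
 * replies. Returns 0 if every NLMSG_ERROR message is an ack (error == 0)
 * or -ENOENT; otherwise returns -1 with errno set to the reported error.
 */
static int flush_check_acks(void *buf, size_t len)
{
	struct nlmsghdr *h = buf;
	int remaining = (int)len;

	assert(buf != NULL);

	for (; NLMSG_OK(h, remaining); h = NLMSG_NEXT(h, remaining)) {
		struct nlmsgerr *err;

		if (h->nlmsg_type != NLMSG_ERROR)
			continue;	/* not an ack or error, nothing to check */

		if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*err))) {
			errno = EBADMSG;	/* truncated error message */
			return -1;
		}

		err = NLMSG_DATA(h);
		if (err->error == 0 || err->error == -ENOENT)
			continue;	/* plain ack, or entry already deleted */

		errno = -err->error;	/* e.g. EPERM: a real failure */
		return -1;
	}

	return 0;
}
```

The caller would run this on each buffer returned by recv until all the
replies to a batch of delete requests have been seen.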

thanks,
-gk


> thanks,
> -gk
>