netdev - Re: [PATCH] iproute2 flush: handle larger tables and deleted entries

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4e0db5bc0907151504i28ed3a00l85e6767e9ef59921@mail.gmail.com>
Date:	Wed, 15 Jul 2009 15:04:33 -0700
From:	Gautam Kachroo <gk@...stanetworks.com>
To:	Stephen Hemminger <shemminger@...tta.com>
Cc:	Patrick McHardy <kaber@...sh.net>, netdev@...r.kernel.org
Subject: Re: [PATCH] iproute2 flush: handle larger tables and deleted entries

On Wed, Jul 15, 2009 at 12:19 PM, Stephen
Hemminger<shemminger@...tta.com> wrote:
> On Wed, 15 Jul 2009 10:50:57 -0700
> Gautam Kachroo <gk@...stanetworks.com> wrote:
>
>> On Wed, Jul 15, 2009 at 8:19 AM, Patrick McHardy<kaber@...sh.net> wrote:
>> > Gautam Kachroo wrote:
>> >> On Tue, Jul 14, 2009 at 2:38 AM, Patrick McHardy<kaber@...sh.net> wrote:
>> >>> Gautam Kachroo wrote:
>> >>>> use a new netlink socket when sending flush messages to avoid reading
>> >>>> any pending data on the existing netlink socket.
>> >>>>
>> >>>> read all of the response from the netlink request -- this response can
>> >>>> be split over multiple recv calls, pretty much one per netlink request
>> >>>> message. ENOENT errors, which correspond to attempts to delete an
>> >>>> already deleted entry, are ignored. Other errors are not ignored.
>> >>>
>> >>> In which case would there be any pending data? From what I can see,
>> >>> this can only happen when using batching, but in that case the
>> >>> previous command should continue reading until it has received all
>> >>> responses (which the netlink functions appear to be doing properly).
>> >>
>> >> What is the "previous command"?
>> >
>> > The last command before the one executing when using batching.
>>
>> This is independent of batching (I assume you're referring to the
>> -batch option to the ip command).
>> It happens when running a command like "ip neigh flush to 0.0.0.0/0"
>> if there are many neighbor entries.
>>
>> The implementation of flush commands, e.g. ip neigh flush, sends a
>> dump request, e.g. RTM_GETNEIGH, and then sends requests, e.g.
>> RTM_DELNEIGH, *while* there can be unread data from the dump request.
>> There would be unread data if the response to the dump request was
>> split over multiple calls to recvmsg.
>>
>> >> Are you referring to rtnl_dump_filter? If rtnl_send_check comes across
>> >> a failure, rtnl_dump_filter will not continue reading.
>> >>
>> >> Here's the situation that I'm referring to:
>> >>
>> >> If rtnl_send_check detects an error, it returns -1. rtnl_send_check is
>> >> called from flush_update. The multiple implementations of flush_update
>> >> (e.g. in ipneigh.c, ipaddress.c) propagate this return value to their
>> >> caller, e.g. print_neigh or print_addrinfo.
>> >>
>> >> print_neigh, print_addrinfo, etc. are called from rtnl_dump_filter.
>> >> rtnl_dump_filter sits in a loop calling recvmsg on the netlink socket.
>> >> However, it returns the error value if the filter function (e.g.
>> >> print_neigh) returns an error. In this case, rtnl_dump_filter can
>> >> return before it's read all the responses.
>> >> The error return from rtnl_dump_filter causes the program to exit.
>> >
>> > Yes, and I agree with your patch so far. My question is why you
>> > need another socket.
>> >
>> >> use a new netlink socket when sending flush messages to avoid reading
>> >> any pending data on the existing netlink socket.
>> >
>> > Under what circumstances would there be pending data when
>> > performing a new iproute operation?
>>
>> As above, it's not that there is pending data when performing a new
>> iproute operation, it's that there can be pending data while
>> performing a single iproute operation, namely ip <object> flush.
>> The benefit of a new socket is that it won't have any data from the
>> dump request waiting for it.
>
> I posted a better fix (using MSG_PEEK).

Where did you post the fix? I didn't see it on netdev or in the iproute2 git...

I had considered using MSG_PEEK in rtnl_send_check, but I don't think
that notices errors with the requests in the "buf" argument of
rtnl_send_check if there is already pending data -- the recv will peek
the next chunk of the dump response. The error response will be
waiting in the queue after the dump response.
Of course, an error, e.g. EPERM, will eventually be noticed, just not
as early...

thanks,
-gk
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html