Message-Id: <20081028.155817.216873568.davem@davemloft.net>
Date: Tue, 28 Oct 2008 15:58:17 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: fernando@....ntt.co.jp
Cc: netdev@...r.kernel.org
Subject: Re: [RFC][PATCH] xfrm: do not leak ESRCH to user space

From: Fernando Luis Vázquez Cao <fernando@....ntt.co.jp>
Date: Fri, 24 Oct 2008 10:05:00 +0900

> On Thu, 2008-10-23 at 14:11 -0700, David Miller wrote:
> > From: Fernando Luis Vázquez Cao <fernando@....ntt.co.jp>
> > Date: Thu, 23 Oct 2008 23:27:19 +0900
> >
> > > I noticed that, under certain conditions, ESRCH can be leaked from the
> > > xfrm layer to user space through sys_connect. In particular, this seems
> > > to happen reliably when the kernel fails to resolve a template either
> > > because the AF_KEY receive buffer being used by racoon is full or
> > > because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
> > > state.
> > >
> > > However, since this could be a transient issue it could be argued that
> > > EAGAIN would be more appropriate. Besides this error code is not even
> > > documented in the man page for sys_connect (as of man-pages 3.07).
> > >
> > > What is the expected behavior (I could not find anything in the RFCs)?
> > > Should we just fix the connect(2) man page instead?
> >
> > I think this case requires some care.
> >
> > -EAGAIN tells the caller that it is a temporary failure and that
> > retrying can be expected to succeed eventually (some resource is not
> > available at the moment). So applications loop when they see this
> > error returned, they will try again.
> >
> > But that's not what is happening when ESRCH is signalled. We found
> > no matching policy, and we've done nothing to make such a policy
> > be found in the (near) future. It is more of a hard failure, which
> > should not necessarily be retried over and over again.
> >
> > So converting this to -EAGAIN doesn't seem correct at all.
>
> That would be so if -ESRCH did not happen to be a transient error.

It is not set in transient conditions as far as I can see.

Look at xfrm_state_find(), which is where this error is generated and
then propagates down to xfrm_tmpl_resolve_one().

In xfrm_state_find(), if an acquire is in progress to resolve the
entry, the code explicitly converts all errors into -EAGAIN.
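
The rule described above can be modelled in a few lines of C. This is
only a sketch of the behaviour as stated in this message, not the
kernel's actual xfrm_state_find() code; the function name
pick_lookup_error() and its parameters are hypothetical and exist only
for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the error selection described above.
 * It is NOT the real xfrm_state_find(): it only captures the rule
 * "if an acquire is in progress, report -EAGAIN; otherwise report
 * the original error (for example -ESRCH)".
 */
int pick_lookup_error(bool acquire_in_progress, int error)
{
	if (error == 0)
		return 0;		/* lookup succeeded, nothing to convert */

	if (acquire_in_progress)
		return -EAGAIN;		/* transient: a retry may succeed */

	return error;			/* hard failure, e.g. -ESRCH */
}
```

Under this model, -ESRCH only reaches the caller when no acquire is
pending, which is the non-transient case.
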

> Looking at the code, the window during which an entry is in
> XFRM_STATE_EXPIRED state seems to be about 2 seconds in the worst case.
> Connection attempts before and after that window would most likely
> result in a successful connection or -EAGAIN, respectively. Would not it
> make sense to return -EAGAIN also during that 2 seconds window?

Only if an acquire has been triggered and is in progress, which, as
explained above, the code already seems to handle.
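
From the caller's side, the distinction drawn in this thread might look
like the following sketch: retry connect(2) only on EAGAIN and treat
anything else, including ESRCH, as a hard failure. The helper names
connect_error_is_retryable() and connect_with_retry(), the retry limit
and the one-second delay are all assumptions made for illustration;
nothing here is taken from an existing application.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical policy: only EAGAIN is considered worth retrying. */
bool connect_error_is_retryable(int err)
{
	return err == EAGAIN;
}

/*
 * Hypothetical wrapper around connect(2).
 * Returns 0 on success, or a negative errno value on failure.
 * ESRCH and other hard errors are returned at once, without looping.
 */
int connect_with_retry(int fd, const struct sockaddr *addr,
		       socklen_t len, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (connect(fd, addr, len) == 0)
			return 0;

		if (!connect_error_is_retryable(errno))
			return -errno;	/* hard failure: do not retry */

		sleep(1);		/* assumed back-off before retrying */
	}

	return -EAGAIN;			/* still failing after max_tries */
}
```

With a policy like this, an application that sees ESRCH gives up
immediately instead of spinning on a condition that will not clear by
itself.
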