netdev - Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1200491531.4457.91.camel@localhost>
Date:	Wed, 16 Jan 2008 08:52:11 -0500
From:	jamal <hadi@...erus.ca>
To:	Timo Teräs <timo.teras@....fi>
Cc:	netdev@...r.kernel.org
Subject: Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key


Timo,

Thanks for the effort you are putting on this.
I would just speak to the concepts instead of the code - hopefully that
would simulate a discussion (and shorten my email)

On Sun, 2008-13-01 at 14:26 +0200, Timo Teräs wrote:
> Hi all,
> 
> The problem with IPsec SP/SA dumping is that they don't work well with large SPD/SADB:s. 
> In netlink the dumping code is O(n^2) and can miss some entries/dump duplicates if the DB 
> is changed between the recv() calls to read the dump entries. This is due to the entry 
> index counting done in xfrm_user code. With af_key the dump entries are just dropped 
> when the socket receive queue is filled.

nod to the above

> I tried to address all these problems, and added the xfrm_policy and xfrm_state structure 
> to a new linked list that is used solely for dumping. This way the dumps can be done in O(n) 
> and have an iterator point to them. I also modified to af_key to stop dumping when receive 
> queue is filling up and continue it from a callback at recvfrom() when there's more room.
> 
> This patch is against 2.6.23 vanilla. I wanted to get some feedback before I make it against 
> the git tree.
> 
> But I'd like to hear about:
> - if the approach is ok (adding the entries in one more list)?
> - if the new/modified variables, structures and function names are ok?
> - if I should port these against net-2.6 or net-2.6.25 git tree?
> - if I should split this to more than one commit? (maybe xfrm core, xfrm user and af_key parts?)
> 

To answer your last 2 questions:
There are two issues that are inter-mingled in there. The most important
is pf_key not being robust on dump. The other being the accurracy of
netlink dumping - this has much broader implications. I think you need
to separate the patches into those two at least and prioritize on the
pf_key as a patch that you push in. I would also try to make the pf_key
dumping approach to mirror what netlink does today - in the worst case
it would be as innacurate as netlink; which is not bad. I think youll
find no resistance getting such a patch in.

To answer your first 2 questions:

My main dilema with your approach is the policy of maintaining such a
list in the kernel and memory consumption needed vs the innacuracy of
netlink dumps. I have run a system with >400K SPD entries and >100K SAD
entries on a running system and i needed a few Gig of RAM to just make
sure my other apps dont get hit by oom at some point or other.  With
your approach of maintaining extra SA/P D, i would need double the RAM
amount. RAM is relatively cheap these days, but latency involved is not
(and one could argue that CPU cycles are much much cheaper).

Netlink dumping gives up accuracy[1], uses 50% of the memory you are
proposing but abuses more cpu cycles. User space could maintain the list
you are proposing instead - and compensate for the innaccuracies by
watching events[2]
Of course that shifts the accuracy to events which could be lost on
their way to user space. This issue is alleviated a little more with
your approach of keeping the list in the kernel and adding new updates
to the tail of the dump list (which, btw, your patch didnt seem to do);
however, even if you solve that: 
** you are still will be faced with challenges of not being atomic on
updates; example an entry already on your dump list could be deleted
before being read by user space. I cant see you solving that without
abusing a _lot_ more cpu (trying to find the entry on the dump list that
may have gotten updated or deleted). Theres also the issue of how long
to keep the dump list before aging it out and how to deal with multiple
users.

Ok, lets say you throw in the computation in there to walk the dump list
and find the proper entry and find out it only consumes 62% cpu of what
current netlink dump approach does, you are still using twice the RAM
needs. And hence it becomes a policy choice and what would be needed is 
some knob for me to select "i dont care about abusing more memory; i
want more accuracy dammit". I can tell you based on my current
experience i would rather use less RAM - but thats a matter of choice at
the moment because i havent come across scenarios of the conflicts where
user space wasnt able to resolve. 

I apologize for the long email, it would have been longer if i commented
on the code.

I hope you find my comments useful and dont see them as rocks being
thrown at you - rather look at them as worth at least 2-cents Canadian
and will stimulate other people speak out.

cheers,
jamal

[1] In cases where a dump happens to not complete while SAD/SPD are
getting concurently updated, it is feasible that a dump list will have
innacurate details.

[2] Racoon for example doesnt do this, hence it needs to dump from the
kernel when purging SAs.  

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html