Date:	Mon, 2 Nov 2015 16:08:05 +0900
From:	Namhyung Kim <namhyung@...nel.org>
To:	Tom Zanussi <tom.zanussi@...ux.intel.com>
CC:	rostedt@...dmis.org, daniel.wagner@...-carit.de,
	masami.hiramatsu.pt@...achi.com, josh@...htriplett.org,
	andi@...stfloor.org, mathieu.desnoyers@...icios.com,
	peterz@...radead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map

Hi Tom,

On Thu, Oct 29, 2015 at 01:35:43PM -0500, Tom Zanussi wrote:
> Hi Namhyung,
> 
> On Thu, 2015-10-29 at 17:31 +0900, Namhyung Kim wrote:
> > Hi Tom,
> > 
> > On Thu, Oct 22, 2015 at 01:14:12PM -0500, Tom Zanussi wrote:
> > > Add tracing_map, a special-purpose lock-free map for tracing.
> > > 
> > > tracing_map is designed to aggregate or 'sum' one or more values
> > > associated with a specific object of type tracing_map_elt, which
> > > is associated by the map to a given key.
> > > 
> > > It provides various hooks allowing per-tracer customization and is
> > > separated out into a separate file in order to allow it to be shared
> > > between multiple tracers, but isn't meant to be generally used outside
> > > of that context.
> > > 
> > > The tracing_map implementation was inspired by lock-free map
> > > algorithms originated by Dr. Cliff Click:
> > > 
> > >  http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
> > >  http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
> > > 
> > > Signed-off-by: Tom Zanussi <tom.zanussi@...ux.intel.com>
> > > Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
> > > ---
> > > +/**
> > > + * tracing_map_insert - Insert key and/or retrieve val from a tracing_map
> > > + * @map: The tracing_map to insert into
> > > + * @key: The key to insert
> > > + *
> > > + * Inserts a key into a tracing_map and creates and returns a new
> > > + * tracing_map_elt for it, or if the key has already been inserted by
> > > + * a previous call, returns the tracing_map_elt already associated
> > > + * with it.  When the map was created, the number of elements to be
> > > + * allocated for the map was specified (internally maintained as
> > > + * 'max_elts' in struct tracing_map), and that number of
> > > + * tracing_map_elts was created by tracing_map_init().  This is the
> > > + * pre-allocated pool of tracing_map_elts that tracing_map_insert()
> > > + * will allocate from when adding new keys.  Once that pool is
> > > + * exhausted, tracing_map_insert() is useless and will return NULL to
> > > + * signal that state.
> > > + *
> > > + * This is a lock-free tracing map insertion function implementing a
> > > + * modified form of Cliff Click's basic insertion algorithm.  It
> > > + * requires the table size be a power of two.  To prevent any
> > > + * possibility of an infinite loop we always make the internal table
> > > + * size double the size of the requested table size (max_elts * 2).
> > > + * Likewise, we never reuse a slot or resize or delete elements - when
> > > + * we've reached max_elts entries, we simply return NULL once we've
> > > + * run out of entries.  Readers can at any point in time traverse the
> > > + * tracing map and safely access the key/val pairs.
> > > + *
> > > + * Return: the tracing_map_elt pointer val associated with the key.
> > > + * If this was a newly inserted key, the val will be a newly allocated
> > > + * and associated tracing_map_elt pointer val.  If the key wasn't
> > > + * found and the pool of tracing_map_elts has been exhausted, NULL is
> > > + * returned and no further insertions will succeed.
> > > + */
> > > +struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
> > > +{
> > > +	u32 idx, key_hash, test_key;
> > > +	struct tracing_map_entry *entry;
> > > +
> > > +	key_hash = jhash(key, map->key_size, 0);
> > > +	if (key_hash == 0)
> > > +		key_hash = 1;
> > > +	idx = key_hash >> (32 - (map->map_bits + 1));
> > > +
> > > +	while (1) {
> > > +		idx &= (map->map_size - 1);
> > > +		entry = TRACING_MAP_ENTRY(map->map, idx);
> > > +		test_key = entry->key;
> > > +
> > > +		if (test_key && test_key == key_hash && entry->val &&
> > > +		    keys_match(key, entry->val->key, map->key_size))
> > > +			return entry->val;
> > > +
> > > +		if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) {
> > > +			struct tracing_map_elt *elt;
> > > +
> > > +			elt = get_free_elt(map);
> > > +			if (!elt)
> > > +				break;
> > > +			memcpy(elt->key, key, map->key_size);
> > > +			entry->val = elt;
> > > +
> > > +			return entry->val;
> > > +		}
> > > +		idx++;
> > > +	}
> > > +
> > > +	return NULL;
> > > +}
> > 
> > IIUC this always inserts a new entry if no matching key is found.  And
> > if the map is full, it only fails after walking through the entries to
> > find an empty one, marking that entry with the key, and then having
> > get_free_elt() return NULL.  As more keys are added, the problem gets
> > worse, since more entries will be marked with no value IMHO.
> > 
> > I can see you checked hist_data->drops in the next patch to work
> > around this problem.  But IMHO it's suboptimal since it cannot update
> > the existing entries either.  I think it'd be better to have a
> > lookup-only version of this function and use it once drops are seen.
> > The lookup function can bail out of the loop, provided the insert
> > doesn't mark empty entries anymore IMHO.
> > 
> > Thoughts?
> > 
> 
> The assumption has always been that once you have drops (i.e.
> tracing_map_insert() returns NULL), the data can't really be trusted any
> more and tracing should just stop (and presumably be restarted with a
> bigger table).  It doesn't mean that the data is completely useless,
> just that it no longer can be assumed to have captured all the events
> over the tracing run.  Having a lookup-only version for the purpose of
> updating only existing entries sort of illustrates the problem even
> better - in that case only the events that already have entries in the
> table will be included while the events that don't yet have entries will
> be ignored, skewing the value of the data.

I thought it'd be better if users could tell whether the drops really
matter or not.  IOW if the drop count is much smaller than the normal
event count, [s]he might want to ignore the occasional drops.
Otherwise, [s]he should restart with a bigger table.  This requires
accurate counts of events and drops though.
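
To make the idea concrete, here is a rough userspace sketch of the kind
of accounting I mean.  None of this is from the patch: struct demo_stats,
demo_account() and demo_should_restart() are made-up names, and the
per-mille threshold is just an arbitrary policy knob a user might choose.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-histogram counters; not part of the patch. */
struct demo_stats {
	uint64_t hits;   /* events that updated an entry */
	uint64_t drops;  /* events lost because no entry could be allocated */
};

/* Record the outcome of one insert attempt (elt == NULL means a drop). */
static void demo_account(struct demo_stats *st, const void *elt)
{
	if (elt)
		st->hits++;
	else
		st->drops++;
}

/*
 * Return true if drops exceed max_permille per thousand total events.
 * The threshold is an assumption for illustration; overflow of the
 * 64-bit products is ignored in this sketch.
 */
static bool demo_should_restart(const struct demo_stats *st,
				unsigned int max_permille)
{
	uint64_t total = st->hits + st->drops;

	if (!total)
		return false;
	return st->drops * 1000 > total * max_permille;
}
```

With both counters kept accurate, the decision to ignore the drops or
to restart with a bigger table becomes a simple ratio check.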


> 
> On the other hand, if users do end up calling this even after it's
> returned NULL, we should make sure it doesn't result in an infinite
> loop, and cap the number of iterations through the loop.  tracing_map
> wasn't really meant to be generally reusable - it was separated out so
> it could be shared between two tracers - but it wouldn't hurt to add
> that check just in case...

Right.
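
For reference, a bounded lookup-only probe could look roughly like the
sketch below.  It is a simplified userspace stand-in, not the patch's
code: the demo_* types and functions are hypothetical, a toy FNV-1a hash
replaces jhash(), and it assumes that the insert path has stopped
claiming empty slots, so an unused slot means the key is absent.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the patch's structures. */
struct demo_elt {
	char key[8];
	uint64_t sum;
};

struct demo_entry {
	uint32_t key;          /* hash of the key; 0 means "slot unused" */
	struct demo_elt *val;  /* NULL if the slot was claimed but never filled */
};

struct demo_map {
	unsigned int map_bits; /* log2(map_size), must be 1..31 here */
	unsigned int map_size; /* number of slots, a power of two */
	size_t key_size;       /* must not exceed sizeof(demo_elt.key) */
	struct demo_entry *slots;
};

/* Toy FNV-1a hash, used here in place of the kernel's jhash(). */
static uint32_t demo_hash(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h ? h : 1;      /* 0 is reserved for unused slots */
}

/*
 * Lookup-only probe: never claims a slot, stops at the first unused
 * slot, and visits at most map_size slots so it cannot loop forever.
 */
static struct demo_elt *demo_lookup(struct demo_map *map, const void *key)
{
	uint32_t key_hash = demo_hash(key, map->key_size);
	uint32_t idx;
	unsigned int probes;

	assert(map->map_bits >= 1 && map->map_bits < 32);
	idx = key_hash >> (32 - map->map_bits);

	for (probes = 0; probes < map->map_size; probes++, idx++) {
		struct demo_entry *entry;

		idx &= map->map_size - 1;
		entry = &map->slots[idx];

		if (!entry->key)
			return NULL;   /* unused slot: key was never inserted */

		if (entry->key == key_hash && entry->val &&
		    !memcmp(key, entry->val->key, map->key_size))
			return entry->val;
	}

	return NULL;                   /* every slot visited, key not present */
}
```

The probe counter is what guarantees termination even when every slot
has been marked, which is the case the current loop cannot handle.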

Thanks,
Namhyung