linux-kernel - Re: Process Hang in __read_seqcount

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1351053164.6537.95.camel@edumazet-glaptop>
Date:	Wed, 24 Oct 2012 06:32:44 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Peter LaDow <petela@...ougs.wsu.edu>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Process Hang in __read_seqcount_begin

On Tue, 2012-10-23 at 17:15 -0700, Peter LaDow wrote:
> (Sorry for the subject change, but I wanted to try and pull in those
> who work on RT issues, and the subject didn't make that obvious.
> Please search for the same subject without the RT Linux trailing
> text.)
> 
> Well, more information.  Even with SMP enabled (and presumably the
> migrate_enable having calls to preempt_disable), we still got the
> lockup in iptables-restore.  I did more digging, and it looks like the
> complete stack trace includes a call from iptables-restore (through
> setsockopt with IPT_SO_SET_ADD_COUNTERS).  This seems to be a
> potential multiple writer case where the counters are updated through
> the syscall and the kernel is updating the counters as it filters
> packets.
> 
> I think there might be a race on the update to xt_recseq.sequence,
> since the RT patches remove the spinlock in seqlock_t.  Thus multiple
> writers can corrupt the sequence count. 



>  And I thought the SMP
> configuration would disable preemption when local_bh_disable() is
> called.  And indeed, looking at the disassembly, I see
> preempt_disable() (though optimized, goes to add_preempt_count() ) is
> being called.
> 
> Yet we still see the lockup in the get_counters() in iptables.  Which,
> it seems, is because of some sort of problem with the sequence.  It
> doesn't appear to be related to the preemption, and perhaps there is
> some other corruption of the sequence counter happening.  But the only
> places I see it modified is in xt_write_recseq_begin and
> xt_write_recseq_end, which is only in the netfilter code
> (ip6_tables.c, ip_tables.c, and arp_tables.c).  And every single call
> is preceeded by a call to local_bh_disable().
> 
> This problem is a huge one for us.  And so far I'm unable to track
> down how this is occurring.
> 
> Any other tips?  I presume this is the proper place for RT issues.

Hmm...

Could you try following patch ?

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 8d674a7..053f9e7 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -471,30 +471,24 @@ DECLARE_PER_CPU(seqcount_t, xt_recseq);
  *
  * Begin packet processing : all readers must wait the end
  * 1) Must be called with preemption disabled
- * 2) softirqs must be disabled too (or we should use this_cpu_add())
  * Returns :
  *  1 if no recursion on this cpu
  *  0 if recursion detected
  */
 static inline unsigned int xt_write_recseq_begin(void)
 {
-	unsigned int addend;
-
+	unsigned int old, new;
+	seqcount_t *ptr = &__get_cpu_var(xt_recseq);
 	/*
 	 * Low order bit of sequence is set if we already
 	 * called xt_write_recseq_begin().
 	 */
-	addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;
-
-	/*
-	 * This is kind of a write_seqcount_begin(), but addend is 0 or 1
-	 * We dont check addend value to avoid a test and conditional jump,
-	 * since addend is most likely 1
-	 */
-	__this_cpu_add(xt_recseq.sequence, addend);
-	smp_wmb();
-
-	return addend;
+	old = ptr->sequence;
+	new = old | 1;
+	/* Note : cmpxchg() is a memory barrier, we dont need smp_wmb() */
+	if (old != new && cmpxchg(&ptr->sequence, old, new) == old)
+		return 1;
+	return 0;
 }
 
 /**


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/