[cfe-commits] patch: libcxxabi cxa_*_virtual and cxa_guard_* methods

Howard Hinnant hhinnant at apple.com
Mon May 23 09:13:03 CDT 2011


On May 23, 2011, at 3:22 AM, John McCall wrote:

> 
> On May 22, 2011, at 7:10 PM, Howard Hinnant wrote:
> 
>> On May 22, 2011, at 7:42 PM, John McCall wrote:
>> 
>>> On May 22, 2011, at 4:28 PM, Howard Hinnant wrote:
>>> 
>>>> Question for clang developers:
>>>> 
>>>> Will these namespace-scope definitions ever generate a call to __cxa_guard_acquire?  I know this is forward looking concerning thread_local:
>>> 
>>> The only variable definitions that require _cxa_guard_acquire are static locals and static data members of templated classes.
>> 
>> Ok, thanks John.
>> 
>> I've got a prototype here.  It is kind of messy.  But it is basically derived from Nick's code which I think is a good fundamental design.  Though I've rejected spin locks.  It would work best with thread_local data.  But for obvious reasons I haven't tested that branch and am emulating thread_local with pthread_key_t.  This implementation is also sensitive to endian because the Itanium spec is in terms of "first byte" instead of "high byte".
>> 
>> In a nutshell, I develop a thread id that can be stored in size_t.  An assumption is that the number of threads in the lifetime of an application will be less than 2^32 on a 32 bit architecture, and less than 2^56 on a 64 bit architecture.  This thread id is always non-zero, and is nothing more than a sequential count of threads.  Note that it is not a count of currently active threads, but a count of the number of times a thread is created.  pthread_self() is specifically not used for this purpose because it often takes up more storage space than the 56 bits we have available.
> 
> A thread-local with a mandatory initializer seems like a lot of overhead for a QoI thing like aborting on a recursive initialization — I dunno, maybe I'm over-thinking things.  I do think that it would be good to allow platforms to opt out, or at least to provide a different implementation if the platform happens to support a range-restricted TID.
> 
> In particular, while Darwin's pthread_t is a pointer, Darwin does provide mach_thread_self(), which returns a mach_port_t, which is (ultimately) an unsigned int.

I didn't know this, thanks!  I will look into taking advantage of it.

>  This is the sort of thing that most OSes probably have an equivalent of.  For example, on Linux (maybe only certain distributions?), you can use gettid(), which is a signed int.
> 
> Just to make your life more complicated, guard variables on ARM are only 32 bits, but they give the implementation 31 bits to play with:  the compiler's fast-path check only considers whether the low *bit* is non-zero.
> 
> I don't think you need to worry about platforms that use the Itanium ABI but that don't have pthreads. :)  Undoubtedly someone will correct me, though.
> 
>> An assumption (and clang developers will have to tell me if it is correct or not) is that the compiler will do a proper double-checked locking dance prior to calling __cxa_guard_acquire.
> 
> I'm not sure what code you're imagining that the compiler might emit here;  if the compiler emitted a full and proper double-checked locking dance, it would not need to call __cxa_guard_acquire.  The compiler has the option of emitting an unprotected check that the first byte is non-zero, subject only to the restriction that accesses to the protected global cannot be re-ordered around that check.  It's not required to do any such thing, though.

Yes, that was what I was thinking, but also on some platforms (almost certainly not intel) an atomic read may be necessary.  I'm weak in this area though.  

> 
> In short, for a multitude of reasons, __cxa_guard_acquire must be prepared for the possibility that the variable has already been initialized by the time it grabs the lock.

Oh, yes, absolutely.  The danger I'm worried about is the compiler getting a false '1' out of the first byte (using a non-atomic read) before the initializing thread is quite done.  I'm always confused as to how that might happen, but have been repeatedly warned by those more knowledgeable than me in this area, that it is possible on platforms with sufficiently weak memory ordering.

http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

> 
> That said, it's probably best to assume that the compiler did a preliminary check and that it's not worth performing a second such check before trying to grab the lock.

Yes, that is the question I was asking, and the answer I was assuming, thanks.


> 
>> Comments, suggestions welcome.
> 
> Pedantic point of the week:  'initialized' should be specifically a char*, not an int8_t*.  int8_t may be 'signed char', which in C++ does not have the omnipotent aliasing power of 'char' or 'unsigned char'.

Ok, thanks.

> 
> Also, I don't see a reason to clear the "lock" on cxa_guard_release, and on every other path, set_lock can safely assume that the first byte of the guard is zero.

Ok, thanks.

> 
> But this looks like a great start!
> 
> John.

Howard




More information about the cfe-commits mailing list