[PATCH] OpenACC reference count consistency checking

Julian Brown julian@codesourcery.com
Thu May 7 16:11:09 GMT 2020


Sorry about the delay replying to this email!

On Thu, 30 Jan 2020 16:21:20 +0100
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!
> 
> Notwithstanding the open question about how to implement this
> checking in libgomp in a non-intrusive (performance-wise) yet
> maintainable (avoid '#if 0') way, I have two more questions.
> 
> 
> Is there a specific reason why this checking isn't also enabled for
> libgomp OpenMP 'target' entry points?

Just that it was developed in the context of adding manual deep-copy
support to OpenACC -- OpenMP wasn't my focus at that point. So, I
didn't try adding checking for OpenMP also. It might be interesting to
see how that goes though, particularly with regards to dynamic data
lifetimes in OpenMP.

> Can you please explain (textually?) how this checking (design per your
> textual description below) is working in context of mixed OpenACC
> structured ("S") and dynamic ("D") reference counts?  For example:
> 
>     // S: 0, D: 0
>     
>     #pragma acc enter data copyin ([data]) // copyin; S: 0, D: 1
>     
>     acc_copyin ([data]) // no-op; S: 0, D: 2

Unfortunately it's not quite that simple. The "refcount" fields (in
either splay tree keys or target_mem_descs) do not really represent
program-level reference counts, but rather references in the linked
splay tree structure within libgomp. That's correct: the refcounts are
used so as to know when data is still live, and when it can be freed.

Structured data mapping operations ("acc data", "acc parallel", etc.)
always create a target_mem_desc, with a list of target_var_descs that
describe data mapped in that structured block. That target_mem_desc
either "owns" a block of target memory corresponding to the structured
data block, or it doesn't.

We might have something like this (excuse ASCII art!):

   +===================+        +=================+
   | TARGET_MEM_DESC 1 |   ,-->	| TARGET_VAR_DESC |
   +-------------------+   |	+-----------------+
   | tgt_start...      |   |    | splay_tree_key  | --> ... 
   +-------------------+   |    +=================+
   | target_var_desc 0 | --' 
   | target_var_desc 1 | ---.  	+=================+
   | target_var_desc 2 | -. `-> | TARGET_VAR_DESC |
   +===================+  |     +-----------------+
			  |	| splay_tree_key  | --> ...  
			  |	+=================+  
			  |			     
			  |    	+=================+  
			  `--->	| TARGET_VAR_DESC |  
				+-----------------+  
   +=================+   .-----	| splay_tree_key  |
   | SPLAY_TREE_KEY  | <-'	+=================+
   +-----------------+	 
   | target_mem_desc | -.       +===================+
   +=================+	'-----> | TARGET_MEM_DESC 2 |
				+-------------------+
				| tgt_start...      |
				+-------------------+
				| target_var_desc   |
				+===================+

(Non-virtual/non-dynamic) reference counts correspond to the arrows
between blocks in the diagram (for the pointed-to block --
target_mem_desc or splay tree key).

For a structured data mapping, say "TARGET_MEM_DESC 1" is the descriptor
returned from gomp_map_vars.

Now, "TARGET_MEM_DESC 1" and "TARGET_MEM_DESC 2" can be the same block,
or different blocks. (Each of the TARGET_MEM_DESCs linked from splay
tree keys, linked from TARGET_VAR_DESCs, can be a mix of such
identical or different blocks for each of the splay tree keys linked
from TARGET_VAR_DESCs.) In the case where they're different blocks, and
TARGET_MEM_DESC 2 (etc.) owns its own mapped memory, TARGET_MEM_DESC 1
may have a NULL tgt_start -- thus, not own a target data block itself.

In the case of a dynamic mapping, this subtlety is especially
important. A target_mem_desc being returned from
gomp_map_vars{_internal} with a refcount of zero -- one which no splay
tree keys link back to, because it does not own its own block of target
memory -- is discarded before the function returns.

So, the first time a dynamic data mapping takes place for DATA, we have:

>     // S: 0, D: 0
>     
>     #pragma acc enter data copyin ([data]) // copyin; S: 1, D: 0

This is because the target_mem_desc created to describe on-target
memory for DATA will "own" that data: nothing has referred to it
beforehand. So there's a "real" link from the splay tree key for DATA's
host region to the target_mem_desc we just created. (Yes, the
splay tree key's reference counts look just like a structured data
mapping. That was a subject for another patch.)

>     acc_copyin ([data]) // no-op; S: 2, D: 1

So now we have another dynamic mapping. This time, we already have a
target_mem_desc describing DATA on the target. The
gomp_map_vars_internal function will return NULL -- but before it does
that, it realises that it will "lose" references in doing so. Those are
the ones linked via the discarded target_mem_desc's variable list to
splay tree keys that are referred to in the dynamic mapping operation.

For OpenACC, that's where the "virtual" refcount comes in -- to keep
track of those "lost" dynamic references. In particular, the "virtual"
refcount is the count by which the structured reference count must be
decremented when we hit an OpenACC "finalize" operation. Without that
(cf. OpenMP), we probably wouldn't need it.

>     #pragma acc data copyout ([data]) // no-op; S: 1, D: 2
>       {
>         acc_create ([data]) // no-op; S: 1, D: 3
>         
>         #pragma acc data create ([data]) // no-op; S: 2, D: 3
>           {
>             #pragma acc parallel copyout ([data]) // no-op; S: 3, D: 3
>               {
>               } // no-op; S: 2, D: 3
>     
>             acc_delete_finalize ([data]) // no-op; S: 2, D: 0
>     
>             acc_create ([data]) // no-op; S: 2, D: 1
>           } // no-op; S: 1, D: 1
>     
>         #pragma acc exit data delete ([data]) // no-op; S: 1, D: 0
>       } // copyout; S: 0, D: 0
>     
>     assert (!acc_is_present ([data]));
> 
> (Haven't compiled but I'm reasonably sure that the nesting and my
> manual "[action]; [S], [D]" annotations are correct.  But please
> verify, if course.)

I'm sure to make a mistake if I try to work through the rest of the
reference counts :-).

Let me know if that helps.

Thanks,

Julian

> On 2018-11-30T03:50:24-0800, Julian Brown <julian@codesourcery.com>
> wrote:
> > The model used for checking is as follows.
> >
> >  1. Each splay tree key that references a target memory descriptor
> >     increases that descriptor's refcount by 1.
> >
> >  2. Each variable listed in a target memory descriptor that links
> > back to a splay tree key increases that key's refcount by 1. Each
> > target memory descriptor's variable list is counted only once, even
> > if multiple splay tree keys point to it (via their "tgt" field).
> >
> >  3. Additional ("real") target memory descriptors may be present
> >     representing data mapped through "acc data" or "acc
> > parallel/kernels" blocks.  These descriptors have their refcount
> > bumped, and the variables linked through such blocks have their
> > refcounts bumped also (again, with "once only" semantics).
> >
> >  4. Asynchronous operations "artificially" bump the reference
> > counts for referenced target memory descriptors (but *not* for
> > linked variables/splay tree keys), in order to delay freeing mapped
> > device memory until the asynchronous operation has completed.  We
> > model this, for checking purposes only, using an off-side linked
> > list.
> >
> >  5. "Virtual" reference counts ("virtual_refcount") cannot be
> > checked purely statically, so we add the incoming value to each
> > key's statically-determined reference count ("refcount_chk"), and
> > make sure that the total matches the incoming reference count
> > ("refcount").


More information about the Gcc-patches mailing list