This is the mail archive of the kawa@sources.redhat.com mailing list for the Kawa project.



future model of symbols and environments


This is FYI: Writing down some thoughts of mine about how to improve
Kawa's model of environments.  I don't know when I'll get around to
implementing this, but I hope it's not too far off.

The basic Kawa model for global environments is simple and works ok
for Scheme:
* An environment maps symbols to variables.
* Each thread has a current environment, but multiple threads can share
an environment.

However, the model is more complicated for Common Lisp and XQuery,
which have two-level names: a qualified name consists of a local name
and a package/namespace.  So Kawa has ended up with a model where
Environment does double duty as a namespace/package, and a Symbol does
double duty as a variable.  So things work if we use one namespace per
thread/context, or if we use multiple namespaces in a single
thread/context, but fall apart if we have multiple namespaces in
multiple threads/contexts.

Other issues: How fluid bindings fit into this is awkward; I'd like to
implement the full Common Lisp package/symbol semantics, rather than
the current half-baked solution; for XML I want namespace-qualified
keywords; finally, the sharing semantics (thread-safety issues) are
unclear.

So the plan is something like the following:

* A Package is a map from strings (print names) to Symbols.

* A Symbol has an immutable print-name (an unqualified String) and a
mutable pointer to the "home" Package.  (Common Lisp requires support
for uninterned Symbols, which have no home package, and allows changing
the home package of a symbol.  It follows that hashing on Symbols
should use just the print-name, not the Package name, as that can
change.)

* A Symbol has no other mutable state: No variable or function or
property list cell.  A Symbol is a global shared between threads,
repls, and languages.

* A Package can inherit from other Packages (those that it "uses", in
Common Lisp terminology).

* A Common Lisp package is implemented as a private Package that
inherits from a public Package.  When a Symbol is exported, it gets
moved from the private Package to its parent public Package.
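
A rough sketch of the Package and Symbol bullets so far, in Java.  The
names SymbolSketch and PackageSketch and all their methods are made up
for illustration; nothing like this is implemented yet.  It assumes a
single parent Package rather than a full "uses" list, and leaves
equals() on Symbols as object identity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical Symbol: immutable print name, mutable home package. */
final class SymbolSketch {
    private final String printName;       // never changes
    private volatile PackageSketch home;  // null means uninterned

    SymbolSketch(String printName) { this.printName = printName; }

    String getPrintName() { return printName; }
    PackageSketch getHome() { return home; }
    void setHome(PackageSketch home) { this.home = home; }

    // Hash on the print name only, since the home package can change.
    @Override public int hashCode() { return printName.hashCode(); }
}

/** Hypothetical Package: a synchronized map from print names to Symbols. */
final class PackageSketch {
    private final String name;
    private final PackageSketch parent;   // inherited-from package, or null
    private final Map<String, SymbolSketch> symbols = new HashMap<>();

    PackageSketch(String name, PackageSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    String getName() { return name; }

    /** Look up without creating; falls back to the parent package. */
    synchronized SymbolSketch lookup(String printName) {
        SymbolSketch s = symbols.get(printName);
        if (s == null && parent != null) return parent.lookup(printName);
        return s;
    }

    /** Find an existing Symbol or create one homed in this package. */
    synchronized SymbolSketch intern(String printName) {
        SymbolSketch s = lookup(printName);
        if (s == null) {
            s = new SymbolSketch(printName);
            add(s);
        }
        return s;
    }

    /** Export: move the Symbol from this (private) package to its parent. */
    synchronized void export(SymbolSketch s) {
        if (parent == null) throw new IllegalStateException("no public parent");
        if (symbols.remove(s.getPrintName(), s)) parent.add(s);
    }

    private synchronized void add(SymbolSketch s) {
        symbols.put(s.getPrintName(), s);
        s.setHome(this);
    }
}
```

After export the Symbol is still visible from the private package
through inheritance, and its hash code is unchanged even though its
home package is different.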

* Packages may have names, nicknames, URIs and other properties that
allow us to map between Packages and Symbols and their printed
representations, in both directions.  However, this mapping may need
to be thread- and language-dependent.

* Package lookup and modification are synchronized, but since these
operations are usually done at read time or class initialization time
performance isn't critical.

* There is a default Package with an empty name.

* For backwards compatibility, a Scheme symbol in the default
(unnamed) Package is represented in Scheme applications as an interned
Java String.  It is equivalent to (and has the same variable bindings
as) the equivalently-named Symbol in the default Package.  The two
might compare as "equal?" (and maybe "eql?") though not alas "eq?".

* A Variable can contain a value (any Object) or be unbound.  It is an
abstract class that has "get", "set", and "isBound" methods.  (We
might add convenience methods to get/set primitive types.  They would
default to boxing/unboxing, but may be more efficient in some cases.)

* A SimpleVariable is a Variable implemented using a plain "Object
value" field.  It is not synchronized.

* An IndirectVariable is a Variable that forwards get/set/isBound
requests to another Variable.  Its methods are synchronized on the
other (target) Variable.

* A FieldVariable is implemented using reflection on a static field
or an (instance-field + Object) pair.  It is not synchronized.
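
The Variable classes above might look roughly like this in Java.  This
is only a sketch: the *Sketch class names, the UNBOUND marker, the
forStatic helper, and FieldHolderSketch are all invented for the
example, and I'm assuming a FieldVariable always counts as bound.

```java
import java.lang.reflect.Field;

/** Hypothetical abstract Variable: holds a value or is unbound. */
abstract class VariableSketch {
    abstract Object get();
    abstract void set(Object value);
    abstract boolean isBound();
}

/** Plain "Object value" field; deliberately not synchronized. */
final class SimpleVariableSketch extends VariableSketch {
    private static final Object UNBOUND = new Object();  // private marker
    private Object value = UNBOUND;

    Object get() {
        if (value == UNBOUND) throw new IllegalStateException("unbound variable");
        return value;
    }
    void set(Object v) { value = v; }
    boolean isBound() { return value != UNBOUND; }
}

/** Forwards every request to a target Variable, locking on the target. */
final class IndirectVariableSketch extends VariableSketch {
    private final VariableSketch target;
    IndirectVariableSketch(VariableSketch target) { this.target = target; }

    Object get() { synchronized (target) { return target.get(); } }
    void set(Object v) { synchronized (target) { target.set(v); } }
    boolean isBound() { synchronized (target) { return target.isBound(); } }
}

/** Reflective access to a static field (instance == null) or instance field. */
final class FieldVariableSketch extends VariableSketch {
    private final Field field;
    private final Object instance;

    FieldVariableSketch(Field field, Object instance) {
        this.field = field;
        this.instance = instance;
    }

    /** Convenience for the static-field case; wraps the checked exception. */
    static FieldVariableSketch forStatic(Class<?> owner, String fieldName) {
        try {
            return new FieldVariableSketch(owner.getDeclaredField(fieldName), null);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    Object get() {
        try { return field.get(instance); }
        catch (IllegalAccessException e) { throw new IllegalStateException(e); }
    }
    void set(Object v) {
        try { field.set(instance, v); }
        catch (IllegalAccessException e) { throw new IllegalStateException(e); }
    }
    boolean isBound() { return true; }  // assumption: a field is always bound
}

/** Stand-in for a compiled module class with one static global. */
final class FieldHolderSketch {
    static Object slot = "initial";
}
```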

* An Environment is a mapping from Symbols to Variables:
  Variable lookup(Symbol);
It also has convenience methods:
  Object get(Symbol name);
  Object get(Symbol name, Object defaultValue);
  void set(Symbol name, Object value);
  boolean isBound(Symbol name);
These are equivalent to (but might be more efficient than)
the corresponding methods on a Variable.

* A simple Environment can be implemented as a hash table from Symbols
to Variables.  For speed it is not synchronized.

* A SharedEnvironment is a wrapper around Environment that
synchronizes on the target Environment while invoking requests on the
latter.  Furthermore, lookup(Symbol) will return a synchronized
IndirectVariable.  This class should be used whenever multiple threads
may need to access the same Environment.
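
To make the Environment/SharedEnvironment split concrete, here is a
small self-contained Java sketch.  Sym, Var, PlainVar, SyncVar, Env,
SimpleEnv and SharedEnv are placeholder names for this example only.
One assumption not settled above: lookup() creates an unbound Variable
when the Symbol has no entry yet.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for Symbol (identity-based equality is enough here). */
final class Sym {
    final String name;
    Sym(String name) { this.name = name; }
}

/** Stand-in for the abstract Variable. */
abstract class Var {
    abstract Object get();
    abstract void set(Object value);
    abstract boolean isBound();
}

/** Unsynchronized value cell. */
final class PlainVar extends Var {
    private boolean bound;
    private Object value;
    Object get() {
        if (!bound) throw new IllegalStateException("unbound variable");
        return value;
    }
    void set(Object v) { value = v; bound = true; }
    boolean isBound() { return bound; }
}

/** Indirect variable that locks on its target for every request. */
final class SyncVar extends Var {
    private final Var target;
    SyncVar(Var target) { this.target = target; }
    Object get() { synchronized (target) { return target.get(); } }
    void set(Object v) { synchronized (target) { target.set(v); } }
    boolean isBound() { synchronized (target) { return target.isBound(); } }
}

/** Mapping from Symbols to Variables, plus convenience methods. */
abstract class Env {
    abstract Var lookup(Sym name);

    Object get(Sym name) { return lookup(name).get(); }
    Object get(Sym name, Object defaultValue) {
        Var v = lookup(name);
        return v.isBound() ? v.get() : defaultValue;
    }
    void set(Sym name, Object value) { lookup(name).set(value); }
    boolean isBound(Sym name) { return lookup(name).isBound(); }
}

/** Hash-table Environment; not synchronized, for speed. */
final class SimpleEnv extends Env {
    private final Map<Sym, Var> table = new HashMap<>();
    Var lookup(Sym name) {
        return table.computeIfAbsent(name, k -> new PlainVar());
    }
}

/** Wrapper that locks the target Environment and hands out SyncVars. */
final class SharedEnv extends Env {
    private final Env target;
    SharedEnv(Env target) { this.target = target; }
    Var lookup(Sym name) {
        synchronized (target) { return new SyncVar(target.lookup(name)); }
    }
}
```

A thread that owns its Environment would use SimpleEnv directly; only
Environments visible to several threads pay for the wrapper.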

* It might be possible for Environments to inherit from other
Environments or share Variables, though such inheritance can be
simulated with IndirectVariables.  I don't know what sharing semantics
one might need: Certainly one needs an efficient way to populate a new
Environment with builtin default bindings.  We can handle that with a
special case for importing a constant (immutable) Environment.

* Each thread (via its per-thread CallContext instance) has a current
Environment.  Multiple threads can share the same Environment, in
which case it should be a SharedEnvironment.

* The "value of a global binding" is defined by looking up the
Variable in the current thread's current Environment.

* In Common Lisp and Emacs Lisp a symbol may also have a function
binding and a property list.  This can be handled multiple ways:
(a) Each Variable may have a value "cell" plus a function "cell" plus a
"plist" cell.  This complicates the mental model of a Variable, so it
seems a bad idea.
(b) An Environment can manage multiple mappings.  In addition to
lookup/get there would be lookupFunction/getFunction.  This
complicates both the implementation and mental model, so I'm not keen
on it.  However, you can get a similar effect by:
(c) A separate Environment for function bindings.  Each thread can
have both a value Environment and a function Environment.  The
function Environment need not be referenced directly in the thread's
CallContext, but can instead be accessed using a convention: Looking
up a special magic '$function$ Symbol in the value Environment returns
the corresponding function Environment.  I haven't evaluated how this
would interact with environment sharing/imports.
(d) Mapping a Symbol to a corresponding "function symbol".  Looking up
a regular symbol in an Environment yields the value binding; looking up
the "function symbol" in the same Environment yields the original
Symbol's function binding.  We can do this mapping using a special
global Environment that maps regular Symbols to generated "function
symbols".  Note this mapping can be done at class initialization time.
Solutions (c) and (d) are the most promising; (d) looks likely to be
the more robust solution.
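
Option (d) could be sketched like this in Java.  SymbolStub and
FunctionSymbols are hypothetical names, the "$function" suffix is just
a readable label for the generated symbol, and a plain Map stands in
for the Environment.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for Symbol, compared by identity. */
final class SymbolStub {
    final String name;
    SymbolStub(String name) { this.name = name; }
}

/** Global table mapping each regular Symbol to its "function symbol". */
final class FunctionSymbols {
    private static final Map<SymbolStub, SymbolStub> table = new HashMap<>();

    /** Returns the same generated symbol every time for a given Symbol. */
    static synchronized SymbolStub functionSymbolFor(SymbolStub s) {
        return table.computeIfAbsent(s, k -> new SymbolStub(k.name + "$function"));
    }
}
```

Usage would be along these lines, with both bindings living in the
same Environment:

    env.put(list, valueBinding);
    env.put(FunctionSymbols.functionSymbolFor(list), functionBinding);

Since functionSymbolFor is a pure lookup, a compiled class can call it
once at class initialization time and keep the result in a static
field.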

* Fluid bindings (as in fluid-let, which doesn't appear to be in the
Kawa manual, oops) are trivial if the current Environment is not
shared with another thread: Just save the old value and later restore
it.  If multiple threads share an Environment, we don't want other
threads to see the temporary change.  Even more complicated is if a
new shared Environment (presumably for a child thread) is created
while we're inside a fluid binding: Should the new thread inherit the
temporary binding?  Should it get a new binding, but initialized to
the temporary value?
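
For the easy, unshared case, save-and-restore is all there is to it.
A minimal Java sketch, using a plain Map as a stand-in for an
Environment that no other thread can see (FluidLetSketch and its
fluidLet method are hypothetical names):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class FluidLetSketch {
    /**
     * Temporarily binds name to temp while body runs, then restores the
     * previous state, even if body throws.  Only safe when env is not
     * shared with another thread.
     */
    static <T> T fluidLet(Map<String, Object> env, String name,
                          Object temp, Supplier<T> body) {
        boolean wasBound = env.containsKey(name);
        Object saved = env.get(name);
        env.put(name, temp);
        try {
            return body.get();
        } finally {
            if (wasBound) env.put(name, saved);
            else env.remove(name);   // it was unbound before, so unbind again
        }
    }
}
```

This does nothing for the shared cases; those still need an answer to
the questions above.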

* In a static module, a global binding is compiled to a static field
referencing its Symbol (resolved at class initialization time).
Getting/setting the value means looking up the Symbol in the current
Environment.  If the compiler is told the module will only be used in
a single thread, it can also do the lookup from Environment to
Variable at class initialization time, though we do have the issue of
redefinitions (changing the Variable bound to a Symbol) - see below.

* In a non-static module, we assume that there is one instance of the
module's compiled class for each Environment.  To "require" a module,
we look for a module instance by looking up its name (as a Symbol) in
the current Environment, allocating a new instance if needed.  We can
look up each Variable when the module instance is allocated.  For
variables that are "owned by" (local to or exported from) this module,
we can "inline" them as simple fields for further speed ups.
Conceptually, these would be FieldVariables, but with access inlined
by the compiler.  Imported variables are implemented by accessing the
corresponding field in the imported module instance, where an imported
module instance is accessed as a simple field access.
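
The "require" step for non-static modules amounts to a find-or-create
keyed on the Environment.  A rough Java sketch, where a Map stands in
for the Environment, the class name is used as the module name, and
RequireSketch and CounterModule are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class RequireSketch {
    /** Returns this Environment's instance of the module, creating it once. */
    static <M> M require(Map<String, Object> env, Class<M> moduleClass,
                         Supplier<M> factory) {
        String key = moduleClass.getName();
        Object existing = env.get(key);
        if (existing == null) {
            existing = factory.get();   // allocate a new module instance
            env.put(key, existing);
        }
        return moduleClass.cast(existing);
    }
}

/** Stand-in for a compiled non-static module with one "owned" variable. */
final class CounterModule {
    int count;   // inlined as a simple field
}
```

Two requires in the same Environment yield the same instance (and so
the same count field), while a different Environment gets its own.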

* "Hot-swapping" of running code can be implemented without special VM
support (with some restrictions) using non-static modules: Just
compile a new module as a new class that extends the original class,
and create a new instance, copying all fields over to the new.  The
tricky part is to find and update pointers that reference the old
module instance.  For that we need some kind of "dependency table",
which might make use of reflection.  (An alternative is to add an
extra layer of indirection, which slows everything down.)

* Re-definition is changing the actual Variable mapped to a Symbol in
an Environment, rather than just changing the value.  For example we
might change it to a FieldVariable or an IndirectVariable.  Re-definitions
also complicate efficiency: If we want to cache a Variable for performance,
that is another reason we may need a "dependency table".

--
	--Per Bothner
per@bothner.com   http://per.bothner.com/


