On Clojure

February 23, 2010

Generating deftype forms in macros

Filed under: General — khinsen @ 2:36 pm

One of the common uses of macros in Clojure, as in other Lisp dialects, is to abstract away boilerplate code. Instead of writing very similar lengthy forms several times, one defines a macro that specializes a template for each particular use, and then uses that macro a few times. The template is usually written using syntax-quote: the template form is preceded by a backquote, and inside the template a tilde marks expressions that are replaced by their values.
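As a minimal sketch of such a template (the macro name def-constant-fn is made up for this illustration), the backquoted form below is the template, and each tilde marks a macro argument whose value is inserted:

```clojure
;; Hypothetical boilerplate-removing macro: defines a zero-argument
;; function named fname that returns value.
(defmacro def-constant-fn
  [fname value]
  `(defn ~fname [] ~value))

;; The expansion is the template with the two arguments filled in.
(macroexpand-1 '(def-constant-fn answer 42))
;; => (clojure.core/defn answer [] 42)

(def-constant-fn answer 42)
(answer)
;; => 42
```

Note that defn already shows up as clojure.core/defn in the expansion.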

Syntax-quote has one more effect: it resolves all symbols in the current namespace (the one in which the macro is defined, not the one where it is used) and replaces the unqualified symbol by its namespace-qualified equivalent. For most symbols in most forms, this is the right thing to do in order to make the macro work in any namespace, as well as to avoid unwanted variable capture. More specifically, it is the right thing to do for symbols that are defined by the macro, and for symbols that will ultimately be evaluated (names referring to vars, in particular function names). It is not the right thing to do for symbols bound locally inside the form (function parameter names, symbols bound in a let form). And it is also not the right thing to do for symbols that just stand for themselves and are used in some special way by the form that the macro expands to.
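The difference between plain quote and syntax-quote can be seen directly at the REPL. This is only an illustration; the namespace shown for xs depends on where the code is read.

```clojure
;; Plain quote leaves all symbols untouched.
(def plain '(map inc xs))
;; => (map inc xs)

;; Syntax-quote qualifies every symbol when the form is read: map and inc
;; resolve to vars in clojure.core, and the otherwise unknown symbol xs is
;; qualified with the namespace in which the form is read.
(def resolved `(map inc xs))
;; => (clojure.core/map clojure.core/inc user/xs)   ; when read in namespace user

(map namespace plain)
;; => (nil nil nil)
```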

The latter situation is particularly frequent in macros that generate deftype forms. Consider for example the following deftype form, which is a simplified version of the type definition used in my multiarray design study:

(deftype multiarray
  [descriptor
   data-array]
  :as this

  Object
    (equals [o] ...)
    (hashCode [] ...)

  clojure.lang.Counted
    (count [] ...)

  clojure.lang.Indexed
    (nth [i] ...)

  clojure.lang.Sequential

  clojure.lang.Seqable
    (seq [] ...))

Of all the symbols shown in the above example, the only one for which namespace-resolution is appropriate is multiarray, the name of the type being defined. All the other symbols name fields of the type, Java interfaces, or methods. They must remain unqualified. In real-life deftypes, there are of course symbols that could or should be namespace-qualified, in particular most of the symbols used inside the method definitions, which are just like function definitions. However, method definitions are often short, and rarely subject to variable capture, meaning that not namespace-resolving those symbols is rarely a problem.

In a syntax-quote template, there are two ways to deal with symbols for which the default (namespace-resolution) is not appropriate:

  • Prefixing with ~' (tilde + quote). This is a special case of an expression inside a template, whose value is the quoted symbol. A tilde-quoted symbol is taken over into the instantiated template without namespace-resolution.
  • Postfixing with # (hash sign). Such symbols are replaced with system-generated symbols that are guaranteed to be different from any other symbol in existence. This is another technique to avoid variable capture.
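Both techniques can be tried on a small function template. The sketch below only builds the forms as data and evaluates them; the exact name of the generated symbol varies from run to run.

```clojure
;; ~'x evaluates (quote x) inside the template, so the plain symbol x is
;; inserted without namespace-resolution.
(def with-tilde-quote `(fn [~'x] (inc ~'x)))
;; => (clojure.core/fn [x] (clojure.core/inc x))

;; x# is replaced by a generated symbol; all occurrences of x# within the
;; same template get the same generated symbol.
(def with-gensym `(fn [x#] (inc x#)))
;; => (clojure.core/fn [x__123__auto__] (clojure.core/inc x__123__auto__))
;;    (the number differs between runs)

;; Both forms evaluate to working functions.
((eval with-tilde-quote) 2)
;; => 3
((eval with-gensym) 2)
;; => 3
```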

For generating a deftype form from a syntax-quote template, the only solution is thus to prefix all the symbols shown in the example above with tilde-quote. I tried: it works, but it’s a mess. It’s not very readable, and the inevitable mistakes lead to unpleasant error messages.

Well, this is Lisp, and in Lisp you are always free to make your own tools if you are not happy with the ones provided by the system. What I want here is a template expansion system that doesn’t do namespace resolution on symbols. However, I didn’t need a full-blown equivalent to syntax-quote templates either, given that I would use those deftype templates only for one application. So I came up with the following definitions, which for me are the right compromise between simplicity and usability:

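(require 'clojure.walk)

;; Replace every (clojure.core/unquote sym) form inside form by the value
;; that substitution-map associates with sym.
(defn instantiate-template
  [substitution-map form]
  (clojure.walk/prewalk
   (fn [x] (if (and (sequential? x) (= (first x) 'clojure.core/unquote))
             (substitution-map (second x))
             x))
   form))

;; Turn the let-like substitutions vector into a map from quoted symbols to
;; value expressions, then instantiate the (quoted) template form with it.
(defmacro template
  [substitutions form]
  (let [substitution-map (into {} (map (fn [[a b]]
                                         [(list 'quote a) b])
                                       (partition 2 substitutions)))]
    `(instantiate-template ~substitution-map (quote ~form))))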

Compared to syntax-quote, this has two restrictions: it has no splicing, and it admits only symbols after a tilde, not arbitrary expressions. The template macro takes a let-like vector as its first argument. This vector contains the symbol-value pairs for substitution inside the template. The second argument is the template form, which presumably contains tilde-prefixed symbols for substitution. Note that the Clojure reader translates ~x to (clojure.core/unquote x), which is what the above code searches for.
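The reader translation and the substitution step can be checked in isolation. The following sketch uses a plain quoted form and a hand-written substitution map; the helper name unquote-form? and the symbols bar and boo are chosen for this illustration only.

```clojure
(require 'clojure.walk)

;; The reader turns ~x into an ordinary list, even outside syntax-quote.
(read-string "~x")
;; => (clojure.core/unquote x)

;; A quoted template is therefore plain data containing unquote lists.
(def quoted-template '(deftype ~type [~field]))
;; => (deftype (clojure.core/unquote type) [(clojure.core/unquote field)])

;; True for lists of the form (clojure.core/unquote something).
(defn unquote-form? [x]
  (and (seq? x) (= (first x) 'clojure.core/unquote)))

;; Walk the template and replace each unquote list by a lookup in the map.
(clojure.walk/prewalk
 (fn [x] (if (unquote-form? x)
           ({'type 'bar, 'field 'boo} (second x))
           x))
 quoted-template)
;; => (deftype bar [boo])
```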

Here is an example for using such templates:

(defmacro foo [typename fieldname]
  (template [type  typename
             field fieldname]
    (deftype ~type
      [~field])))

(foo bar boo)

(bar 42)

This prints #:bar{:boo 42}, illustrating that the macro does what it is expected to do. Of course this is not the perfect example for the utility of my little template instantiation system, as it could just as well be written using syntax-quote!

February 17, 2010

Managing namespaces

Filed under: General, Libraries — khinsen @ 2:38 pm

One aspect of Clojure that I have not been quite happy with is namespace management. In a bigger project that consists of several namespaces, I usually end up having nearly identical :use and :require clauses in the initial ns form. These clauses set up the project-specific set of symbols that I want to work with. Individual namespaces sometimes add symbols for their specific needs, of course. What bothers me is that I have to repeat the :use and :require clauses, often with :exclude or :only options with many symbols, in every single namespace. And of course I often forget a copy when updating my symbol set. Therefore I decided to look at how namespaces work in more detail, and try to find a better way to manage symbols in namespaces.

For those who don’t want to read all the explanations, my solution (still a bit experimental for the moment) is in my nstools library, which is also on Clojars.

As most Clojure programmers know, a namespace maps symbols to vars. Vars are mutable storage locations with well-defined concurrency semantics, but this is not the topic of this post; see the documentation for details. But a namespace is not a simple map. To start with, a namespace stores two maps: one from symbols to their values, and one from namespace aliases to namespaces. Aliases are usually created using a (:require ... :as ...) clause in the ns form that opens a namespace. They are used in namespace-qualified symbols before the slash, as a shorthand for the full namespace name. Since aliases are used before the slash and namespace-local symbols are used after the slash (or in an unqualified name with no slash at all), there is no conflict between the two. It is thus possible to use the same symbol both as an alias and as a regular symbol in the same namespace.
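A short illustration of the last point, assuming the alias str is chosen for clojure.string: the same symbol then works as an alias before the slash and as the usual clojure.core function elsewhere.

```clojure
;; Create the alias str for namespace clojure.string in the current namespace.
(require '[clojure.string :as str])

;; Before the slash, str is looked up in the alias map.
(str/upper-case "abc")
;; => "ABC"

;; Unqualified, str is looked up in the symbol map and still refers to
;; clojure.core/str.
(str "abc" 1)
;; => "abc1"

;; The alias map of the current namespace now has an entry for str.
(ns-name (get (ns-aliases *ns*) 'str))
;; => clojure.string
```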

The main symbol-to-value map is also not quite as simple as it seems. The values it stores are not always vars. A symbol can also have a Java class as its value. A symbol-to-class entry is created using import or using the :import clause in ns. A submap containing only the symbol-to-class entries of the namespace map can be obtained by calling ns-imports. Finally, the symbol-to-var entries can be divided up into two categories: those that refer to vars in the same namespace (created by def and the many macros based on it), and those that refer to vars in some other namespace. The latter are created with use or the :use clause of ns, and the submap of these symbols can be obtained by calling ns-refers. The first category, a submap of symbols to vars defined in the same namespace, is the return value of ns-interns.
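The three submaps can be inspected on a small scratch namespace. In this sketch the namespace name demo.maps and the var name answer are invented for the example, and the namespace is built with create-ns rather than ns so that it starts out without any referred vars.

```clojure
(require 'clojure.set)

;; A fresh namespace: it has the default Java imports but no referred vars.
(def demo-ns (create-ns 'demo.maps))

;; refer and import act on the current namespace, so bind *ns* temporarily.
(binding [*ns* demo-ns]
  (refer 'clojure.set :only '[union])   ; symbol -> var in another namespace
  (import 'java.util.Date))             ; symbol -> Java class

;; A var defined in the namespace itself (what def would create).
(intern demo-ns 'answer 42)

(get (ns-imports demo-ns) 'Date)
;; => java.util.Date
(keys (ns-refers demo-ns))
;; => (union)
(keys (ns-interns demo-ns))
;; => (answer)

;; ns-map returns the complete symbol table containing all three kinds.
(every? (ns-map demo-ns) '[Date union answer])
;; => true
```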

There is one more subtlety, and an undocumented one as far as I know: Two symbols, ns and in-ns, are put in the namespace map when the namespace is created, and can’t be removed (using ns-unmap) nor redefined. This makes sense because they refer to a macro and a function needed to create new namespaces and to switch namespaces. Having them in every namespace (referring to vars in clojure.core) ensures that it is always possible to get out of the current namespace.

Next, let’s look at how namespaces are set up in Clojure. Pretty much all the namespace management functionality is available through the standard ns form with its various clauses and options. The one exception is removing symbols, which can be done only by calling ns-unmap explicitly. The ns form first switches to the namespace it defines, creating it if necessary. The second step is to add references to all public vars defined in namespace clojure.core. This step can be modified by specifying a :refer-clojure clause that lists the symbols to include or exclude. Then ns goes through its optional clauses. A :require clause loads another namespace, but doesn’t normally modify the namespace under construction. Only if the option :as is specified is there an impact on the namespace: an alias is added. A :use clause first does a :require and then adds all of the newly loaded namespace’s public vars to the symbol table of the current namespace. The options :exclude and :only can be used to select a subset of the public vars. Finally, an :import clause adds Java classes to the namespace’s symbol table.
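Put together, an ns form using all of these clauses might look as follows. The namespace demo.app and the functions shout and all-tags are invented for this sketch; the libraries are standard ones chosen only because they are always available.

```clojure
(ns demo.app
  ;; Refer only these two names from clojure.core instead of everything.
  (:refer-clojure :only [defn str])
  ;; Load clojure.string and add the alias string; no symbols are referred.
  (:require [clojure.string :as string])
  ;; Load clojure.set and refer just union into this namespace.
  (:use [clojure.set :only [union]])
  ;; Add the Java class java.util.Date under the short name Date.
  (:import (java.util Date)))

(defn shout [s]
  (str (string/upper-case s) "!"))

(defn all-tags [a b]
  (union a b))

;; map was not listed in :refer-clojure, so it is absent from the symbol
;; table; anything from clojure.core remains reachable by its full name.
(clojure.core/contains? (clojure.core/ns-refers 'demo.app) 'map)
;; => false
```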

The most dangerous, but also most convenient, ns clause is :use. In its basic form, it adds all public vars of another namespace to the symbol table of the namespace under construction. And once those symbols are there, they cannot be redefined in the namespace, except by first removing them using ns-unmap. The problem is that “all public vars of namespace X” is not something under your control. It’s the author of the other namespace who decides which symbols you get in your namespace. The next release of namespace X may well have a few more public definitions, and if those are in conflict with your own definitions, then your module will fail to load. Therefore, as a safety measure, you should use the :only option of :use with all namespaces that are out of your control, listing explicitly the definitions that you need, in order to be certain that you don’t get more than you expect. Unfortunately, this includes clojure.core, which also grows with every new Clojure release. To be on the safe side, you should have a :refer-clojure clause with the :only option in every namespace that you intend to maintain for a longer time.
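The conflict can be reproduced on scratch namespaces. In the sketch below, the names demo.greedy, demo.careful and try-intern are made up, refer stands in for the corresponding :use clause, and intern stands in for def. The exception class is what current Clojure releases throw for a referred var from a namespace other than clojure.core; this is an observation about the implementation, not a documented guarantee.

```clojure
(require 'clojure.set)

;; Equivalent in effect to a bare (:use clojure.set): all public vars.
(def greedy (create-ns 'demo.greedy))
(binding [*ns* greedy]
  (refer 'clojure.set))

;; Equivalent in effect to (:use [clojure.set :only [union]]).
(def careful (create-ns 'demo.careful))
(binding [*ns* careful]
  (refer 'clojure.set :only '[union]))

;; Hypothetical helper: try to define sym in namespace n, reporting conflicts.
(defn try-intern [n sym value]
  (try
    (intern n sym value)
    :defined
    (catch IllegalStateException e
      :conflict)))

;; difference was pulled in by the bare refer, so it cannot be defined...
(try-intern greedy 'difference 1)
;; => :conflict

;; ...unless the referred symbol is removed first.
(ns-unmap greedy 'difference)
(try-intern greedy 'difference 1)
;; => :defined

;; With :only, the name was never taken in the first place.
(try-intern careful 'difference 1)
;; => :defined
```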

So much for what I have, but what do I want? I’d like to be able to set up a namespace to my taste and then be able to use it as a basis for deriving other namespaces. With that possibility, I would define a master namespace once per project, being careful to always use the :only option in :refer-clojure and :use. All other namespaces in my project would then be based on this master namespace and only add or remove symbols for their specific local needs.

To implement this functionality, I added three new clauses to ns. The :like clause takes a namespace as its only argument and adds all symbols from that namespace that refer to vars in yet another namespace to the current namespace (make sure you read this properly; there are at least three namespaces involved here!). The :clone clause does the same but also adds the symbols defined in the other namespace. In other words, :clone is equivalent to :like followed by :use. The third new clause is :remove, whose arguments are symbols to be removed from the namespace. It is explicitly allowed to “remove” symbols that aren’t there. This creates another way to protect one’s namespace against future extensions in namespaces that are :used: simply add all symbols defined in your namespace to the :remove list.
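The intended semantics of the three clauses can be approximated with ordinary functions. The following is only a sketch of the behaviour described above, not the code in nstools: the function names like!, clone! and remove! and all demo namespaces are invented here, and aliases and imports are ignored.

```clojure
(require 'clojure.set)

;; :like - copy every symbol that master refers from other namespaces.
(defn like! [target master]
  (let [^clojure.lang.Namespace t (the-ns target)]
    (doseq [[sym v] (ns-refers master)]
      (.refer t sym v))))

;; :clone - :like followed by :use, i.e. also refer master's own public vars.
(defn clone! [target master]
  (like! target master)
  (let [^clojure.lang.Namespace t (the-ns target)]
    (doseq [[sym v] (ns-publics master)]
      (.refer t sym v))))

;; :remove - unmap symbols; unmapping an absent symbol is not an error.
(defn remove! [target & syms]
  (doseq [sym syms]
    (ns-unmap target sym)))

;; A master namespace that refers union and defines helper itself.
(def master (create-ns 'demo.master))
(binding [*ns* master]
  (refer 'clojure.set :only '[union]))
(intern master 'helper identity)

(def liked (create-ns 'demo.liked))
(like! liked master)
(set (keys (ns-refers liked)))
;; => #{union}

(def cloned (create-ns 'demo.cloned))
(clone! cloned master)
(set (keys (ns-refers cloned)))
;; => #{union helper}

(remove! cloned 'helper 'not-there)
(set (keys (ns-refers cloned)))
;; => #{union}
```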

The above paragraph contains a small lie: I didn’t add anything to ns, of course, though that’s what I would have liked to do. I made a copy of ns and added the new clauses to the copy. The copy is in namespace nstools.ns and it’s called ns+ – as explained above, I cannot call it ns. So to use nstools, you have to replace ns by ns+ and put a (use 'nstools.ns) before it.

As I said, this library is still a bit experimental. I am not sure for example if both :like and :clone are necessary. And perhaps :remove should be called :exclude. Of course, any feedback is welcome!
