On Clojure

March 5, 2010

Pre- and post-conditions: a quest for a nicer syntax

Filed under: Uncategorized — Konrad Hinsen @ 6:15 pm

When pre- and post-conditions appeared in Clojure (since version 1.1 they are “official”), I though that was a pretty neat feature and that I ought to use it. I added conditions to a couple of functions and was satisfied. But rather soon I noticed that I used conditions very sparingly, only where I expected wrong data to be fed into a function. Considering that bugs also happen where we don’t expect them, and that conditions are great documentation as well as error-checking code, I should use them more. So why don’t I?

Something I didn’t like about pre- and post-conditions is the rather heavy syntax. For short functions, the conditions take up more space than the code itself. Moreover, parsing them visually takes some effort as well, as much as reading the code itself. Wouldn’t it be nice if preconditions could be written somehow right in the argument list, and postconditions at the level of the function definition?

Well, this is Lisp, so if you don’t like some syntax, you just roll your own. I didn’t come up with a better general syntax though, but I think that what I describe below is much nicer and suitable for 90% of pre- and post-conditions used in practice. The main limitation is that each condition can depend on only one argument, or on the return value. For the other cases, there is still Clojure’s general syntax, which is perfectly compatible with my extension. For those who want to play with this themselves, here is the code.

As a first example, here’s a pretty stupid algorithm to calculate integer powers of a number:

(defn (number?) power
  [(number?) x
   (integer?) (pos?) n]
  (apply * (repeat n x)))

The preconditions check that the first argument is a number and that the second one is a positive integer. The postcondition checks that the result is a a number – the utility of this test is a bit dubious, but it serves as an illustration. Note that you can have multiple conditions per argument, and also multiple postconditions. The full form representing the condition is constructed by inserting the argument to be tested in the second position of the supplied list. The above function definition actually expands to

(defn power
  ([x n]
   {:pre [(number? x) (integer? n) (pos? n)], :post [(number? %)]}
   (apply * (repeat n x))))

One precondition in the above example is actually too strict. The argument n needn’t be positive, just non-negative. There is not simple test function for “non-negative” in Clojure, but with the above rule we can write this as:

(defn (number?) power
  [(number?) x
   (integer?) (>= 0) n]
  (apply * (repeat n x)))

Another possibility is to use the -> macro:

(defn (number?) power
  [(number?) x
   (integer?) (-> neg? not) n]
  (apply * (repeat n x)))

Preconditions can be combined with destructuring. Here is a variant of Clojure’s function second that actually verifies that its argument has at least two element:

(defn my-second
  [[f & (seq) r]]
  (first r))

There is however one limitation: I couldn’t find a way to use my new syntax with map destructuring. So for now at least it works only with vector destructuring.

Comments on this syntax are welcome. Do you like it? Can you come up with something better? Or do you think that Clojure’s standard syntax is just fine?

March 1, 2010

Anonymous function literals and syntax-quote don’t work well together

Filed under: Uncategorized — Konrad Hinsen @ 10:43 am

Clojure has a couple of macro-like features built right into the reader, providing shorthand notation for commonly needed constructs. However, unlike real macros, which are built on a sound conceptual framework that guarantees arbitrary composability (macro calls can contain macros and expand into other macros), the different macro-like features in the reader don’t always play well together.

Consider the following piece of code:

(defmacro foo [x]
  `(map #(identity %) [~x]))

At first sight, nothing looks wrong with this, other than it doesn’t do anything useful. But any use of this macro causes an error message:

(foo [:a :b])
java.lang.Exception: Can't use qualified name as parameter: user/p1__3328

What’s going on here? The error message hints at a problem with a function argument. The only function being defined is here #(identity %), which uses a shorthand notation for function literals expanded by the reader. Let’s see what the macro call expands to:

(macroexpand-1 '(foo [:a :b]))

yields

(clojure.core/map (fn* [user/p1__3328] (clojure.core/identity user/p1__3328)) [[:a :b]])

So here’s the problem: #(identity %) is expanded by the reader into (fn* [p1__3328] (identity p1__3328)). This is a perfectly valid function literal, but the other reader feature used here, syntax-quote, doesn’t know about function literals. It takes the expanded function literal as an arbitrary form and does namespace resolution on all symbols. This leads to a namespace-qualified symbol for a function parameter, which is not legal Clojure syntax.

Moral: use reader features sparingly, ideally one at a time. Except for syntax-quote, they only save you a couple of keystrokes, a convenience that you may end up paying a high price for in terms of debugging time.

June 24, 2009

Protecting mutable state in the state monad

Filed under: Uncategorized — Konrad Hinsen @ 4:19 pm

In part 3 of my monad tutorial for Clojure programmers, I explained that the main purpose of the state monad is to facilitate the implementation of stateful algorithms in a purely functional way. Here I will show a somewhat different use of the state monad: protecting mutable data in such a way that it is guaranteed not to be used incorrectly. I will use I/O as my example because I expect that everyone is familiar with it, but the same method can also be applied to modify mutable Java data structures, such as arrays, in a referentially transparent fashion.

The example I will use is stream-based I/O, based on the java.io library. Writing to such a stream has side-effects, so it is certainly not purely functional. But even reading from a stream is not purely functional: every time you call the read method, you get another return value. A stream object therefore represents mutable state. Passing around such an object among functions in a program makes it difficult to verify that the stream is read or written to properly.

Here is the basic idea of protecting state in the state monad. Suppose that the only way to create your mutable object is through a function that takes a state monad value as its argument. The function creates the object, calls the state monad value on it, and then destroys the object. Client code never obtains a reference to the object, so the only way to act on it is through state monad values that define useful operations on the object (such as “read a line from the stream”). To add another layer of protection, the object is wrapped in a protective data structure (such as a closure) before being passed to the state monad value. Only a well-defined set of state monad values gets the key to access the object, meaning that only those clearly identified operations can act on the object. You can then use those operations, and combine them in the state monad to define more complex operations. But no matter how you try, you will never get a reference to the object that you could assign to a var, pass to some function, or (ab-)use in any other way.

For stream-based I/O, this approach is implemented in the library clojure.contrib.monadic-io-streams. Before trying any of the examples below, you have to evaluate the following form that takes care of importing the libraries:

(ns monadic-io-demo
  (:refer-clojure :exclude (read-line print println flush))
  (:use [clojure.contrib.monadic-io-streams])
  (:use [clojure.contrib.monads]))

The :refer-clojure clause is necessary because clojure.contrib.monadic-io-streams defines a couple of names that are also defined in clojure.core. In general this is not a good idea, but here the names are the same as those of the Java functions that are being called, which is a useful feature as well. The number of good names for functions is unfortunately not unlimited!

With the bookkeeping done, let’s look at a basic application:

(with-reader "my-file.txt"
  (read-line))

This returns the first line of the text file “my-file.txt”. To understand how this works, here is the definition of read-line:

(defn read-line
  []
  (fn [s] [(.readLine (unlock s)) s]))

The call (read-line) thus returns a state monad value: a function that takes a state argument s, calls the Java method readLine on the unlocked state, and returns a vector containing the freshly read line and the state argument. The function unlock is defined locally in a let form and is thus inaccessible from the outside. It retrieves the real state value from the wrapper that only serves to protect it.

Next, we need to look at with-reader:

(defn with-reader
  [reader-spec statement]
  (with-open [r (reader reader-spec)]
    (first (statement (lock r)))))

The function reader that it calls comes from clojure.contrib.duck-streams. It creates the stream reader object which is then locked (wrapped inside a closure) and passed to statement, which happens to be the state monad value returned by (read-line). The with-open macro ensures that the reader is closed. The return value of the with-reader function is the first item in the return value of the monadic statement; the second item is the state which is of no interest any more.

There are two levels of protection here: first, the reader object is never made accessible to the outside world. It is created, injected into the monadic statement, and then made invalid by closing it. The only way to get a reference would be write a monadic statement that exposes the state. This is indeed possible, and the statement is even provided under the name fetch-state in clojure.contrib.monads. The following piece of code returns the state value:

(with-reader "my-file.txt"
  (fetch-state))

But here the second level of protection takes over: the return value is the locked version of the state, which happens to be a closure. That closure must be called with the right key in order to unlock the state, but the key is not accessible anywhere. Only a handful of functions in clojure.contrib.monadic-io-streams can unlock the state and work on it.

The typical way to do more complex I/O using this monadic approach is to define complex I/O statements by composing the basic I/O statements (read, read-line, write, print, etc.) in the state monad. This permits the construction of arbitrary I/O code, all in a purely functional way. In the end, the compound I/O statement is applied to an I/O stream using with-reader or with-writer. That part is necessarily not purely functional: when reading a file, nothing guarantees that the file will be the same every time it is read. But the non-functional part is now localized in a single place, and the complex aspect of I/O, the composition of I/O statements to do the required work, is purely functional.

As I said earlier, the same approach can be applied to working with mutable arrays. There would be function make-array that takes a monadic array-modifying statement as its argument. This function would create an array, run it through the statement, and return the resulting modified array. The only array-modifying functions would be defined as state monad values. The net result would be a referentially transparent way to create a new array and have it initialized by an arbitrary algorithm. However, once returned from make-array, the array would be immutable.

May 6, 2009

Simulating dynamic scoping

Filed under: Uncategorized — Konrad Hinsen @ 4:36 pm

In the early days of Lisp, and programming languages in general, scoping was a hot issue of debate. What is better, lexical scoping or dynamic scoping?

Scoping refers to where a function looks for the definitions of symbols that it doesn’t have locally. Given (fn [x] (+ b x)), where does b come from? With lexical scoping, b is taken from the lexical environment, i.e. the forms surrounding the function definition, or from the global namespace if the lexical environment doesn’t have a b. The lookup typically happens when the function is compiled. With dynamic scoping, the lookup happens at runtime and by following the call stack: first the calling function is checked for a definition of b, then the calle’s caller, etc.

By now the issue has been settled: lexical scoping is the default in all modern programming languages, including all modern Lisp dialects. And that includes Clojure, of course. Lexical scoping is more predictable and permits compile-time analysis of which value a symbol refers to. It also permits closures, which have become a popular technique in functional programming.

However, dynamic scoping is of use in some occasions and it can in fact be simulated in Clojure. In this post I will show how and what to watch out for.

First of all, why would one want dynamic scoping? Here is one example. Recently I wrote an implementation of macrolet and symbol-macrolet for Clojure. These are both macros that modify the macro expansion procedure by adding local macro definitions. This means that a stack of macro definitions must be maintained: each macrolet adds definitions to the stack, and at the end of the macrolet form the new definitions are popped off again.

One usual way to handle this would be to pass the stack around among the functions that do the macro expansion, which could modify it as needed. Another approach would be hiding this passed-around state using the state monad (see part 3 of my monad tutorial). But in the specific situation of macro expansion, Clojure’s macroexpand-1 enters into the call chain. I couldn’t modify this function to pass on the macro definition stack, so I had to work around it in some way. I could have avoided using macroexpand-1, for example. But I chose to try simulating dynamic scoping at this occasion.

With dynamic scoping, a function that wants to modify the stack just redefines the variable containing it, calls the expansion recursively, and sets the stack back to its old value. A function accessing the stack just uses the current value of the variable, which is looked up in the call chain.

How can this be simulated in Clojure? The obvious candidate is binding. The binding form redefines a var in a namespace for the duration of the execution of its body. The redefinition is valid only inside the thread that is being executed, so other threads are not affected. In my macro expansion code, one of the stack vars is defined by

(defvar- macro-fns {})

and modified in each call to macrolet:

(defmacro macrolet
  [fn-bindings & exprs]
  (let [names      (map first fn-bindings)
	name-map   (into {} (map (fn [n] [(list 'quote n) n]) names))
	macro-map  (eval `(letfn ~fn-bindings ~name-map))]
    (binding [macro-fns     (merge macro-fns macro-map)
	      macro-symbols (apply dissoc macro-symbols names)]
      `(do ~@(map expand-all exprs)))))

Well, that’s almost all there is to say about it. Except that the definition of macrolet given above does not work.

The problem is laziness. The binding form changes the definition of macro-fns only for the duration of the execution of its body and then resets it to its previous value. But the execution of the body is just a call to map. This doesn’t actually do anything, it merely returns a package that calls expand-all as soon as the first element of the sequence is requested. And that happens only after the binding form has been left.

Once the problem is identified, the solution is simple: add a doall around the map:

(defmacro macrolet
  [fn-bindings & exprs]
  (let [names      (map first fn-bindings)
	name-map   (into {} (map (fn [n] [(list 'quote n) n]) names))
	macro-map  (eval `(letfn ~fn-bindings ~name-map))]
    (binding [macro-fns     (merge macro-fns macro-map)
	      macro-symbols (apply dissoc macro-symbols names)]
      `(do ~@(doall (map expand-all exprs))))))

However, adding all then necessary doalls requires careful attention. Forgetting does cause errors, but not necessarily immediately. For this reason, simulating dynamic scoping should be avoided except when there is a good reason. In retrospect, I am not even sure if my reason was good enough, and perhaps one day I will rewrite the code in a different way.

The Shocking Blue Green Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.