Tuesday, September 15, 2009

How would Linq look like in Java (2)

After thinking a bit more about my recent attempt in bringing Linq to Java and looking deeper into the JGA library, one thing starts to bug me: The Queryable<T> interface lets you only adapt specific operations by overriding specific methods. Although this is a powerful possibility, it kind of prevents the main use case to extend the Linq mechanism: An overload for a specific type T of Iterable<T>. In .NET  this is possible through the extension method feature which does not exist in Java. You simple would provide a new method for your specific Iterable<YourType>.

Basically extension methods are static methods that get glued to a type by some compiler magic and look like instance methods. In Java you can use static methods without a qualifying type by importing it statically. They look and feel like global methods in C++. Since I have seen how the hamcrest matchers use this feature in JUnit 4.x to create a somewhat fluent syntax, I’m wondering if this can’t be used to create a Linq like syntax:

import static xox.foo.Queryable.*;//defines select, … 

UnaryFunction<Float, Integer> xform =…;
UnaryFunction<Integer, Boolean> lessThan5 = …;

Iterable<Integer> i2 = 
    from(collection, select(xform, where(lessThan5)));

With this design, a contributor would be able to provide his/her type specific overloaded implementation of the operation. It is really interesting to think about a possible implementation for this. How can the iterators be chained correctly, as the static methods are evaluated in the exact opposite order as the iterator chain is expected to work?

I first experimented passing a LinkedList<Iterable<T>> through the method chain, but it turned out that I had to know the specific Iterable type to link the iterators together. So I switched to provide a simple base class that implements Iterable<T>. It provides service to its inheritors to access the next and previous Iterable.

abstract class LinkedIterable<TIn, TOut> implements Iterable<TOut>{
   
LinkedIterable<?, TIn> getPrev() {// … omitted
    LinkedIterable<TOut, ?> getNext(){// … omitted }
    public abstract Iterator<TOut> iterator();

}

A typical implementation would  use it like this:

class SelectIterable<TIn, TOut> extends LinkedIterable<TIn, TOut> {
   
private final UnaryFunction<TIn, TOut> select;

    public SelectIterable(LinkedIterable<TOut, ?> next,
                          UnaryFunction<TIn, TOut> select) {
        super(next);
        this.select = select;
    }

    public Iterator<TOut> iterator() {
        return new SelectIterator<TIn, TOut>(super.getPrev().iterator(),
                                             select);
    }
}

This linked list liked construct gets passed through the method chain. Each method takes one as input,creates a specific new implementation, adds it to the front and returns the head of the list.

public static <TIn> LinkedIterable<TIn, TIn>
   
where(UnaryFunction<TIn, Boolean> sel,
          LinkedIterable<TIn, ?> dec) {

        return new WhereIterable<TIn>(dec, sel);
}

The last method in the method chain to be called is the from function. It simply has to add the input range to the LinkedIterable<T> chain, run to the tail of the tail of the list and return it. As a LinkedIterable<T> is an Iterable<T>, this works seamless.

Now I just have to think about whether this syntax is better to read than my former approach. Especially for longer statements, all the methods and their braces might become unhandy:

Grouping managersBySalary = 
   
from(employes, where(isManager, select(toManager, groupBy(salary)));

How would Linq look like in Java

After thinking a while about Linq for Java I started to experiment. My main driver was to find out how Linq would look like in Java. I started with a simple task: From a given range of floats convert ever item to an integer and take those that are less or equal four. In C# this would look like this:

IEnumerable<float> source =
   
new []{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

IEnumerable<int> result =
   
source.Select(x=>(int)x).Where(x=> x<= 4);

Java has neither delegates nor any lambda syntax, so statements like x=>x<=4 are not possible. So I decided to model the delegates as interfaces:

public interface UnaryFunction<Arg, Result> { 
    Result execute(Arg arg);
}

Linq statements basically are so called extension methods to the IEnumerable<T> and/or IQueryable<T> interface. The Java equivalent is the Iterable<T> interface. However, there is nothing like an extension method in java, so I had to fall back on the OO way of extension: inheritance.

public interface Queryable<T> extends Iterable<T>{
   <U> Queryable<U> select(UnaryFunction<T, U> select); 
   Queryable<T> where(UnaryFunction<T, Boolean> pred);
}

So if every operation returns a new Queryable<T> object, it would be possible to chain the calls very similar to the C# way.

This looked very promising! I decided that the real work should be done on the Iterator<T> level and implemented specific iterators for the select and where operations. Here is the SelectIterator as an example:

class SelectIterator<From, To> implements Iterator<To> {

   
private final Iterator<From> inner;
    private final UnaryFunction<From, To> select;

    SelectIterator(Iterator<From> inner,
                          UnaryFunction<From, To> sel) {
        this.inner = inner;
        this.select = sel;
    }
    public boolean hasNext() {
        return inner.hasNext();
    } 
    public To next() {
        return select.execute(inner.next());
    } 
    public void remove() {
        throw new UnsupportedOperationException("");
    }
}

I implemented the Queryable<T> interface with an abstract base class and concrete anonymous inner classes:

abstract class QueryableImpl<T> implements Queryable<T> {  
   <U> Queryable<U> select(final UnaryFunction<T, U> s){
        return new QueryableImpl<U>(){
            @Override
            public Iterator<U> iterator() {
                return new SelectIterator(
                       QueryableImpl.this.iterator(), s
                                        
);
            }
        };
    }

    // … and some more
    public abstract Iterator<T> iterator();
}

Last but not least I provided a static adaptor function to convert any Iterable<T> to a Queryable<T>:

static <T> Queryable<T> from(final Iterable<T> range) {
    return new QueryableImpl<T>() { 
        @Override
        public Iterator<T> iterator() {
            return range.iterator();
        }
    };
}

Putting all together allows you to write the following Java code:

Iterable<Float> src=
    Arrays.asList(0f, 1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f);

UnaryFunction<Float, Integer> xform =
    new UnaryFunction<Float, Integer>() {
        public Integer execute(Float arg) {
            return (int) arg.floatValue();
        }
    };

UnaryFunction<Integer, Boolean> less5 =
    new UnaryFunction<Integer, Boolean>() {
        public Boolean execute(Integer arg) {
                return arg < 5;
        }
    };

Iterable<Integer> i2 = 
     from(src).select(xform).where(less5);

This looks pleasant close to the C# original!

Tuesday, September 8, 2009

Linq for Java

I started to work in Java and I’m really missing a comprehensive set of algorithms and data structures. The google collections in combination with the standard collection framework serves the data structure side very well and JGA looks promising for Algorithms. After all I don't want to do another port of the STL :-).

In the last months working with .NET I started to rely on Linq for my daily work and although it is quite reduced compared to the NSTL, I rarely missed a thing. I especially liked the fluent way that you can manipulate and transform the content any IEnumerable<T>.

I have found nothing comparable that is as fluent as Linq (JGA is close though) in the Java world and I start to wonder if it is possible to port at least some of the aspects?