Monday, June 23, 2008

Don't call it a static or dynamic language!

I was talking the other day with someone recruiting me for a position. Smart organizations will put you in touch with someone who can sift the wheat from the chaff. In other words, they'll ask you computer science questions or even, *gasp*, Java questions. Well, this one guy asked me a standard question: 'What are the advantages and disadvantages of statically typed versus dynamically typed languages?' Now the full debate on that question could fill not just a whole series of blog entries but a book by itself, so I will not get into it here. Still, as I was giving my answer, I prefaced my comments with something that I have put a fair bit of thought into: 'I don't like the names static and dynamic typing; I prefer the notion of implicitly typed and explicitly typed languages.'

Definitions
First off, let me start with the definitions of explicit and implicit, both horked from Merriam-Webster Online. Implicit is defined as "capable of being understood from something else though unexpressed : implied or involved in the nature or essence of something though not revealed, expressed, or developed." Explicit is defined as "fully revealed or expressed without vagueness, implication, or ambiguity : leaving no question as to meaning or intent" or as "fully developed or formulated." A good way to think about this is to ask the following question: "are details hidden from you or not?"

WTF (What The Friendly) does this have to do with Programming Languages?
I think if you consider it for a while, the language debate between statically and dynamically typed is clearer when it is considered in terms of whether a language emphasizes implicitness or explicitness. First off, all languages and programming platforms are implicit to some degree, since there is no way to expose to a developer all of the factors that go into programming a software system. If they did, the resulting interface would probably be enough to make an airline pilot's jaw drop. Still, the point applies if you focus on types. Most of the debate about programming languages focuses on types, so it is a good area to concern ourselves with. By my reckoning, a strongly/statically typed language, like Java for instance, would be considered 'explicitly typed' in this nomenclature. A dynamically/weakly typed language, like Python or Ruby for instance, would be called an 'implicitly typed' language from now on. An optionally typed language like Groovy would be optionally explicit or optionally implicit, depending on the developer's mood that day. ;)

Actually, in my opinion, most of the interest in Groovy is due to its 'implicit' capabilities, and therefore I would really put it in that category. I know you can explicitly type things in Groovy, but that is really an enabling feature. Enabling feature, you say? Yeah, an enabling feature, kinda like (Note: I have no personal experience with the following analogy, really...) how a drug dealer might give out free samples to hook you in and then afterwards start charging. So being able to specify types in Groovy gives Java developers, like myself, a warm fuzzy about trying out the new features and, therefore, the Groovy language.

Ok this is awfully pedantic, so what?
So the question that might be formulating in your mind is, "ok so you have a new definition, how does this help anything?" Well for one thing, no matter what you do with software code, the types are there, period. The difference is if they are hidden or not. In the next few sections I am going to compare equivalent sections of code in Java and in Groovy, but I think that the point holds if we were in Ruby, Smalltalk or Python, even though the examples would vary.

I want to show a quick example from Groovy and Java. In it I am going to create an instance of ArrayList:

Java(5.0+):
List<Object> list = new ArrayList<Object>();

Groovy:
def list = []

So let me ask you a question: what is the difference in this case? Well, for one, the number of keystrokes needed to write each example. Beyond that, there is not much difference; in both cases you have a variable declaration of list that is initialized with a java.util.ArrayList instance. So what is the advantage of the Java version? Java's explicitness is key here: you know straight away that list is of type java.util.List and that it is an instance of java.util.ArrayList. Therefore, someone new to the Java platform would probably have an easier time figuring out what is going on in the Java line (at least not including generics, which is another post entirely) than in the Groovy line. To understand the Groovy line you need to know that the square-bracket notation is shorthand for instantiating a List object and that by default an ArrayList is used. Therefore I would term Groovy an "implicit" language in this case.
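To make the "types are there, period" point concrete, here is a minimal Java sketch. The class name DeclaredVersusRuntimeType and the helper runtimeTypeOf are made up for this illustration. It shows that the object carries its concrete class with it whether or not the source spells that class out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, invented for this illustration.
class DeclaredVersusRuntimeType {

    // Returns the concrete class name of whatever object the reference points at.
    static String runtimeTypeOf(Object value) {
        return value.getClass().getName();
    }

    public static void main(String[] args) {
        // Explicit: the declared type (List) and the concrete type (ArrayList)
        // are both visible in the source.
        List<Object> list = new ArrayList<Object>();

        // The object itself only reports its concrete class.
        System.out.println(runtimeTypeOf(list)); // prints java.util.ArrayList
    }
}
```

As far as I can tell, asking the Groovy list for its class gives the same answer; the only difference is that the Groovy source never mentions it.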

One other not so small point: if I were looking at this from a tool developer's viewpoint, I would prefer the Java version to the Groovy version in just about every case. In general, it is easier to write tools that analyze and support base Java (or another explicitly typed language) than it is for any implicitly (read: dynamically) typed language.

Another example, I am going to do the simple "HelloWorld" application in both Groovy and Java:

Java: HelloWorld.java
public class HelloWorld
{
    public static void main( String[] args )
    {
        System.out.println( "Hello World!" );
    }
}

Groovy: HelloWorld.groovy
println "Hello World!"

This is the example that should drive the point home. What is the difference? In the end the bytecode will be functionally the same in both the Groovy and Java cases (Notice I don't say that the bytecode would be exactly the same). The big difference is that the Java version is explicit, not only about types, but also in requiring a class definition and explicit main method. In Groovy, the type is inferred implicitly from the file name and the main method is also implied.

What about the tools man?
The final example I want to consider is the Domain Specific Language (DSL) in the two kinds of languages. Actually, I lied, I am only going to show an example of a builder in Groovy and skip showing the example in Java, because the example for Java would be far too painful to consider. I want to consider the following example because I want to get to the point about what 'explicit' versus 'implicit' means for tool developers.

Example: AntBuilder (Groovy)
// Going to zip up a file
def ant = new AntBuilder()
ant.zip( destfile:"${zip.location}", basedir:"${project.location}" )

What all is going on here? You have an instance of AntBuilder being created and a method called zip being invoked. There is an example of named parameters, which I think is a very cool feature of Groovy. Finally, you see two GStrings (Groovy Strings, quite the name, no?) with their built-in templating. There is a lot going on here; someone wanting to write a tool would probably take a deep breath, but envision that he/she will be able to do it. The trick is the zip method. What about the zip method? Let's get to that next.

In the previous examples, the types and their signatures are just below the surface so to speak. The notation '[]' is a shorthand for java.util.List and in the case where the statements are just put into the Groovy file without a class definition, means there is an implicit class definition and static main method. In the above example, there is no such nicety. Why would I say that? I mean the variable ant is an instance of AntBuilder right? Yes, but look at the javadoc for AntBuilder, where is the method definition for zip? There is not one, but if you want to see for yourself go ahead and look, I'll wait.

The AntBuilder class is an example of a Builder in Groovy, which does something like what java.lang.reflect.Proxy does. A Proxy forwards every call to an InvocationHandler, whose invoke() method is the hook the Java platform gives you for supplying runtime implementations of arbitrary interfaces. The Groovy Builder support allows for creation of nodes and invocation of methods on those nodes. All of this means that not only is the full type definition for AntBuilder implicit, but that there is really no way at compile time to figure it out. What I mean is that there is no way for a source code tool developer to write general tools that give DSLs the kind of language support that is available in explicitly typed languages. In the first two examples, you could easily imagine providing tooling support: the types and signatures were implicit, but they could be figured out relatively simply. There is no real way to do it in the last case.
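For readers who have not used java.lang.reflect.Proxy, here is a rough sketch of the hook in plain Java. The Archiver interface and the RuntimeProxySketch class are made up for this example and have nothing to do with Ant or Groovy; only Proxy, InvocationHandler and Method come from the JDK. Every call on the proxy lands in invoke(), so the "implementation" of zip exists only at runtime.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

// Hypothetical interface, invented for this sketch; not part of Ant or Groovy.
interface Archiver {
    String zip(String destfile, String basedir);
}

class RuntimeProxySketch {

    // Builds an Archiver whose behavior lives entirely in the handler.
    static Archiver newArchiver() {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (method.getDeclaringClass() == Object.class) {
                    // toString, hashCode and equals are answered by the handler itself.
                    return method.invoke(this, args);
                }
                // Any interface method ends up here; we just describe the call.
                return method.getName() + Arrays.toString(args);
            }
        };
        return (Archiver) Proxy.newProxyInstance(
                Archiver.class.getClassLoader(),
                new Class<?>[] { Archiver.class },
                handler);
    }

    public static void main(String[] args) {
        System.out.println(newArchiver().zip("out.zip", "src")); // prints zip[out.zip, src]
    }
}
```

Note the difference from the Groovy builder: the Java proxy still needs an interface that declares zip, so a tool can at least see the signature. The builder call above needs no such declaration, which is exactly what makes it hard to tool for.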

Tooling support is an area that is usually skipped in a lot of discussions on explicitly typed versus implicitly typed languages. It is dismissed by implicit proponents with, "well tooling will just catch up." I don't consider it trivial and, having dabbled a bit in providing IDE support for Groovy, I don't assume that the tooling will just be there in the future. Implicitly typed language designers and proponents need to give this its full consideration; otherwise the best your language could hope for is to be the next Lisp, praised by some and used by none.

Conclusion
So the difference between the two languages in the above examples is best exemplified by asking three questions. First, which version do you think someone new to your codebase or the Java platform would understand better? Usually an explicitly typed language like Java (or maybe Scala/Haskell) would be favored. Second, which version do you think someone with experience with your codebase or the Java platform would prefer to write? Usually, I would think that an implicit language like Groovy, Ruby or even Python would be preferred. Third, what kind of tool, particularly IDE support do you want or need? If your project thinks that the first question is more important than the second one, then an explicit language like Java should be your choice. If the second question seems more important, then an implicit language like Groovy should be your choice. The third question can only be answered by surveying what is available and trying it out to see if it is good enough.



12 comments:

paulk_asert said...

Hi James, nice thought provoking article. I just upgraded the javadoc for AntBuilder to at least point to the Ant manual so you can look up the task names there. It is in trunk but won't appear on the website until the next release.

Also, Groovy doesn't stop you from writing your own AntBuilder with static methods - most people just prefer one that will work when new versions of Ant come up with more tasks or with their own custom tasks. IntelliJ IDEA also has completion for some builders (e.g. SwingBuilder) and is supposed to be providing hooks so that you can tell it about your own builders.

I am not saying tooling has caught up yet but some good progress has been made.

Cheers, Paul.

Daniel Spiewak said...

I disagree completely -- with your initial point, that is -- the rest of it is fine. :-)

Changing the terminology to explicit/implicit doesn't do anything other than confuse communication. Actually, it also implies the wrong facts about type systems. For example, static type systems do not have to be explicitly typed. Type inference in languages like Scala can go a long way to removing the explicit type annotations; and languages like ML or Haskell with Hindley/Milner type inference can complete the journey:

fun addList nil = 0
| addList (hd::tail) = hd + (addList tail)

No type information at all, and yet this is all fully statically type checked (and valid) as a method operating on a list of ints. Hardly an explicit piece of code, yet it is static.

On the flip side, dynamically typed languages really *are* dynamically typed, not just implicitly. Yes, you can argue that the lack of type annotations represents an implicit runtime inference (and you did), but that's not the whole story. Techniques like so-called "duck typing" muddy the waters a bit. For example (Ruby):

class Duck
  def walk
    'waddle'
  end

  def quack
    'quack'
  end
end

class Penguin
  def walk
    'swim'
  end

  def quack
    'do penguins even make noise?'
  end
end

def do_stuff(animal)
  puts animal.quack
  puts animal.walk
end

In this case, the `animal` variable has no defined type. The method can technically be used on an object of *any* type, though it will fail for objects which do not define methods `quack` and `walk`. You could *claim* that this defines an implicit structural type, but I think that's a bit of a stretch. Things like `method_missing` and unscoped open classes throw that sort of claim into chaos, since an object may or may not define a certain method at runtime.
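For comparison, a rough Java sketch of the same idea has to go through reflection, since Java has no way to call a method that the declared type does not have. The classes Duck, Rock and DuckTypingSketch are made up for this illustration. Whether the call succeeds is only known once the object shows up at runtime.

```java
import java.lang.reflect.Method;

// Hypothetical classes, invented for this sketch.
class Duck {
    public String quack() {
        return "quack";
    }
}

class Rock {
    // Deliberately has no quack method.
}

class DuckTypingSketch {

    // Calls quack() on any object that happens to define it; no shared type is required.
    static String quackOf(Object animal) {
        try {
            Method quack = animal.getClass().getMethod("quack");
            return (String) quack.invoke(animal);
        } catch (NoSuchMethodException e) {
            // Only discovered at runtime, as in the Ruby version.
            return "no quack method";
        } catch (ReflectiveOperationException e) {
            return "quack failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(quackOf(new Duck())); // prints quack
        System.out.println(quackOf(new Rock())); // prints no quack method
    }
}
```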

In short, dynamic languages really *are* dynamic, both in the type system and in the rest of their design. There may not be *any* types implicit in the code. On the other side, static languages are truly static, and while crude languages like Java require everything to be explicitly declared, more advanced designs may even allow you to write an entire program without ever specifying a single type.

Oh, and for the record, Ruby, Python and Groovy are all dynamically and *strongly* typed. Weakly typed languages are incredibly primitive, usually causing no end of troubles (PHP, C/C++, Fortran, etc). The definition of "weak typing" is when a type system allows code which is unsound to execute. This isn't restricted to static languages either, though it is most obvious in that case. Example from C++:

string s = "Hello, World!";
int i = int(s);

cout << i - 42 << endl; // segfault

In fact, the very *reason* languages like C and C++ suffer from segmentation faults is because of their weak type system.

Note that most languages (Java, Ruby, etc) allow this sort of situation statically -- meaning they all have some sort of casting mechanism -- but they perform runtime checks to ensure that the system itself does not blow up as a result. This is where the dreaded ClassCastException comes from, it's actually a very good thing.

As a minor note of interest, Java's arrays actually originally made the language weakly typed, due to their mutable covariance. This was fixed in a later release however with runtime checks.
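The runtime check on covariant arrays can be seen in a few lines. This is only a sketch, and the class name ArrayCovarianceSketch is made up; ArrayStoreException is the standard exception the JVM raises when an element of the wrong type is stored.

```java
// Hypothetical class, invented for this sketch.
class ArrayCovarianceSketch {

    // Returns true if the JVM refuses to store an Integer into a String[].
    static boolean storeRejected() {
        // Legal at compile time: String[] is treated as a subtype of Object[].
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42); // compiles, but is checked at runtime
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeRejected()); // prints true
    }
}
```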

Ricky Clarkson said...

I agree with Daniel, static/dynamic is not about explicitness. Groovy can be explicitly typed but remains dynamically typed. int i="hello"; compiles in Groovy but fails at runtime.

However, by Daniel's own measure, Scala, Java and Haskell are all weakly typed. They all allow unsound code to execute. Java and Scala will give warnings when you compile it though (except if you manage to trick the compiler through incremental compilation), and Haskell requires you to use functions containing the word 'unsafe' to do it.

I suggest avoiding the terms "strong typing" and "weak typing", as they are pretty much always whatever the person using them wants them to be, and aren't really very meaningful.

Daniel Spiewak said...

I probably should have been a bit clearer on strong vs weak typing. What I meant is a weak type system at some point allows you to treat an object of one type as if it were of another disjoint type. In other words, no enforcing of sound runtime type.

I already mentioned how you can make this blow up in C++, but many programmers often take advantage of it. Actually, C-style arrays are just that, exploiting weak typing. For example:

char name[] = "Daniel";
int i;
for (i = 0; i < 6; ++i)
{
    char c = *(name + i);
}

In this case, we're treating an array of characters as if it were an integer, then treating the result as a pointer and dereferencing. This works at runtime, but there's really not much of a safety net. Raw pointers in general lead to weak typing, because manipulation of a pointer value means the ability to access arbitrary memory slots, and without a lot of very inefficient checking, it's not possible for the runtime to prevent catastrophic results should something go wrong.

By contrast, it is certainly possible to do nasty things like my first example in Java:

Object s = "Hello, World!";
int i = ((Integer) s).intValue();

The difference here is that Java will catch this at the time of the cast, rather than when we actually try to use the "int" value. Also, when it catches it, the runtime will throw an exception which can be handled gracefully, not an error signal which kills the process.
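A self-contained sketch of that behavior, with the exception handled rather than left to kill the program. The class name CheckedCastSketch and the method tryCast are made up for this illustration.

```java
// Hypothetical class, invented for this sketch.
class CheckedCastSketch {

    // Attempts the cast and reports what happened instead of crashing.
    static String tryCast(Object value) {
        try {
            int i = ((Integer) value).intValue(); // the check happens at the cast
            return "ok: " + i;
        } catch (ClassCastException e) {
            return "rejected at the cast";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCast("Hello, World!"));     // prints rejected at the cast
        System.out.println(tryCast(Integer.valueOf(42))); // prints ok: 42
    }
}
```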

It is a bit of a contrived definition though, so Ricky's probably right that the terms "weak" and "strong" typing should be avoided.

James E. Ervin, IV said...

Daniel,
I knew when I wrote this article, someone who knew what they were talking about would show up and put me in my place. ;)

First off, I will state that I am not as much of a polyglot in terms of programming languages. In other words, the first bit of example code you gave looks like Greek to me.

Secondly, I haven't tried Scala or Haskell and therefore I have not considered the ramifications of statically typed type inference.

In light of your comments, I think that I will have to amend my thoughts a bit. See, I am thinking of it from the programmer's (not the machine's or runtime's) perspective. Do I as a programmer know what the types are? How hard is it for me to find them? This is why I do not believe that you can discuss this without consideration of tooling.

See, from my perspective, if a programmer can look at a piece of code in Scala and quickly find out what the types are, then I would still term the language explicit. If a programmer cannot do that, I would think of the language as implicit. Still, if the language is statically typed, tooling should be easy to create to support that language, and therefore for a programmer the language effectively becomes explicit. As you rightly point out, for a truly dynamic language this is really not possible.

Finally, thanks for the examples of duck typing. I know you consider it a stretch, but I do not, a duck 'type' if you will is very much an implicit type. So if an object has operations that match a given signature, it should work, very implicit I would think and also nearly impossible to tool for. Actually, let me amend that, I think you could provide tooling for duck typing far before any general purpose support for DSLs for instance. It would be hard, but possible I would think.

I know there are some that will think that discussing a programming language in light of its tooling is a cop out, but if you think about it, is it? How could I say that for instance Scala, by itself could be considered implicit, but with the right tooling, made explicit? A language does not exist in a vacuum, it exists with the runtime artifacts produced, the platform on which it is run and the tooling that the programmer has used to produce it.

See, for me programming is no longer an exercise in syntax or really mathematics. Btw, for background, I am a mathematician as well as an Engineer. With modern IDEs it is possible to develop code that the machine understands and, with Unit Testing, even code that mostly does what you expected it to. Use of profiling tools also should eliminate most of the problem of premature optimization. Therefore the challenge, I think, is really one more akin to writing. You know, like technical or report writing. Can you express your intent clearly so that other developers, or even yourself later on, can understand what you were originally intending? Subjects such as writing tests as specifications, patterns and most of modern Software Engineering are really at their core focused on this problem. As you can imagine from reading my blog entries, I have a ways to go to get better about this. Still, thanks, I think I have the subject for my next blog entry pontificating on things I have no business pontificating on. :)

James E. Ervin, IV said...

Man, y'all are replying faster than I can comment. :) I must have hit a nerve eh?

Still great comments and points. I realize now that my emphasis of the article should be to view and categorize languages and programming platforms more from the perspective of what they provide the end user/developer and not as much from the runtime or machine's perspective.

I think I am going to let this gestate a bit and reconsider it in a later blog entry. Thanks again y'all.....

Daniel Spiewak said...

Ah, I think I see what you're saying now. The distinction you are making is the point at which type information is known. In Java, that is statically in the declaration. In Scala, the type information is statically known, but not necessarily at point of declaration. In dynamic languages (Ruby, Groovy, etc), the type information is not really known until runtime. The runtime contract which must be fulfilled by the type is defined by the usage, just as ML and Haskell define the *compile time* contract of the type based on use.

In that sense, I guess I do agree that the terms implicit and explicit fit reasonably well. However, I think it should be clear that not all implicit languages are dynamic, nor are all explicit languages static. I would also still argue that in extremely dynamic situations (open classes, `eval`, `method_missing`, etc), a language might be *neither* implicit nor explicit, because it is impossible to determine the type contract without considering external factors.

> Man, y'all are replying faster than I can comment. :) I must have hit a nerve eh?

:-) Not really a nerve, just piqued interest. If we can't quibble over arbitrary definitions, then what have we left with which to concern ourselves?

paulk_asert said...

Re: 'In dynamic languages (Ruby, Groovy, etc), the type information is not really known until runtime.'

I guess Groovy has a subtle difference to Ruby in the spectrum Daniel described as I can explicitly type the parameters to my methods (and my fields and return types) if I wish and can mix in a sprinkle of interface-oriented programming with my duck typing if I wish. (I.e. make an explicit declaration of the protocol some objects should conform to).

Anonymous said...

It is a bit unfortunate that most major statically typed languages also happen to use explicit type declarations - these are two different concepts and have been commonly confused because of this historical reason.

A newer breed of statically typed but implicitly inferred language is starting to emerge in the form of Scala, F# and the likes - hopefully these will become mainstream soon.

A good resource to read before embarking on any debate around static and dynamic typing is this

James E. Ervin, IV said...

Thanks for the resource I will check it out, it looks very informative. I know I am not the foremost expert on this area, but that is the beauty of blogging right? The ability to pontificate on things you know little about. :)

Stephan.Schmidt said...

For me the important distinction is static/dynamic reference and static/dynamic data structure languages. More in depth analysis I posted 2 years ago

http://stephan.reposita.org/archives/2006/03/29/weak-versus-strong-languages-who-wins-the-fight/

Peace
-stephan

Anonymous said...

Statically typed just means the language uses the type information at (or before) compile/read/eval time, not at run-time. Whether it is explicit or implicit makes no difference. In fact, a language can be both static and dynamic at the same time, and you may never see a type in the entire program.