Iacobus: software engineering

I was talking the other day with someone recruiting me for a position. Smart organizations will contact you with someone who can try and sift the wheat from the chaff. In other words, they'll ask you questions in particular computer science or even, *gasp*, Java questions. Well this one guy asked me a standard question, 'what are the advantages and disadvantages of statically typed versus dynamically typed languages?' Now the full debate on that question could not only fill a whole series of blog entries, but could be a book by itself, which I will not get into here. Still as I was giving my answer I prefaced my comments by saying something that I have put a fair bit of thought into, which was: 'I don't like the name static or dynamic typing, I prefer the notion of implicitly typed and explicitly typed languages.'

Definitions
First off let me start with the definitions of explicit and implicit, both of these are horked from Merriam Webster Online. Implicit is defined as "capable of being understood from something else though unexpressed : implied or involved in the nature or essence of something though not revealed, expressed, or developed." Explicit would be defined as "fully revealed or expressed without vagueness, implication, or ambiguity : leaving no question as to meaning or intent" or as "fully developed or formulated ." I think a good way to think about this is to consider the following question, "are details hidden from you or not?"

WTF (What The Friendly) does this have to do with Programming Languages?
I think if you consider it for a while, the language debate between statically and dynamically typed is clearer when it is considered in terms of whether it emphasizes implicitness or explicitness. First off, all languages and programming platforms are implicit to some degree, since there is no way to encapsulate all of the factors that go into programming a software system to a developer. If they did, the resulting interface to the developer would probably be enough to make an airline pilot's jaw drop. Still the point applies if you focus on types. Most of the debate about programming languages focuses on the issues of types, so it is a good area to concern ourselves with. By my reckoning, a strongly/statically typed language, like Java for instance, would be considered 'explicitly typed' in this nomenclature. A dynamically/weakly typed language, like Python or Ruby for instance, would be called an 'implicitly typed' language from now on. An optionally typed language like Groovy would be optionally explicit or optionally implicit, depending on the developers mood that day. ;)

Actually in my opinion, most of the interest in Groovy is due to its 'implicit' capabilities and therefore I would really put in that category. I know you can explicitly type things in Groovy, but that is really an enabling feature. Enabling feature, you say? Yeah, an enabling feature, kinda like (Note: I have no personal experience with the following analogy, really...) how a drug dealer might give out free samples to hook you in and then afterwords start charging. So being able to specify types in Groovy gives Java developers, like myself, a warm fuzzy about trying out the new features and, therefore, try out the Groovy language.

Ok this is awfully pedantic, so what?
So the question that might be formulating in your mind is, "ok so you have a new definition, how does this help anything?" Well for one thing, no matter what you do with software code, the types are there, period. The difference is if they are hidden or not. In the next few sections I am going to compare equivalent sections of code in Java and in Groovy, but I think that the point holds if we were in Ruby, Smalltalk or Python, even though the examples would vary.

I want to show a quick example from groovy and Java, in this I am going to create an instance of ArrayList:

Java(5.0+):
List<> list = new ArrayList<>();

Groovy:
def list = []

So let me ask you a question? What is the difference in this case? Well for one the amount of actual keystrokes to write the examples. Beyond that, there is not much difference, in both cases you have a variable declaration of list that is initialized with a java.util.ArrayList instance. So what is the advantage of the Java version? Well Java's explicitness is key here, you know straight away that list is of type java.util.List and that it is an instance of java.util.ArrayList. Therefore, someone new to the Java platform would probably have an easier time figuring out what is going on in the Java line(at least not including generics, which is another post entirely) than with the Groovy line. To understand the groovy line you need to know that the array notation is shorthand for instantiating a List object and that by default an ArrayList is used, therefore I would term Groovy an "implicit" language in this case.

One other not so small point, if I was looking at this from a tool developer's viewpoint, I prefer the Java version to the Groovy version, in probably just about every case. In general, it is easier to write tools to analyze and provide support to base Java (or other explicitly typed language) than it is to any implicitly (read: dynamic) typed language.

Another example, I am going to do the simple "HelloWorld" application in both Groovy and Java:

Java: HelloWorld.java
public class HelloWorld
{
public static void main( String[] args )
{
System.out.println( "Hello World!" );
}
}

Groovy: HelloWorld.groovy
println "Hello World!"

This is the example that should drive the point home. What is the difference? In the end the bytecode will be functionally the same in both the Groovy and Java cases (Notice I don't say that the bytecode would be exactly the same). The big difference is that the Java version is explicit, not only about types, but also in requiring a class definition and explicit main method. In Groovy, the type is inferred implicitly from the file name and the main method is also implied.

What about the tools man?
The final example I want to consider is the Domain Specific Language (DSL) in the two kinds of languages. Actually, I lied, I am only going to show an example of a builder in Groovy and skip showing the example in Java, because the example for Java would be far too painful to consider. I want to consider the following example because I want to get to the point about what 'explicit' versus 'implicit' means for tool developers.

Example: AntBuilder (Groovy)
// Going to zip up a file
def ant = new AntBuilder()
ant.zip( destfile:"${zip.location}", basedir:"${project.location}" )

What all is going on here? You have an instance of AntBuilder being created and a method called zip being invoked. There is an example of named parameters, which is a very cool feature for Groovy I think. Finally, you see there two GStrings (Groovy Strings, quite the name no?) with its built in templating. There is a lot going on here, someone wanting to write a tool would probably take a deep breath, but envision that he/she will be able to do it. The trick is what about the zip method. What about the zip method? Lets get to that next.

In the previous examples, the types and their signatures are just below the surface so to speak. The notation '[]' is a shorthand for java.util.List and in the case where the statements are just put into the Groovy file without a class definition, means there is an implicit class definition and static main method. In the above example, there is no such nicety. Why would I say that? I mean the variable ant is an instance of AntBuilder right? Yes, but look at the javadoc for AntBuilder, where is the method definition for zip? There is not one, but if you want to see for yourself go ahead and look, I'll wait.

The AntBuilder class is an example of a Builder in Groovy which does something like what java.lang.reflect.Proxy does. Proxy provides an InvocationHandler, which has a method called invoke() which the Java platform provides as a hook to allow you to provide runtime implementations of arbitrary types. The Groovy Builder support allows for creation of nodes and invocation of methods on those nodes. All of this means that not only is the full type definition for AntBuilder implicit, but that there is really no way at compile time to figure it out. What I mean is that there is no way for a source code tool developer to write general tools to provide the kind of language support that is available in explicitly typed languages for DSLs. In the first two examples, you could easily imagine being able to provide tooling support for those cases, the types and signatures were implicit, but they could be figured out relatively simply, there is no real way to do it in the last case.

Tooling support is an area that is usually skipped in alot of discussions on explicitly typed versus implicitly typed languages. It is dismissed by implicit proponents with, "well tooling will just catch up." I don't consider it trivial and, having dabbled a bit in providing IDE support for Groovy, I don't assume that the tooling will just be there in the future. Implicitly typed language designers and proponents need to give this its full consideration, elsewise the best your language could hope for is to be the next Lisp, praised by some and used by none.

Conclusion
So the difference between the two languages in the above examples is best exemplified by asking three questions. First, which version do you think someone new to your codebase or the Java platform would understand better? Usually an explicitly typed language like Java (or maybe Scala/Haskell) would be favored. Second, which version do you think someone with experience with your codebase or the Java platform would prefer to write? Usually, I would think that an implicit language like Groovy, Ruby or even Python would be preferred. Third, what kind of tool, particularly IDE support do you want or need? If your project thinks that the first question is more important than the second one, then an explicit language like Java should be your choice. If the second question seems more important, then an implicit language like Groovy should be your choice. The third question can only be answered by surveying what is available and trying it out to see if it is good enough.

There has been alot of discussion about the JSR-277 and OSGi versioning standards. I wont revisit the arguments here since people far more intelligent than me have lots to say on that matter. If you want to dive in, go ahead and Google it, and I will talk to you later since you will have forgotten to go back to this blog entry once you are done. For the rest of you, I have to beg your indulgence, I am going to assume that I actually have the qualifications to pontificate on this matter. So here goes.

Defining Versioning
Why do you want to version something? There are alot of reasons for it, particularly when it comes to Source Control Management (SCM) or Version Control. You need to know the version of a particular class that you are working on in that case. Using version control when you are working on software is an idea that goes without saying. There are no debates on versioning or the formats of those versions, you get the one that comes with the SCM you are using, period end of story. No, the issue comes up when you are considering versioning a component, which for OSGi/Eclipse is a bundle and for JSR 277 is called a module. So what does a version stand for in this case?

I like the wikipedia definition of a version in this case. It is pretty generic, but it goes something like this, "Software versioning is the process of assigning either unique version names or unique version numbers to unique states of computer software." The definition tries to cover all cases, so the point to focus in on is the unique identifier of the unique state of the software. So what kind of state are we trying uniquely identify? Well changes to the software you as a developer have made of course, but I think that they fall into two categories at least at the component level. One category are those changes that affect dependency management because the external API on which clients rely has been altered. The second category, I will term bookkeeping for this blog entry. I term it bookkeeping, because it denotes internal changes that should, notice I say should, not affect clients using the external API. "Bookkeeping" comes into play because Software developers are human, and even though a change should not impact clients adversely, they still do. "Bookkeeping" lets a system integrator/admin note that a version of the component X worked, but X+1 sure did not, which is exactly the information someone providing support wants to hear. It is really difficult to say one category matters more than the other. Changes in the first category will prevent a system from coming up correctly, whereas changes in the second category could help explain why the system later goes haywire.

I like to view a version as a shortcut, so that a client does not have to check the full external API at runtime to know if there are changes that it should concern itself with. In fact, ideally the "external API" version should be some sort of "hash", if you will, of the external API that could be automatically maintained. More on the ideal answer later. The "bookkeeping" component is in effect like a branch in version control, it lets the developer go back and retrieve the previous version when problems come in from the users.

I could go on about how many coordinates are needed for this or that or what have you, but there are many many people far more qualified to comment on this and, you know what? They have commented, so if you are interested, go Google for them and check them out. What I am going to do is describe what in my ideal world versioning would do for me.

Versioning: James Ervin's Happy Place
I am not a big Adam Sandler fan, but I am a guy, so I will say that in the movie "Happy Gillmore", I was moved by Happy's idea of his "Happy Place." So what I am going to describe is my "Happy Place" for Software Versioning, though I will say that the "Happy Place" described in the movie is far superior to the one I will go into.

First off, any versioning scheme on the component level has to take into account that all software changes fall into one of the two orthogonal categories that I defined above, they are either external API or, for lack of a better word, internal "bookkeeping" changes. Any tooling support should be able to distinguish.

Secondly, I would like the notion of backward compatibility to go away. You change the external API, you create incompatibilities period. Incompatibilities between versions are a two way street. I mean if you make so called "backward compatible changes", then you shouldn't break existing clients, well that's great right? The problem with this view is that it assumes that your new stuff will be automagically available everywhere, which really does not take a full lifecycle view of your component into account. What you do is allow for a set of clients to be written that will be unable to use your pre-existing components. I don't have a great answer for this, but I would like people to take a greater view into account, all external API changes could force incompatibilities, not just ones that break backward compatibility. Once external API is defined, you need to treat it like set cement. How to potentially better deal with set cement? I try and deal with it in the next point.

Thirdly, I would like better API tooling support. This comes from point two, I want better tooling to help with what is in effect an intractable problem. First I would like to mention I know about the new Eclipse API tooling push and I want to investigate it thoroughly later on. I think it is worth mentioning that I view tooling support here in two categories again, what I would call "static support" and "client feedback support".

Static Support analysis in my view is what the Eclipse API tooling effort is focused on. Give me warnings if I add things to the external API to change the "external API" component of the version and then provide an automatic way to update it. Give me an error/warning to update the "external API" component of the version if I change an existing method or other backward compatible breaking change. Tell me to update the "bookkeeping" component of the version when I make any significant changes to the component that does not impact the external API. In other words, give me some help to make it more automatic. I, as a human being, have and will continue to forget to update versions without warning/errors markers to guide my way.

Before I go into the second category, I want to emphasize again that this is my happy place, my ideal world. "Client Feedback" tools to me are a potential answer to the problem of external API being set concrete after released in the wild. The trouble is that most of the time a whole API is not in use, usually only a subset, but what subset? That information would be invaluable. It would allow a component developer to know what parts of the API could be eliminated or given better documentation so that more clients would use it. "Client Feedback" would provide the targets for refactoring and could prevent the kind of cruft that eventually dooms any software product. How do you best get this feedback? Some sort of tooling support of course and beyond that I dunno. Did I mention this was my "Happy Place"?

Conclusion
So what do I want in a versioning scheme or tooling support? Prevent me from making simple obvious mistakes that I have done again and again. I want support to make me take into account that when a component is released, people write clients to use it and I have to keep them in mind for any changes, period. I think what I really want is the Staples Easy Button, which if you know me, is a metaphor for an impossible to build tool. Still there is alot that can be done before we need an 'Easy Button' to make this problem less problematic.

Iacobus

Monday, June 23, 2008

Don't call it a static or dynamic language!

Monday, June 16, 2008

Versioning: My Happy Place

About Me

Links

Blog Archive

Labels

James E. Ervin's shared items

Twitter Updates