Skip to content
ClemsonRSRG edited this page Oct 24, 2012 · 20 revisions

I've long said I'd like to change RESOLVE's syntax. Murali asked the other day what changes I'd like to see, and I didn't have a good answer for him. So here are some thoughts after reflection:

Design Principles

For the most part, my thoughts on syntax are not vague intuitions, but derive from general ideas about how programming language design should be done. These include:

  • Goal Oriented - This derives from the 80/20 principle: 80% of the good we can do will come from 20% of our features. We are not trying to make a revolution in programming syntax. Wasting effort of things unrelated to verification is not in the interest of we ourselves, nor the adoption of our programming language. All syntax should support our goals of making specification and verification possible and commonplace. This goal trumps all others. No syntax justification is permissible if it contradicts these goals.
  • Consistent - Bruce Tognazzini, considered by many THE expert on computer interface design (and a programming language is an interface for programmers), lists consistency amongst his first principles of design. He states that the most important things to be consistent are invisible things—which includes EVERYTHING in a programming language (which has no innate GUI). Consistency reduces cognitive burden on the programmer both during writing (because she need not make "choices" about irrelevancies) and reading (because different programmers can't do the same thing different ways.)
  • Familiar - The top five programming languages are, in order: C, Java, Objective-C, C++, C#, and PHP. Where we are doing something legitimately different, it makes sense to use new syntax. But when we are doing something "normal" we should use the normal syntax. This is respectful to our programmers time while making it as easy as possible to transition to RESOLVE for the maximum number of programmers. It is also respectful of our own time—we do not have the time or desire to run complicated user studies; the court of public opinion has already settled on a relatively consistent syntax.
  • Brief - the programmer's primary focus is on her content. What stands out in a language's syntax should be the logic, and in our case, additionally, the mathematical specification (Larry Wall, lead developer of Perl, has a good bit about this in his article Programming is Hard, Let's Go Scripting... under the Symbolic/Wordy subsection). Anything that is not my logic is a necessary evil, or, far too often in RESOLVE, an unnecessary distraction. Consider end Stack_Template. What possible value does this add? Add to this that programmers pitch a fit when introduced to a language that doesn't have i++ because writing i = i + 1 is a chore, and syntax should be brief and stay out of the way.
  • Not Redundant - Ironically, this is really just a different way of saying brief. But it bears saying explicitly because the RESOLVE language loves to make you repeat yourself. Programming languages should be optimized for reading and editing. And each time the developer must repeat herself, it's just another frustrating thing she has to remember to track down and fix up when she inevitably decides to change her code.
  • Sane by default - (also called convention over configuration) anything for which an obvious default exists, should be optional--the need for this rises with the "weirdness" of the operation. This is invaluable since it allows for new programmers to learn RESOLVE in stages, rather than all at once, mastering a new feature when it becomes relevant rather than because the syntax demands that a hoop be jumped through

Recommendations

With these things in mind, some specific recommendations. Organized roughly from least to most extreme.

Establish a Useful Capitalization Convention

Why do we use both camel case and underscores? Array_Based_Implementation_rb. This violates Brief and Not Redundant. Furthermore, why do we capitalize keywords? No other modern language does this ( Familiarity ) and for good reason: I'm supposed to be focused on my logic, not my keywords, so really this if Brief.

Naming convention should help distinguish "kinds of things." The more likely it is that two things could be confused, the more important to give them different conventions (this is why it's ok that variable names and methods have the same convention in Java.) On a lark, here's a possible scheme in RESOLVE:

Mathematical Modules     Some_Mathematical_Module
Programmatic Modules     Some_Programmatic_Module

I know I just said I don't like both camel case and underscores, but here there's a reason: it doesn't waste a "good" naming convention on something more actively useful like function or variable names. Since modules are "top level" things, there will be the fewest of them and some verbosity is ok. I don't see any reason why math modules and programmatic modules should be easily confused (they're mostly just used to import things), so I use the same convention.

Mathematical Variables   someMathematicalVariable
Mathematical Types       SM_MTH_TP
Mathematical Functions   some_mathematical_function

It's crucial that these three things look VERY different from each other because there is no syntactic difference between them. Both types and functions are normal mathematical values. The naming convention I've chosen for Mathematical Types is "extremely succinct, all caps" to be consistent with normal math names like "N" and "B" and "NP".

Mathematical Functions   SomeMathematicalFunction
Programmatic Operations  someProgrammaticOperation

It's important to draw the line between mathematical and programmatic functions, as this is often a source of confusion for new programmers and is something we consider an "important" distinction in resolve. This convention makes it clear at a glance if the thing is a math or programming function, while still making them look similar. It opens the door to having an operation copy that implements a mathematical function Copy.

Programmatic Types       SomeProgrammaticType
Programmatic Variables   someProgrammaticVariable
Programmatic Operations  someProgrammaticOperation

First, note that programming variables and mathematical variables are identical. This is necessary, since it would often be difficult to say "what kind" of variable a particular named value is. For example, is maxDepth in Stack Template a programmatic variable or a mathematical variable? It appears all throughout the spec alongside math things, so it'd be easy to rule it a "math" variable, but in Array Realization it's used to dimensionalize an array, so clearly it has concrete, programmatic value. It's both, depending on where and how we use it--so we can't use convention to distinguish.

As with Java, it's clear in programming world when you're using a variable name and when you're using a function name (hint: the latter always ends in ()), so no need to distinguish.

It's always clear if you're referring to a class or a variable by syntactic slot (i.e.:)

Var Stack : Stack;

is unambiguously a variable named Stack of type Stack. But unlike functions, there's no visual distinction to reinforce the "classness" of a class name. We use the Java convention of initially-capitalized camel case.

So, an example:

facility Stack_Experimentation {

    //What counts as a "type" is hazy--STR is also a math function, but I use
    //the definition "if it returns a type".  Arguably, things like N and B are
    //just constant functions that return a type.
    definition IsEmpty(s : STR(MTYPE)) : B = (s = empty_string);

    operation isEmpty(restores s : Stack)
        ensures isEmpty(s) = IsEmpty(s)
    procedure {
        return isEmpty.size() = 0;
    }
}

Type parameters are either special or they're not

In programming mode, we say Concept X(type T, evaluates I : Integer). If types are just normal values (hint: in programming mode they're not), let's say Concept X(evaluates T : Type, evaluates I : Integer) (Consistent). If types are not normal values, let's use the same type parmeterization syntax as the two popular languages that support it (C++ and Java): Concept X<T>(evaluates I : Integer) (Familiar).

"Normal" Comments

I actually personally prefer (* this style of comment *). It's what's used in a number of the functional languages and I think it reads better to the uninitiated, since it properly indicates "parenthetical comment".

That said: we're not trying to make a revolution in comments (Goal Oriented) and they look "cute" to initiated programmers, who are our target user. Let's stick to //Normal comments and /* Normal comments */ (Familiar: every one of the top five languages uses this style comment).

Curly Brace Syntax

Oh how I hate RESOLVE block ends. end Stack_Template neatly sums up everything I hate about RESOLVE's syntax. Every one of the top five languages uses curly braces (Familiar), which ends a block in a single character, whereas we put the programmer through a whole damn sentence (violating Brief). We compound that badness by making her REPEAT herself, so that when she decides to change the name of the concept later, she inevitably forgets to change it at the end the compiler yells at her (why!?!?!? The compiler's supposed to help me find logical errors!!!! Why add EXTRA HOOPS!?!?!?) (violating Not Redundant). And then we FURTHER compound this badness by making it optional (violating Consistent).

"Normal" Returns

As a modern programmer, I only know of one language that does returns as:

Function Foo() {
    Foo = 5;
}

And that's BASIC. Let's use the same return syntax as all top 5 programming languages.

"Normal" Formal Parameter Declaration

In operations and procedures, we permit variable declaration lists like alters X, Y, Z : Thing. I happen to like this syntax: it is undeniably Brief. But of course I can also write alters X : Thing; alters Y : Thing; alters Z : Thing, or alters X, Y : Thing, alters Z: Thing, which doesn't change how the operation is called (violating Consistent), and it makes our parameter-separator the semi-colon (violating Familiar), so I say it's not worth it since it in no way changes the process of verification (Goal Oriented).

Additionally, the alters X : Thing syntax is weird (violating Familiar) and even violates our own party line about "Programming should look like programming and math should look like math." I'd change the syntax to alters Thing X, which bring it in line with 3 of the 5 top programming languages (PHP doesn't declare types and Objective-C uses (Thing) X).

"Normal" variable declaration

Var X : Thing; is excessively verbose (violating Brief) and reminds the modern programmer of Javascript (a toy language). Thing X; is consistent with 4 of the 5 top programming languages (PHP does not have variable declaration).

Fewer semi-colons

Semi-colons are the very definition of a necessary evil. Anywhere they are not syntactically necessary, they should be removed (Brief) except in places where a full programmatic thought has clearly terminated (Consistent). Semi-colons should not be used at multiple levels of syntax (such as "to separate operations" and also "to separate parameters" and also "to separate parts of the operation declaration"), which makes it confusing what exactly it is they separate.

"Normal" imports

I'll confine myself to saying that every one of the top five puts their imports at the top of the file (Familiar) using either include or import as the keyword. I think this is an extra win for us, since it disentangles imports from verification machinery like module-requires clauses and operation parameters (Goal Oriented), highlighting that machinery.

Module naming convention

I do not have files called "Hampton_Picture.jpg", "Apartment_Picture.jpg", or "Screenshot_Picture.jpg" on my hard drive. Nor do I have "Dissertation_Document.doc" or "Short_Story_Document.doc". There is definitely no "index_Webpage.html" or "Stack_Java_Program.java". This is ridiculously verbose (violating Brief) and redundant with the file extension (violating Not Redundant), it also stealthily violates Sane Defaults, since I may assume, by default, that something that ends in .fac is a Facility.

This problem is so bad that the group has suggested introducing a mechanism by which an alternate name for modules could be introduced, which seems the height of madness: if we know our names suck, why don't we just fix them?

Modern operating systems permit arbitrary file extensions, not limited to three characters. Java files end in .java and .class. Webpages end in .html. The popular template language Groovy ends in .groovy.

My suggested, overall, naming convention would be things like:

Stack.concept Queue.concept Array_Based.realization Linked_List_Based.realization Flip.enhancement Naive.realization Standard_Integers.facility Reverse_Stack.driver

(I think clarity trumps Brief here, but I'd be perfectly happy with:)

Stack.con Queue.con Array_Based.impl Linked_List_Based.impl Flip.enh Naive.impl Standard_Integers.fac Reverse_Stack.drv

What the hell is a Template?

Adding to the confusion, while "Realization" files end in _Realiz.rb and "Facility" files end in _Fac.fac, "Concept" files end in _Template.co. ...!?!?!? In discussion, we in the RESOLVE group use concept and template interchangeably. There may exist some distinction--but we are not able to satisfactorily articulate it, so of course everyone (ourselves included) is confused. This violates Consistency and Brevity.

One explanation I've heard is that a "Concept" is a specification while a "Template" is a parameterizable module. So it sort of makes sense that Stack_Template is BOTH (though no explanation is provided for why a Concept is a kind of module while a Template has lesser, mere naming-convention status.

If this is the case, I claim that neither term correctly describes what we store in .co files right now. Since all our modules are parameterizable, flagging one particular kind as a "template" is extremely confusing, implying that ONLY .co files are parameterizable.

Similarly, we also have specifications (and thus "concepts") in enhancement files and private operations in realizations, so calling just .co files "Concepts" also implies that there aren't specs anywhere else.

So let's get rid of the idea of "Template" all together, understanding that everything can act as a template, and redefine Concept to mean whatever the hell it is we put in .co files right now.

Distinguished Parameter

It's been asserted to me that having a distinguished parameter fundamentally changes the semantics of the language, but I don't think that needs to be the case. In Python, the full semantics for a distinguished parameter is as follows: X.Y(Z) === Y(X, Z). I see no reason why we couldn't do that. It's true that this violates Consistent, but I think the win in Familiar more than makes up for it. I might further complicate it by saying that on the implementation side, rather than doing what Python does, asking the implementer to have a first parameter named "this", we simply make that parameter implicit. The specifics of this may require some thought since our concepts permit functions for manipulating more than one type. But at absolute worst, X.Foo(Y)/Operation Foo(this : Thing, Y : OtherThing) is worlds better than Foo(X, Y) in a supposedly object-oriented language (as far as I'm concerned, "object-based" is a silly distinction we make to pad out papers).

"Normal" array declaration syntax, always indexed from the same place

Our array declaration syntax is extremely verbose (violating Brief), very unusual (violating Familiar), and permits arrays that are indexed from... well... anywhere (violating Consistent).

Here's a question, you've got an array A. I'll give you a Print() method that will print out an element. Print out it's contents.

Well, you can't. You have to go find its declaration. And it's declaration might not even be in the same module. Flexibility is great, but consistency is better. Index from 0 please, like all 5 of the top languages.

Optional, Renamed Parameter Passing Modes

I like our parameter passing modes, but they are confusing to new people (violating Familiar) and redundant with the requires/ensures clauses (violating Not Redundant). That said, this is a case where I think the facility of the syntax justifies is presence--having a "gloss" of the basic in/out behavior is very useful.

They should, however, be renamed. There's no clear reason why the English word updates and the English word replaces should mean anything different. Certainly replace doesn't actually replace the value. It alters. But alters alters. The choices seem arbitrary and are thus confusing. I've been working with RESOLVE for 6 years and am in the 99th percentile of programmers and I still have to stop and think about what the hell these words mean sometimes.

My personal recommendation would be in, out, and inout, due to Familiarity. clears makes good sense (though initializes might be more accurate), as does preserves. I understand that others don't like these keywords because by one definition they aren't really accurate. I don't really agree--notice that earlier when I (in the 99.999999th percentile of programmers and RESOLVE users) chose, of all English words, "in/out behavior." I think this clearly communicated to everyone what I meant. Which is the point.

Regardless, parameter passing modes should be optional (Sane Defaults). And when not provided, the semantic becomes "all information about how this parameter is used is deferred to the requires and ensures clause of this operation", i.e. the default mode is updates/inout.

Reduce verbosity of module headers

Realization X of Y for Z is ridiculously verbose (violating Brief) and redundant with the declaration of Y, which must also state Enhancement Y for Z, (violating Not Redundant). At bare minimum, the necessity to state the concept in an enhancement realization should be removed.

Ideally, I would use convention to remove such associations altogether. A concept introduces itself as Concept X like normal. But realizations are kept in a folder in the directory shared with X.co called realizations and therefore only say Realization Y. We know what it's a realization of, because it's in the source hierarchy of X. Similarly, there is a folder named enhancements with folders named for each enhancement, each containing a single E.en written internally as simply Enhancement E, with a folder called realizations that contains realizations of the enhancement, each introduced simply as Realization R. (Sane Defaults, though in this case it is not a default--it's a requirement)

So the stack hierarchy might look like this:

Stack.con
realizations/
  .  Array.impl
  .  Clean_Array.impl
  .  Linked.impl
enhancements/
  .  Flip/
  .  .  Flip.enh
  .  .  realizations/
  .  .  .  Naive.impl
  .  .  .  Swapping.impl

In which case Swapping.impl would simply open:

Realization Swapping {
   ...
}

It's clear from context what it's a realization of.

Examples

Taking all these things into account, here is what my ideal RESOLVE would look like:

Stack.con:

imports Integers, String_Theory;

concept Stack<T>(evaluates Integer maxDepth)
        requires maxDepth > 0 {

    /* I really hate "exemplar".  We need a whole new keyword for these three lines?
     * And one that's presumably an English word that no English speaker knows what
     * it means?  So I'm saying "we'll use the first letter from the type name" as
     * a SANE DEFAULT. */
    family Stack is modeled by STR(T)
        constraint |S| <= maxDepth;
        initialization
            ensures S = emptyString;

    operation push(in T e) 
        requires |this| < maxDepth
        ensures  this = <#e> o #this;

    operation pop(out T r)
        requires |this| != 0
        ensures #this = <r> o this;

    //Parameter mode in front of "operation" is the mode on "this".  As with
    //all other parameter mode, the default is "inout", so there was no need
    //for it above
    preserves operation Integer depth()
        ensures depth() = |this|;

    preserves operation Integer rem_capacity()
        ensures rem_capacity() = (max_depth - |this|);

    clears operation clear();
}

realizations/Array.impl

realization Array {

    type Stack is record
            T[max_depth] contents,
            Integer top
        end
    convention
        0 <= s.top < max_depth
    correspondence
        //Originally this started with "Conc.S = ", but
        //what else would go on the left-hand-side of the equals here?
        //A SANE DEFAULT gets rid of another mostly-useless keyword.
        //I imagine correspondence clauses instead expect the type of the
        //mathematical model of the type in question
        reverse(concatenation i : Integer,
                    where 0 <= i < S.top, <S.contents(i)>);

    procedure push(in T e) {
        e :=: s.contents[this.top];
        this.top++;
    }

    procedure pop(out T r) {
        this.top--; 
        r :=: this.contents[this.top];
    }

    preserves procedure Integer depth() {
        return this.top;
    }

    preserves procedure Integer remaining_capacity() {
        return max_depth - this.top;
    }

    clears procedure clear() {
        this.top := 0;
    }
}

enhancements/Flip/Flip.enh

enhancement Flip {
    operation flip()
        ensures this = reverse(#this);
}

enhancements/Flip/realizations/Swapping.impl

realization Swapping {

    procedure flip() {
        Stack this_flipped;
        T next_entry;

        while (this.depth() != 0
            maintaining #this = reverse(this_flipped) o this
            decreasing |this|
        {
            this.pop(next_entry);
            this_flipped.push(next_entry);
        }
        this_flipped :=: this;
    }
}
Clone this wiki locally