Sunday, 15 January 2012

Scripting Language with Thundax P-Zaggy

In this article I will show the approach I took to solve a common problem: creating your own scripting language using a compiler. Sometimes we need our own language, and to get one we can either write a lexical analyser from scratch in a high-level language or use an open-source generator of lexical analysers. As I did not want to spend too much time on this task, I chose the second option, with an easy approach based on JFlex and CUP. There are many generators to choose from (ANTLR, JavaCC, SableCC, Coco/R, BYacc/J, Beaver, etc.), but I like the way CUP and JFlex work together. JFlex automatically generates the finite automaton of the lexical analyser from the regular expressions that define the tokens of the language. CUP is a system written in Java that generates LALR syntax analysers. The CUP file contains the syntax definition of the language (the grammar) in a notation similar to BNF (Backus-Naur Form).
This approach lets me create a small compiler containing a lexical analyser, a syntax analyser and a semantic analyser without adding extra workload to my simple language builder. The CUP/JFlex compiler checks whether the script is well formed, and in my Delphi project I only have a dummy function that runs if that compiler reports 0 errors.
I have also taken advantage of the SynEdit multi-line edit control to enhance the visualization of the simple scripting language. The example shown is pretty basic, and this approach helps keep the grammar out of the Delphi code. I have tried other Delphi alternatives like GOLD Parser and Coco/R but I did not get the expected results. So, if you have any better idea, please do let me know!
This example lets the user define a graph by calling the layout object and connecting one node to another using a numeric description. The left-hand figure shows the basics of the language and its definition. A statement invokes the layout object, generates a connection from node(1) to node(2), and ends with the ";" character. There is no need to define the nodes: if a node does not exist, it is created automatically. The generated compiler handles all kinds of errors and warnings and informs the user about them. It reports lexical errors (unrecognized symbols, etc.) and syntax errors (badly formed expressions) as errors, and semantic problems (duplicated lines) as warnings.
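To make the behaviour described above concrete, here is a minimal Java sketch. It is not P-Zaggy code: the class name ScriptGraphSketch, its fields and the warning text are hypothetical, the statement form layout.node(1).tonode(2); is taken from the grammar listed at the end of this post, and a single regular expression stands in for the real JFlex/CUP compiler.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of P-Zaggy: it only mirrors the rules described in the text.
final class ScriptGraphSketch {
    // One statement per line: layout.node(<int>).tonode(<int>);
    private static final Pattern STATEMENT =
        Pattern.compile("layout\\.node\\((\\d+)\\)\\.tonode\\((\\d+)\\);");

    final Map<Integer, Set<Integer>> edges = new LinkedHashMap<>();
    final List<String> warnings = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    void load(String script) {
        String[] lines = script.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            Matcher m = STATEMENT.matcher(line);
            if (!m.matches()) {
                errors.add("ERROR at line " + (i + 1) + ": '" + line + "' not recognized");
                continue;
            }
            int from = Integer.parseInt(m.group(1));
            int to = Integer.parseInt(m.group(2));
            // Nodes are created on first use, so no declaration is needed.
            Set<Integer> targets = edges.computeIfAbsent(from, k -> new LinkedHashSet<>());
            edges.computeIfAbsent(to, k -> new LinkedHashSet<>());
            // A repeated connection is kept once and reported as a warning, not an error.
            if (!targets.add(to)) {
                warnings.add("WARNING at line " + (i + 1) + ": duplicated line");
            }
        }
    }
}
```

Loading a script with the two lines layout.node(1).tonode(2); and layout.node(2).tonode(3); would create nodes 1, 2 and 3 without any prior declaration, and repeating the first line would add a warning instead of a second connection.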
A more developed example is shown in the following figure:
Once the language is free of errors, the generator is enabled and P-Zaggy can build the graph using the instructions previously defined and analysed by the external compiler. Note that you must have Java installed to be able to check the language.

The following graph has been created using the basic language:
As you can see, this is faster than placing all the objects with the mouse, and we can use external tools to generate scripts and then check them with the compiler.

Here you can get the latest version of the app:


JFlex regular expressions:

<YYINITIAL>layout {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.layout, new sToken(pos - yytext().length(), yyline, yytext()));
}
<YYINITIAL>node  {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.node, new sToken(pos - yytext().length(), yyline, yytext()));
}
<YYINITIAL>\( {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.OpenBracket, new sToken(pos - yytext().length(), yyline, yytext()));
}
<YYINITIAL>\)  {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.CloseBracket, new sToken(pos - yytext().length(), yyline, yytext()));
}
<YYINITIAL>tonode {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.tonode, new sToken(pos - yytext().length(), yyline, yytext()));
}
<YYINITIAL>parameters  {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.parameters, new sToken(pos - yytext().length(), yyline, yytext()));
}
<YYINITIAL>\. {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.dot, new sToken(pos - yytext().length(), yyline, yytext()));
}
<YYINITIAL>\; {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.endSentence, new sToken(pos - yytext().length(), yyline, yytext()));
}
<YYINITIAL>[0-9]+ {
  if (CUP$parser$actions.verbose)
    System.out.println(yytext());
  pos += yytext().length();
  return new Symbol(sym.INTEGER, new sToken(pos - yytext().length(), yyline, yytext()));
}

[ \t\r\n]+  { pos = 1; }
[a-z]+ {
  System.out.println("ERROR at line " + (yyline + 1) + ": '" + yytext() + "' not recognized");
  pos += yytext().length();
  return new Symbol(sym.ff, new sToken(pos - yytext().length(), yyline, yytext()));
}
. {
  System.out.println("ERROR at line " + (yyline + 1) + ": '" + yytext() + "' not recognized");
  pos += yytext().length();
  return new Symbol(sym.ff, new sToken(pos - yytext().length(), yyline, yytext()));
}
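For readers who have not used JFlex, the following Java sketch approximates what the rules above recognize, using only java.util.regex. It is an illustration under assumptions, not the generated scanner: the class LexerSketch and its tokenize method are hypothetical, token kinds are returned as plain strings named after the sym constants, and positions are left out. JFlex picks the longest match and, on a tie, the first rule, so the sketch reads a whole lowercase word first and only then checks whether it is a keyword.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated scanner; it reports token kinds only.
final class LexerSketch {
    // Whitespace, lowercase words, integers, then any other single character.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<ws>[ \\t\\r\\n]+)|(?<word>[a-z]+)|(?<num>[0-9]+)|(?<other>.)", Pattern.DOTALL);

    private static final Map<String, String> KEYWORDS = Map.of(
        "layout", "layout", "node", "node", "tonode", "tonode", "parameters", "parameters");

    private static final Map<String, String> PUNCTUATION = Map.of(
        "(", "OpenBracket", ")", "CloseBracket", ".", "dot", ";", "endSentence");

    static List<String> tokenize(String input) {
        List<String> kinds = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group("ws") != null) {
                continue; // whitespace produces no token
            }
            String text = m.group();
            if (m.group("num") != null) {
                kinds.add("INTEGER");
            } else if (m.group("word") != null) {
                // Unknown words map to the error token "ff", like the [a-z]+ rule.
                kinds.add(KEYWORDS.getOrDefault(text, "ff"));
            } else {
                // Unknown characters map to "ff", like the catch-all "." rule.
                kinds.add(PUNCTUATION.getOrDefault(text, "ff"));
            }
        }
        return kinds;
    }
}
```

Under these assumptions, LexerSketch.tokenize("layout.node(1).tonode(2);") yields layout, dot, node, OpenBracket, INTEGER, CloseBracket, dot, tonode, OpenBracket, INTEGER, CloseBracket, endSentence, while a misspelled word such as "layouts" becomes a single ff token.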

CUP grammar:

terminal sToken layout, node, OpenBracket, CloseBracket, tonode, parameters, dot, endSentence, ff;

terminal sToken INTEGER, ITEM, QUOTE;

non terminal scriptClass graphLine;
non terminal sToken itemBracket, itemReturned;

graphLine ::= layout : t1
              dot : t2
              node : t3
              itemBracket : e1
              dot : t6
              tonode : t7
              itemBracket : e2
              endSentence : t10
              graphLine
              {:
                scriptLines.AddTokens(e1, e2);
                RESULT = scriptLines;
              :}
            | error;

itemBracket ::= OpenBracket : t4
                itemReturned : e1
                CloseBracket : t5 {: RESULT = (sToken)e1; :}
              | error;

itemReturned ::= INTEGER : e1 {: RESULT = (sToken)e1; :}
               | error;
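The listings use two helper classes, sToken and scriptClass, that are not shown in this post. The sketch below is only a guess at their minimal shape so the actions above can be read in context: the constructor arguments follow the calls new sToken(pos - yytext().length(), yyline, yytext()) and the method name follows scriptLines.AddTokens(e1, e2), but the field names, the connections() accessor and the internal storage are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed token value object: start column, zero-based line, matched text.
final class sToken {
    final int column;
    final int line;
    final String text;

    sToken(int column, int line, String text) {
        this.column = column;
        this.line = line;
        this.text = text;
    }
}

// Assumed collector for the connections found by the parser.
final class scriptClass {
    private final List<int[]> connections = new ArrayList<>();

    // Called from the graphLine action with the two bracketed INTEGER tokens.
    void AddTokens(sToken from, sToken to) {
        connections.add(new int[] {
            Integer.parseInt(from.text),
            Integer.parseInt(to.text)
        });
    }

    // Hypothetical accessor: each entry is {fromNode, toNode}.
    List<int[]> connections() {
        return connections;
    }
}
```

One design point worth keeping in mind: graphLine is right-recursive, and in an LR parser the action of such a rule runs only after the nested graphLine has been reduced, so a collector like this would receive the connections starting from the last statement of the script.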
