XCL Programming Language

Constantine Plotnikov

Abstract

This document describes the XCL programming language. The document is under construction.

Table of Contents

1. Introduction
2. Language Architecture
    Overview
    Lexical layer
    Segment layer
        Basic concepts
        A segment parser architecture
        Syntax description
    Term layer
3. XCL language constructs
    Overview

Chapter 1. Introduction

XCL stands for eXtensible Capability Language. The language is extensible: it allows new programming language constructs to be introduced, and it is based on the capability security model.

The language is designed to reflect the Sebyla platform and to make all of its features available. Any valid construct that passes the verifier can be expressed in XCL.

Chapter 2. Language Architecture

Overview

The syntax is based on an extensible language architecture that is being developed as part of the XC4J project.

The language design has the following layers.

Lexical layer

Segment layer

Basic concepts

The segment syntax focuses on simplifying error recovery and editing. The syntax is much like Python's. One of its goals is to allow text to be split into logical units without knowledge of the syntax definition for the file. The grammar is then specified on top of those logical units.

The role of the syntax is to be suitable for use in programming languages. It can be considered a step toward an SGML-like syntax for programming languages. (SGML was originally targeted at documents written directly by authors; XML removed many of those authoring features, such as optional end tags for elements with content, for the sake of tool simplicity.) While simplicity of processing by tools was considered in this specification, it has lower priority than convenience for the people writing the text.

The following units are used in the parsing process.

significant tokens

These are tokens that are potentially interesting to the parser. This excludes whitespace, comments, and many other things.

ignorable tokens

These are tokens that should not be interesting to the final parser. This group includes whitespace and new line tokens, as well as tokens that were already interpreted by the line parser, such as the "{" and "}" tokens.

segments

These usually correspond to statements in programming languages. They consist of tokens, brackets, and blocks. A segment usually takes one line. A segment is terminated by a new line unless it contains an unfinished construct, such as a block or bracket, or its last token is a continuation token.

blocks

Blocks consist of one or more segments. Blocks are delimited by "{" and "}".

brackets

Brackets consist of an open bracket token, a close bracket token, and the tokens and blocks between them. Brackets must always match in the syntax; it is not possible to introduce a construct with unmatched brackets. "{" and "}" are reserved for blocks.

processing instructions

These are instructions given to the consumer of the line parser. The line parser ignores these instructions altogether.

source file

Source code is zero or one segments, so empty source code is allowed.
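The units above nest inside each other, which can be sketched as a small data model. The following Python sketch is illustrative only: the class names, the field names, and the helper significant_texts are assumptions made for this example and are not defined by the specification.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Token:
    text: str
    # False for whitespace, comments, and new lines (assumed representation)
    significant: bool = True


@dataclass
class Bracket:
    opener: Token           # open bracket token, e.g. "("
    content: List["Node"]   # tokens, nested brackets, and blocks in between
    closer: Token           # matching close bracket token, e.g. ")"


@dataclass
class Block:
    # The "{" and "}" delimiters are ignorable, so they are not stored here.
    segments: List["Segment"] = field(default_factory=list)


@dataclass
class Segment:
    content: List["Node"] = field(default_factory=list)


Node = Union[Token, Bracket, Block]


def significant_texts(node) -> List[str]:
    """Collect the text of all significant tokens in document order."""
    if isinstance(node, Token):
        return [node.text] if node.significant else []
    if isinstance(node, Bracket):
        parts = [node.opener, *node.content, node.closer]
    elif isinstance(node, Block):
        parts = node.segments
    else:  # Segment
        parts = node.content
    return [text for part in parts for text in significant_texts(part)]
```

For instance, the segment a (b { c }) would be a Segment holding the token a and a Bracket whose content is the token b followed by a Block with one inner Segment.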

A segment parser architecture

The segment parser is a filter between the lexer and the grammar-based parser. Its task is to simplify the work of the real grammar parser. The segment parser annotates some tokens as ignorable and returns the other tokens from the lexer as is. It also inserts the control tokens listed below into the token stream. The specification is expressed as a description of how such a filter works, but a real parser could be organized in another way.

This document describes the work of the parser on correct text files. Processing of files that contain errors is out of scope for this specification.

segment start

This token indicates the start of a segment.

segment end

This token indicates the end of a segment.

block start

This token indicates the start of a block.

block end

This token indicates the end of a block.

bracket start

This is an annotated lexer token that indicates that a bracket has started.

bracket end

This is an annotated lexer token that indicates that a bracket has ended.

significant

This is an annotated lexer token that indicates that the token should be processed by the higher-level parser as part of a segment.

ignorable

This is an annotated lexer token that indicates that the token can be ignored by the higher-level parser.

processing instruction

This is an annotated lexer token that indicates that the token should be processed by the higher-level parser but is not part of the normal token stream.

No tokens are inserted into the token stream at bracket starts and bracket ends, because the lexer-level tokens carry enough information.
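The filter described above can be sketched in Python as a generator over lexer tokens. This is a minimal sketch under several assumptions, not the specified algorithm: tokens are plain strings, only "(" and "[" brackets are recognized, comments start with "//", the function name segment_filter and the event tuples are invented for the example, and continuation symbols, ";" terminators, and processing instructions are left out. The relative order of a control token and the ignorable "{" or "}" next to it is also an assumption.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # (kind, token text); control tokens carry empty text

OPEN, CLOSE = {"(", "["}, {")", "]"}  # assumed bracket set


def segment_filter(tokens: Iterable[str]) -> Iterator[Event]:
    # One frame per open block, plus one for the source itself:
    # [is a segment open?, number of unmatched brackets in that segment]
    frames: List[List] = [[False, 0]]
    for tok in tokens:
        frame = frames[-1]
        if tok == "\n":
            # A new line ends the segment only if no bracket is unmatched.
            if frame[0] and frame[1] == 0:
                frame[0] = False
                yield ("segment end", "")
            yield ("ignorable", tok)
        elif tok.isspace() or tok.startswith("//"):
            yield ("ignorable", tok)
        else:
            # The first non-space token starts a segment.
            if not frame[0] and tok != "}":
                frame[0] = True
                yield ("segment start", "")
            if tok == "{":
                yield ("ignorable", tok)
                yield ("block start", "")
                frames.append([False, 0])
            elif tok == "}":
                if frame[0]:
                    yield ("segment end", "")
                frames.pop()
                yield ("block end", "")
                yield ("ignorable", tok)
            elif tok in OPEN:
                frame[1] += 1
                yield ("bracket start", tok)  # annotated lexer token
            elif tok in CLOSE:
                frame[1] -= 1
                yield ("bracket end", tok)    # annotated lexer token
            else:
                yield ("significant", tok)
    if frames[-1][0]:
        yield ("segment end", "")  # input ended without a final new line
```

Because the frame for an enclosing segment stays open while a block is on the stack, a new line inside the block never terminates the outer segment; only the new line after the closing "}" does.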

The syntax is heavily based on the notion of a physical line and the position within the line.

While the segment syntax does not require lookahead of more than one token, it may be necessary to look at history to determine some outcomes. For example, the parser needs to track the nesting of blocks, brackets, and segments. Some fragments use lookahead of more than one token in the specification, but this can be emulated by states. It was ensured in the specification that no nested states as

Syntax description

This section is an overview of the segment syntax. It assumes that, as in most editors, columns are numbered from 1.

All whitespace, new lines, and comments are always ignorable tokens. Lines that contain only ignorable tokens are ignored. These tokens are collectively referred to as space tokens.

Source consists of zero or one segments that start at column 1. It is an error if a segment is encountered at a column greater than 1.

The simplest segment is a physical line consisting of significant tokens. The first non-space token indicates the start of the segment. A new line ends the current segment. For example:

// valid line
a b c "sdsdfsa" 1 1E10F 2#0.101001# 16#7FFF# 

A segment can contain brackets. Brackets are always matched within the segment. It is an error to have unmatched brackets in a segment.

"{" indicates the start of a block and "}" indicates its end. A block contains segments. Segments are terminated by an unescaped new line. Segments can also be terminated by ";". Indentation is ignored.

a {
 b {
  1
  2
 }
 c 
a(b{  
 a{ 
  d
  w
 }
 c{
  d
  q})

These rules support both functional-style programming:

// a weird sample for factorial :)
// there is no real need to add 1 and then
// subtract it, except to demonstrate
// language features.
define factorial(n) {
  if n < 0 {
    nil
  } else {
    let rc = if (n = 1 or n = 0) { 
        1 
      } else { 
        factorial(n-1)*n 
      } + 1
    print rc-1
    rc-1
  }
}

and Java-like inner classes:

// hello world sample
class public HelloWorld {
  method public static main(args:String[]) {
    // heavy way to print hello world
    new Thread(new Runnable() {
      method public run() {
        System.out.println("Hello, World!")
      }
    }).start() 
  }
}

The end of a line usually ends the current segment if there are no unmatched block starts or bracket starts, but there are some exceptions.

Physical lines can also be joined into one logical line with the continuation symbol "\<new line>". A continuation cannot break a valid token.

 0 \
   +1 \
   +2 \
   +3

Only a new line may follow "\". It is an error to encounter any other kind of token.
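Joining physical lines on the continuation symbol can be sketched as follows. This is a simplified, text-level illustration rather than the token-level behavior of the segment parser: the function name join_continuations is hypothetical, and the sketch assumes that a trailing "\" never occurs inside a comment or string.

```python
from typing import List


def join_continuations(text: str) -> List[str]:
    """Merge physical lines that end with a backslash into logical lines."""
    logical: List[str] = []
    pending = ""
    for line in text.split("\n"):
        if line.endswith("\\"):
            # Drop the continuation symbol and keep collecting.
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # The input ended right after a continuation symbol.
        logical.append(pending)
    return logical
```

Applied to the four physical lines of the example above, the sketch returns a single logical line containing the tokens 0, +1, +2, and +3.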

Another way to continue a segment is with brackets: if there are unmatched brackets between the start of the current segment and the new line, then the new line does not end the segment. For example:

 (0
   +1 // line comment
   +2 // another line comment
   +3)
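The bracket rule amounts to tracking which brackets are still open when a new line is reached. The Python sketch below works on raw text lines, for instance the lines of a block body, and is only an approximation of the rule: the name split_segments is hypothetical, only "(" and "[" are treated as brackets, "//" is assumed to start a line comment, and strings containing brackets or "//" are not handled.

```python
from typing import List

PAIRS = {"(": ")", "[": "]"}  # assumed bracket set


def split_segments(text: str) -> List[str]:
    """Group physical lines into segments; open brackets suppress line ends."""
    segments: List[str] = []
    current: List[str] = []
    stack: List[str] = []  # close brackets that are still expected
    for line in text.split("\n"):
        code = line.split("//", 1)[0]  # drop a trailing line comment
        for ch in code:
            if ch in PAIRS:
                stack.append(PAIRS[ch])
            elif stack and ch == stack[-1]:
                stack.pop()
        if code.strip():
            current.append(code.strip())
        # A new line ends the segment only when every bracket is matched.
        if current and not stack:
            segments.append(" ".join(current))
            current = []
    return segments
```

For the example above, the four physical lines form one segment because the "(" opened on the first line is not matched until the last one.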

Term layer

Chapter 3. XCL language constructs

Overview

The XCL language is most similar to the E and C# languages.