as types, expressions or statements). When the
transformation is applied to a particular parse tree s,
the rules are tested in order against s, and the first
matching rule, if any, is applied to s. Rules are grouped
into rulesets, based on the syntactic categories of L1.
For example, some rules for translating OCL
(OMG, 2014) types to Java 7+ could be:
OclType::
Integer |-->BigInteger
Real |-->BigDecimal
OclAny |-->Object
Boolean |-->boolean
String |-->String
Set(_1) |-->HashSet<_1>
Sequence(_1) |-->ArrayList<_1>
Map(_1,_2) |-->HashMap<_1,_2>
Variables _1, ..., _9 represent subnodes of an L1
syntax tree node s. If s matches an LHS containing
variables, these variables are bound to the corresponding
subnodes of s, and these subnodes are then translated
in turn, in order to construct the subparts of the RHS
denoted by the variable name.
Thus for the above ruleset OclType, applied to
the OCL type Map(Integer, String), the final rule
matches against the type, with _1 bound to Integer
and _2 bound to String. These are translated to
BigInteger and String respectively, and hence the
output is HashMap<BigInteger,String>.
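The matching and binding steps above can be sketched in Python. This is only an illustration under stated assumptions, not the CSTL implementation: the tuple encoding of parse trees and the names match and translate are hypothetical, and the sketch raises an error when no rule matches rather than modelling any fallback behaviour.

```python
import re

# Hypothetical encoding: a basic type is a string; a parameterised type is
# a tuple (constructor, subnode, ...). Each rule is an (LHS, RHS) pair,
# listed in the same order as the OclType ruleset in the text.
OCL_TYPE_RULES = [
    ("Integer", "BigInteger"),
    ("Real", "BigDecimal"),
    ("OclAny", "Object"),
    ("Boolean", "boolean"),
    ("String", "String"),
    (("Set", "_1"), "HashSet<_1>"),
    (("Sequence", "_1"), "ArrayList<_1>"),
    (("Map", "_1", "_2"), "HashMap<_1,_2>"),
]

def match(lhs, node):
    """Return the variable bindings if lhs matches node, otherwise None."""
    if isinstance(lhs, str):
        return {} if lhs == node else None
    if not isinstance(node, tuple) or len(node) != len(lhs) or node[0] != lhs[0]:
        return None
    # Bind _1, _2, ... to the corresponding subnodes of the node.
    return dict(zip(lhs[1:], node[1:]))

def translate(node, rules=OCL_TYPE_RULES):
    """Apply the first matching rule, translating bound subnodes in turn."""
    for lhs, rhs in rules:
        bindings = match(lhs, node)
        if bindings is not None:
            # Each variable in the RHS is replaced by the translation
            # of the subnode bound to it.
            return re.sub(r"_\d",
                          lambda m: translate(bindings[m.group()], rules),
                          rhs)
    raise ValueError(f"no rule matches {node!r}")
```

For instance, translate(("Map", "Integer", "String")) follows the steps of the worked example: the final rule matches, and the two subnodes are translated before being placed in the RHS.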
The special variable _* denotes a list of subnodes.
For example, the rule
Set{_*} |-->Ocl.initialiseSet(_*)
translates OCL set expressions with a list of argu-
ments, into a corresponding call on the static method
initialiseSet of the Java Ocl.java library. Elements of
the list bound to _* are translated according to their
own syntax category, and separators are preserved.
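A minimal Python sketch of this list handling follows. It assumes, hypothetically, that the subnodes bound to _* arrive as a flat list of element texts and separator tokens, and that translate_element stands in for whatever ruleset applies to each element; neither name comes from CSTL itself.

```python
def translate_list(children, translate_element, separators=(",",)):
    """Translate each element of a _* list, copying separators unchanged."""
    return "".join(
        child if child in separators else translate_element(child)
        for child in children
    )

def translate_set_literal(children, translate_element):
    """Sketch of the rule Set{_*} |-->Ocl.initialiseSet(_*)."""
    return "Ocl.initialiseSet(" + translate_list(children, translate_element) + ")"

# Hypothetical element translator, used only to make the example concrete.
def demo_element(text):
    return {"self": "this"}.get(text, text)
```

With these assumptions, translate_set_literal(["self", ",", "x"], demo_element) translates each element separately and keeps the comma between them.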
Conditions are a conjunction of predicates, sepa-
rated by commas. Individual predicates have the form
_i S
or
_i not S
for a stereotype S, which can constrain the kind of
element bound to _i. For example, the type of _i can
be tested by using stereotypes Integer, Real, Boolean,
Object, Sequence, etc.
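The evaluation of such a condition can be sketched as follows. The triple encoding (variable, stereotype, negated) and the stereotypes_of lookup are assumptions made for illustration; how CSTL actually determines the stereotypes of an element is not described here.

```python
def conditions_hold(conditions, bindings, stereotypes_of):
    """Check a conjunction of predicates of the form `_i S` or `_i not S`.

    conditions     -- list of (variable, stereotype, negated) triples
    bindings       -- maps variable names such as "_1" to bound elements
    stereotypes_of -- hypothetical lookup from an element to its stereotypes
    """
    for variable, stereotype, negated in conditions:
        has_stereotype = stereotype in stereotypes_of(bindings[variable])
        # `_i S` requires the stereotype; `_i not S` requires its absence.
        if has_stereotype == negated:
            return False
    return True

# Hypothetical stereotype table for two elements x and s.
DEMO_STEREOTYPES = {"x": {"Integer"}, "s": {"String"}}

def demo_lookup(element):
    return DEMO_STEREOTYPES.get(element, set())
```

Here the conjunction "_1 Integer, _2 not Integer" holds when _1 is bound to x and _2 to s, and fails when both variables are bound to x.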
A ruleset r can be explicitly applied to variable
_i by the notation _i`r. _*`r denotes the application
of r to each element of _*. This facility enables the
use of auxiliary functions within a code generator. In
addition, a separate set of rulesets in a file f.cstl can
be invoked on _i by the notation _i`f.
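Explicit ruleset application can be pictured with the following Python sketch, which writes the application operator as a backquote. It models rulesets simply as Python functions, and the ruleset name boxed and the RHS templates in the usage note are hypothetical.

```python
import re

def instantiate(rhs, bindings, rulesets, default):
    """Fill in an RHS template.

    A plain variable such as _1 is translated by the `default` ruleset;
    a variable written with a backquote and a name is translated by the
    named ruleset instead. Rulesets are modelled as functions.
    """
    def fill(m):
        variable, name = m.group(1), m.group(2)
        ruleset = rulesets[name] if name else default
        return ruleset(bindings[variable])
    return re.sub(r"(_\d)(?:`(\w+))?", fill, rhs)

# Hypothetical rulesets: one for primitive types, one auxiliary ruleset
# that yields boxed types for use inside generic type arguments.
def primitive(t):
    return {"Integer": "int"}.get(t, t)

def boxed(t):
    return {"Integer": "Integer"}.get(t, t)
```

Under these assumptions, instantiate("ArrayList<_1`boxed>", {"_1": "Integer"}, {"boxed": boxed}, primitive) uses the auxiliary ruleset for the type argument, while a template without the backquote falls back to the default ruleset.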
By default, if no rule in a ruleset applied to source
element s matches s, then s is copied unchanged to the
result. Thus the rule String |-->String above is not
necessary. Because rules are matched in the order of
their listing in their ruleset, more specific rules should
precede more general rules. A transitive partial order
relation r1 ⊏ r2 can be defined on rules, which is true
iff r1 is strictly more specific than r2. For example,
r1 ⊏ r2 holds if the LHS of r1 and r2 are equal but r1
has stronger conditions than r2.
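The ordering requirement can be checked mechanically for the one case given above (equal LHS, stronger conditions). The sketch below is an assumption-laden illustration: the Rule record is hypothetical, conditions are modelled as sets of predicates, and "stronger" is taken to mean a strict superset of the other rule's conjunction.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    lhs: str
    conditions: frozenset  # conjunction of (variable, stereotype, negated)
    rhs: str

def strictly_more_specific(r1, r2):
    """True if r1 and r2 have equal LHS and r1 has strictly more conditions."""
    return r1.lhs == r2.lhs and r1.conditions > r2.conditions

def misordered_pairs(ruleset):
    """Index pairs (i, j), i < j, where the later rule is more specific.

    Since the first matching rule is applied, the later rule of such a
    pair can never be selected.
    """
    return [
        (i, j)
        for i in range(len(ruleset))
        for j in range(i + 1, len(ruleset))
        if strictly_more_specific(ruleset[j], ruleset[i])
    ]

# Hypothetical rules for an addition expression.
GENERAL = Rule("_1 + _2", frozenset(), "_1 + _2")
SPECIFIC = Rule("_1 + _2", frozenset({("_1", "String", False)}), "_1.concat(_2)")
```

Listing GENERAL before SPECIFIC is reported as a misordering, whereas the reverse order is accepted.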
CSTL is a simpler notation than template-based
code generation formalisms, in the sense that no
reference is made to source or target language
metamodels, and no interweaving of target language text
and code-generation language text is necessary. The
target language syntax and the structure of the source
language grammar need to be known, in order to write
and modify the rules.
CSTL has been applied to the generation of Swift
5 and Java 8 code, to support mobile app synthesis
(Lano et al., 2021a). It has also been applied to
natural language processing and reverse-engineering
tasks. However, a significant effort is still required
to define the CSTL rules and organise the
transformation structure. In the next section we discuss
how this effort can be reduced by automated learning of
a CSTL code generator from pairs of corresponding
source language and target language texts. This removes
the need for CSTL users to understand the details of
the source language grammar.
3 SYNTHESIS OF CODE GENERATORS FROM EXAMPLES
The goal of our machine learning procedure is to
automatically derive a CSTL code generator g mapping a
software language L1 to a different language L2, based
on a set D of examples of corresponding texts from L1
and L2. The generated g should be correct with respect
to D, i.e., it should correctly translate the source part
of each example d ∈ D to the corresponding target part
of d.
In addition, g should also be able to correctly
translate the source elements of a validation dataset
V of (L1, L2) examples, disjoint from D.
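These two requirements can be stated as simple checks. In the Python sketch below, g is modelled as a function from source text to target text, and a dataset as a list of (source, target) pairs; the stand-in generator and the example pairs are illustrative assumptions, not the output of the learning procedure.

```python
def correct_on(g, examples):
    """True if g translates every source text to its paired target text."""
    return all(g(source) == target for source, target in examples)

def accuracy(g, examples):
    """Fraction of examples that g translates correctly."""
    if not examples:
        return 1.0
    hits = sum(1 for source, target in examples if g(source) == target)
    return hits / len(examples)

# Hand-written stand-in for a learned generator (hypothetical).
def demo_g(source):
    basic = {"Integer": "int", "Real": "double", "Boolean": "boolean"}
    if source.startswith("Set(") and source.endswith(")"):
        return "HashSet<" + demo_g(source[4:-1]) + ">"
    return basic.get(source, source)

# Illustrative training set D and disjoint validation set V.
D = [("Integer", "int"), ("Real", "double"), ("Set(Integer)", "HashSet<int>")]
V = [("Boolean", "boolean"), ("Set(Real)", "HashSet<double>")]
```

The first requirement corresponds to correct_on(demo_g, D), and the second to a high value of accuracy(demo_g, V) on pairs that do not occur in D.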
We term this process code generation by example,
or CGBE.
Thus, from a dataset
Integer        int
Real           double
Boolean        boolean
Set(Integer)   HashSet<int>