***********************************
Changes to the NIST Express toolkit
***********************************
Don Libes, libes@cme.nist.gov
Last revised: 19-Aug-1992
POOP adj. (Acronym for Post-OOP) A paradigm (q.v.) long
awaited by many. Also, reminiscent of the sound made by
the collapse of an overinflated balloon.
OVERVIEW OF CHANGES
The bad news is: Much has changed. You will not be able to recompile
applications without changing them.
The good news is: The system is faster. Much faster. And the library
is based on the Express DIS, and implements everything needed to do
full resolution of all features of Express.
Until formal documentation is written, you will have to look at the
code. The good news is that the code is much much shorter and
cleaner. The bad news is that I left in some of the original code as
comments, so you may be distracted by this.
I have converted two programs that depend on the library: exp2cxx
(in ~pdevel/src/fexp2cxx) and the step parser (in
~pdevel/src/fstep). Since I didn't write either one originally, I
don't take credit for the overall readability, but they at least
provide proof that the library functions.
Here is an overview of what's changed.
- The overall structure has been changed to allow easier interfacing
and more customization. Even sophisticated applications can use the
default main now. To use the default main, define EXPRESSinit_init
as:
void EXPRESSinit_init() {
EXPRESSbackend = your-backend-function-goes-here;
}
Other hooks can be found by looking at the true definition of main.
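
The hook arrangement can be pictured with a small stand-alone sketch.
Only the idea comes from the notes above (a library-supplied main calls
an init function, which installs a backend pointer). The names
backend_hook, init_hook, default_main and my_backend are hypothetical
stand-ins, not the toolkit's own declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for EXPRESSbackend: a pointer the default main
 * calls once parsing and resolution are done. */
typedef int (*backend_fn)(const char *model_name);
static backend_fn backend_hook = NULL;

/* Hypothetical application back end: here it only counts invocations. */
static int backend_calls = 0;

static int my_backend(const char *model_name)
{
    assert(model_name != NULL);
    backend_calls++;
    return 0;
}

/* Plays the role of EXPRESSinit_init: the application's one chance to
 * install its hooks before the default main does any work. */
static void init_hook(void)
{
    backend_hook = my_backend;
}

/* Plays the role of the library-supplied main. */
static int default_main(const char *model_name)
{
    init_hook();
    /* ... parsing and resolution would happen here ... */
    if (backend_hook == NULL)
        return 1;               /* nothing installed, nothing to run */
    return backend_hook(model_name);
}
```

The application never writes its own main in this arrangement; it only
supplies the init function and whatever that function installs.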
- The OO system is gone. Everything is pointers to real structures
rather than "objects". This is what accounts for much of the speed
improvement. Debugging is easier, too, since you no longer have to
rely on functions to print out structures.
The downside is that some of the structures have embedded unions. This
can be confusing at first, but at least the compiler and debuggers can
now understand what you are doing and help you out.
- Almost all of the functions in the old library are unnecessary in
the new one since you can access structure elements yourself now.
Nonetheless, for compatibility, I have defined replacements for the
most likely used functions. If you have a function with no
definition, either there is no counterpart, I didn't think anyone
actually used that function, or I just haven't gotten around to
writing it.
- The functions most likely to be counterpart-less are some of the:
schema functions - the definition of a schema changed quite a
bit due to USE/REF and nested schemas changing.
type functions - types don't resemble those in the old
library. See more info below.
expression functions - expressions don't resemble those in the
old library. See more info below.
- Error processing has been sped up. The error messages are
greatly improved (no more overloading of a single error message for
different situations) and more descriptive, and much (much, much) more
error checking is done. Files are now tracked along with line
numbers for all objects.
Some specific notes can be found below.
GETTING A COPY OF FEDEX AND THE LIBRARY
**************************
Getting a precompiled copy
**************************
The fedex executable and library can be found in ~pdevel/bin and
~pdevel/arch/lib respectively. They will be regularly updated by me
as bugs are fixed. So make a copy if you want a static version.
**************************
Getting the source
**************************
To retrieve the source, link to the RCS directory, check out the
CheckOut file, and then run CheckOut itself. "make" by itself will
build an executable while "make libexpress.a" will build the library.
Here are real commands to do this:
mkdir -p ~/pdevel/src/fexpress2
ln -s ~pdevel/src/fexpress2/RCS ~/pdevel/src/fexpress2
co CheckOut
CheckOut
make
Incidentally, the name 'fexpress2' is temporary while this release is
being tested. Eventually, we will give it a better, more permanent
name.
**************************
'Libmisc' is dead, but ...
**************************
Note that the 'libmisc' library is no longer necessary. (It has been
integrated directly into the express library.) However, you still
need the 'usual' tools in pdevel/bin and the 'usual' other
libraries in ~pdevel/arch/lib. You can change the targets in either
Makefile or make_rules as appropriate. The express directory has its
own make_rules for simplicity.
**************************
Documentation
**************************
There is none. Ok, just kidding. What there is, is a file called
Changes which you'll get from CheckOut, describing the changes from
the old version to the new version.
It is very rough. There is little consistency, although I tried for
completeness. (It's 22K.) Nonetheless, it is still an overview and
skimps on precise details of many calls. Really, it's just there to
jog my memory when I write the real documentation, or for experts
(like you) who don't want to wait for the documentation.
MISCELLANEOUS NOTES
The following are miscellaneous notes that you may find helpful -
especially because there is no other documentation. (Sorry.)
Numerous elements in the language are now resolved including:
ALIAS, RULE, QUERY
It is interesting to note that there was formerly no way to even
represent them because the libmisc package had no means to do multiple
inheritance. Steve and I talked about implementing multiple
inheritance but were convinced that it would drastically slow down
every other part of the system. This seemed a poor tradeoff
considering that we only needed inheritance from at most two
orthogonal classes.
Enumerations are now separated into different scopes. For the same
reason as above, this was formerly impossible.
======================================================================
Class x; -> Class_of_what x;
i.e.,
Class_of_Type x;
Similarly, OBJget_class is now specific to whatever class you are using.
I.e.,
OBJget_class(type) -> TYPEget_type(type)
if (class == Class_Aggregate_Type) -> if (TYPEis_aggregate(class))
Rationale: underlying type system changed completely. Class/object
system gone, but efficiently faked. Can no longer call object type
'Class'.
======================================================================
Some people assumed many functions returned const values. Many
functions did in fact return such values. Now they do not.
Rationale: Most functions are now macros, returning pointers right out
of the data structures. Since these are the real objects, they are
writable.
======================================================================
Most objects returned from functions do not have to be OBJfree'd.
You will have to look at the documentation to see which ones. Thus,
OBJfree has been turned into a no-op.
Rationale: Most functions now return pointers right out of the data
structures. Freeing them would corrupt the system.
If you are getting a list, call the appropriate data structure
function to free it. E.g., since SCOPEget_entities_use returns a list,
you should call LISTfree to free it.
======================================================================
SCOPEget_entities_supertype_order now no longer returns USEd entities.
Use SCHEMAget_entities_use and SCHEMAget_entities_ref to get either of
these.
Rationale: At KC's request. This decision might be revisited.
Perhaps another function could be added.
======================================================================
ENUM_TYPEget_items now returns a dictionary instead of a list. Each
element is an expression of type 'enumeration' instead of a symbol.
Rationale: Efficiency.
======================================================================
DYNA_init is dead and gone. Remove all such calls.
Rationale: Hopelessly nonportable and ultimately of little value.
======================================================================
The original pass1/pass2 idea has been revamped. "pass1" is now
referred to as "parse" (since that's what it is). "pass2" is referred
to as "resolve" (since that's what it is). The resolve pass actually
consists of several (currently 5) passes. The current pass number is
stored in EXPRESSpass. This number is really only useful for
debugging purposes.
EXPRESSparse prefers to open the file itself. Either call it as
EXPRESSparse(model,(FILE *)0,"filename");
or EXPRESSparse(model,filepointer,(char *)0);
EXPRESSparse takes a "model" argument that can be a new or old express
abstraction. This allows you to call EXPRESSparse repeatedly to read
additional schemas into an old set.
To create a new express model, call EXPRESScreate().
To resolve an express model, call EXPRESSresolve(model).
======================================================================
The STRING abstraction has been removed. You should use the Standard
C library calls to deal with strings. I've left a couple macros in
place to aid in conversion, but these may go away in the future.
Rationale: The STRING abstraction allowed different underlying
representations for strings, but was incomplete to the point that
users had to assume that the standard C representation was used.
It was pointless to complete it, since the Standard C library is now
very rich in string support. The result would have just been
confusing.
======================================================================
A number of facilities are provided for referencing objects outside
the current file.
1) It is possible to logically insert other files during analysis by
use of an INCLUDE statement. INCLUDE statements were, at one time,
valid Express. However, they are not currently. It is best to think
of them as a preprocessing phase of the implementation that has
nothing to do with the language proper.
(With that in mind...) INCLUDE statements can appear outside a schema
or at the top-level of a schema. Included files are not restricted to
including schemas, but may include, for example, a set of entities, a
rule, etc. For example:
INCLUDE 'schema-file.exp';
2) Referencing a schema that is not defined in the file (or included
from another file) causes fedex to search for a file with the same
name as the schema with a ".exp" extension in the directories named by
the environment variable EXPRESS_PATH. For example, in the C-shell,
you could say:
setenv EXPRESS_PATH "~pdes/data/part42 ~pdes/data/part202"
In order to facilitate this, I recommend that all schema files have
symbolic links created to them by the names of any schemas within that
are likely to be externally referenced from them. Stable schemas may
have symbolic links placed in a directory of stable part files, while
unstable schemas should be referenced from a specific part directory.
For example, imagine that the directory for stable schemas is
~pdes/data/schemas/standard, while part 202 is still undergoing
evolution. In this case, the appropriate command might be:
setenv EXPRESS_PATH "~pdes/schemas/part42 \
~pdes/schemas/standard"
If not set, the default path of "." (the current directory) is used.
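
As a rough illustration of the search described above, the following
sketch builds the candidate file names for a schema from a
blank-separated directory list. It is not the toolkit's own code:
schema_candidate is a hypothetical helper, and it assumes the schema
name is used unchanged as the file name. Tilde expansion is left to the
shell, as in the setenv examples.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the index-th candidate file name for `schema` into `out`.
 * `path` is a blank-separated directory list; a NULL or empty path
 * falls back to "." (the documented default).
 * Returns 1 on success, 0 if there is no index-th directory or `out`
 * is too small.
 */
static int schema_candidate(const char *path, const char *schema,
                            int index, char *out, size_t outlen)
{
    const char *p;
    int i = 0;

    assert(schema != NULL && out != NULL);
    if (path == NULL || path[strspn(path, " \t")] == '\0')
        path = ".";

    p = path;
    for (;;) {
        size_t len;

        p += strspn(p, " \t");          /* skip separators */
        if (*p == '\0')
            return 0;                   /* ran out of directories */
        len = strcspn(p, " \t");        /* length of this directory */
        if (i == index) {
            int n = snprintf(out, outlen, "%.*s/%s.exp",
                             (int)len, p, schema);
            return n >= 0 && (size_t)n < outlen;
        }
        p += len;
        i++;
    }
}
```

A caller would try index 0, 1, 2, ... in turn and open the first
candidate that exists, stopping when the function returns 0.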
======================================================================
The old "warning" kludgery is gone. It has been replaced by several
routines in the ERROR package including
ERRORcreate_option
ERRORset_option
ERRORset_all_options
To associate an option string with a particular error, call
ERRORcreate_option.
ERRORcreate_option("subtypes",ERROR_missing_subtype);
To actually set or unset an option, it suffices to say:
ERRORset_option(sc_optarg,set);
where set is a true/false value. This is especially convenient with
getopt, since you can use the same code to set or unset an option just
by testing the option letter inside of the 'set' argument. I.e.
ERRORset_option(sc_optarg,c == 'w');
To print all the options out, say:
LISTdo(ERRORoptions, opt, Error_Option *)
fprintf(stderr,"%s\n",opt->name);
LISTod
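
The pattern of one handler serving both the "set" and "unset" option
letters can be sketched without the library. Everything below is a
hypothetical stand-in: option_create and option_set merely play the
roles that ERRORcreate_option and ERRORset_option play above, and the
fixed-size table is an assumption made to keep the example short.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPTIONS 16

/* Hypothetical option record: a name plus the on/off state of the
 * diagnostic it controls. */
struct warn_option {
    const char *name;
    int enabled;
};

static struct warn_option options[MAX_OPTIONS];
static int option_count = 0;

/* Register a named option. Returns its index, or -1 if the table is full. */
static int option_create(const char *name)
{
    assert(name != NULL);
    if (option_count == MAX_OPTIONS)
        return -1;
    options[option_count].name = name;
    options[option_count].enabled = 0;
    return option_count++;
}

/* Turn one named option on or off.
 * Returns 0 on success, -1 if the name is unknown. */
static int option_set(const char *name, int set)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < option_count; i++) {
        if (strcmp(options[i].name, name) == 0) {
            options[i].enabled = set;
            return 0;
        }
    }
    return -1;
}

/* One handler for two option letters: 'w' enables, any other letter
 * disables. The comparison itself is the 'set' argument. */
static int handle_flag(int letter, const char *arg)
{
    return option_set(arg, letter == 'w');
}
```

Inside a getopt loop, both case labels would fall through to the same
handle_flag call, passing the option letter and the option argument.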
======================================================================
Fedex has been changed to print errors immediately rather than
buffering them up and sorting them by line number. The underlying
function to toggle this is defined as follows:
ERRORbuffer_messages(boolean);
While the buffering code has been sped up (it used to call two
extra processes, now it doesn't call any), I see little point in
sorting by line numbers. The order in which diagnostics are presented
to the user is the order in which problems should be resolved. I.e.,
a missing schema will be detected immediately, and will cause many
spurious errors.
======================================================================
The error routines have been beefed up in other ways as well,
especially for robustness. For example, if an internal or operating
system error occurs, a strong attempt is made to produce all previous
diagnostics, rather than just dumping core.
The main entry for reporting errors was changed from
ERRORreport_with_line to ERRORreport_with_symbol.
ERRORreport_with_line still exists for programs that don't know
anything about symbols (in which case, we guess at the information).
Rationale: This was a necessary change in order to provide diagnostics
with filenames. The symbol abstraction itself also had to be
augmented with filenames.
======================================================================
The error messages are formatted a little differently so that the
default Emacs compile bindings can automatically read in and position
the appropriate Express file and display the error at the same time.
As an aside, Jim Wachholz has built an Express mode for Emacs.
Contact him for more info.
======================================================================
I have backed off on the original code's attempt at significant
information hiding. In particular, while some of the hiding worked,
some didn't. For example, users had to know whether information was
returned as a list or a dictionary. In fact, it is possible to hide
this as well - I don't know why Steve didn't bother, except that he
was tired.
For example, instead of a single LISTadd_last routine, there would have to
be different LISTadd_last routines for every class. This would have improved
typechecking.
The new code is more efficient for a variety of reasons. The original
code paid a heavy price in efficiency for dynamic typechecking, and
using individual functions to access each data element in a structure.
The new code allows direct access. There is necessarily some dynamic
typechecking left in the system, but it is quite small. The number of
switch statements is surprisingly small (less than two dozen).
The new code simulates the class hierarchy used by the old code in
spirit. In reality, the class hierarchy has been compressed from 5
levels to 2. The resulting code is much, much faster.
The key notions in the new system are:
a handful of base classes
dictionaries understand classes
Instead of objects being self-descriptive, context is used. The
dictionary is one such example. When you store an object, you
describe it to the dictionary. Upon later retrieval, you get the
object and the description back. When the object is not in the
dictionary, there is no descriptor. Your code takes over the job of
remembering what something is. Invariably, this is very straightforward.
I.e., you might keep a list of entities, in which case you are
guaranteed all the elements on the list are entities.
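
A minimal sketch of "context instead of self-description" follows. The
structures and the names dict_define and dict_lookup are hypothetical,
and the single-character descriptors ('e' for entity and so on) are an
assumption borrowed from the -p flag letters. The point is only that the
descriptor is stored beside the object, not inside it.

```c
#include <assert.h>
#include <string.h>

#define DICT_MAX 32

/* Hypothetical entry: the object is an untyped pointer, and the
 * character describing it lives in the dictionary, not in the object. */
struct dict_entry {
    const char *name;
    void *object;
    char type;          /* e.g. 'e' entity, 't' type, 's' schema */
};

struct dict {
    struct dict_entry entries[DICT_MAX];
    int count;
};

/* Store an object together with its description.
 * Returns 0 on success, -1 if the dictionary is full. */
static int dict_define(struct dict *d, const char *name,
                       void *object, char type)
{
    assert(d != NULL && name != NULL);
    if (d->count == DICT_MAX)
        return -1;
    d->entries[d->count].name = name;
    d->entries[d->count].object = object;
    d->entries[d->count].type = type;
    d->count++;
    return 0;
}

/* Look a name up. On success the description comes back through *type;
 * on failure NULL is returned and *type is left untouched. */
static void *dict_lookup(const struct dict *d, const char *name, char *type)
{
    int i;

    assert(d != NULL && name != NULL && type != NULL);
    for (i = 0; i < d->count; i++) {
        if (strcmp(d->entries[i].name, name) == 0) {
            *type = d->entries[i].type;
            return d->entries[i].object;
        }
    }
    return NULL;
}
```

A linear scan stands in for whatever lookup structure the real
dictionary uses.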
A small number of operations can be performed on all classes. For
example, it is possible to get the printable description of a class by
saying:
OBJget_type(type)
All OBJ functions are implemented by single-table lookups.
Mnemonically-suggestive characters are used as indices into the OBJ
table.
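
One way to picture a single-table lookup indexed by a mnemonic character
is the sketch below. It is an assumption about the general shape only:
class_table, class_register and class_describe are hypothetical names,
and the real OBJ table presumably carries more than a description.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-class record. */
struct class_info {
    const char *description;    /* printable name of the class */
};

/* One slot per possible character value; unused slots stay zeroed. */
static struct class_info class_table[UCHAR_MAX + 1];

/* Register a class under its mnemonic character. */
static void class_register(char key, const char *description)
{
    assert(description != NULL);
    class_table[(unsigned char)key].description = description;
}

/* A single indexed load: no searching and no walk up a class hierarchy.
 * Returns NULL for a character nothing was registered under. */
static const char *class_describe(char key)
{
    return class_table[(unsigned char)key].description;
}
```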
======================================================================
Notes on fedex arguments:
b flag (buffering) - Now "off" by default. fedex reports the
most important error messages first. The idea of
messages appearing in the order of line numbers has
little value, especially in the context of multiple
input files.
r flag (no resolve) - Skip resolve pass.
p flag (print pass info) - This takes a string argument listing the
object types to print out while being processed.
Valid object types are:
p procedure
r rule
f function
e entity
t type
s schema or file
# pass #
E everything (all of the above)
For example, the following prints out entity and rule
names as they are being processed:
fedex -p er
======================================================================
While some ALGxxx functions (macros, really) still exist, some have
been replaced by ones specific to the type of algorithm. For example,
ALGget_parameters should be changed to FUNCget_parameters,
RULEget_parameters, or PROCget_parameters.
======================================================================
The whole idea of passes has been revamped. The old pass2 (now called
resolve) is no longer monolithic but is broken into several more
passes. The old pass2 did a depth-first resolution over the object
tree. Besides requiring a very deep stack, it forced on-demand
resolution which was extremely painful - everything had to constantly
check whether things had been resolved or whether there was infinite
recursion (due to USE/REF).
It was possible to restructure this into several breadth-first passes
over the object tree. It does not appear as though a heavy penalty is
paid for the additional passes. Here is an outline of passes.
RENAME-SCHEMAS
For each schema
For each rename clause
Connect the schema symbol to the real schema.
At this point, some renames and schemas are marked 'failed'.
Interestingly, rather than reading the dictionary to get
schema names, we use a FIFO, since schema names can be
dynamically introduced while resolving USE/REFs when reading
other files.
RENAME-OBJECTS
For each schema
For each rename clause
Connect the final object to the rename
At this point, renames are marked 'rename_resolved'
and some are marked failed.
SUBSUPERS
foreach schema
foreach entity, type (including within functions, etc)
resolve sub/supertypes in types
resolve local types
RESOLVE-TYPES resolve type defs and entity attribute defs
foreach schema
resolve type definitions
foreach entity, alg
resolve attribute types (including LOCALs)
resolve proc/func parameter/return types
At this point, the only types not resolved are the control variables
in query types and repeats. In order to resolve them, you have to
do expression resolution. Fortunately, both can be done in an order
so that no forward references are required.
RESOLVE-INHERITANCE-COUNT (can be combined with RESOLVE-TYPES above)
requires: superclasses to be resolved to entities
foreach scope
foreach entity (e)
X: foreach superclass (sc)
if entity-inheritance(sc) is not calculated
X(sc)
e->inheritance += sc->inheritance
foreach scope, recurse
EXPRESSIONS-&-STATEMENTS
foreach schema
foreach scope (entity, alg)
resolve expression in query, repeat and therefore resolve
type of control in query, repeat
resolve derived attributes
resolve attribute initializers
do only entity attributes have initializers???
resolve statements (recurse)
foreach type
resolve where clause
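
The RESOLVE-INHERITANCE-COUNT outline above amounts to a memoized
recursion over supertypes. The sketch below is one reading of it, under
a stated assumption: the quantity accumulated is the number of inherited
attributes, so each supertype contributes its own attributes plus
whatever it inherited. The struct layout and the name inheritance_count
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUPERS 4
#define NOT_CALCULATED (-1)

struct entity {
    int own_attributes;                 /* attributes declared locally */
    int inheritance;                    /* NOT_CALCULATED until resolved */
    struct entity *supers[MAX_SUPERS];  /* resolved supertypes, NULL-padded */
};

/*
 * Compute (once) how many attributes `e` inherits.
 * Assumes the supertype graph is acyclic; a cycle would recurse forever,
 * so it must have been rejected by an earlier pass. A supertype reachable
 * along two paths is counted twice in this simple version.
 */
static int inheritance_count(struct entity *e)
{
    int i, total = 0;

    assert(e != NULL);
    if (e->inheritance != NOT_CALCULATED)
        return e->inheritance;          /* done on an earlier visit */

    for (i = 0; i < MAX_SUPERS && e->supers[i] != NULL; i++) {
        struct entity *sc = e->supers[i];
        total += inheritance_count(sc) + sc->own_attributes;
    }
    e->inheritance = total;
    return total;
}
```

Because each entity caches its result, the order in which scopes are
visited does not matter, which is what lets this step be folded into the
RESOLVE-TYPES pass.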
======================================================================
Original code did not check for redefining keywords. Fixed.
======================================================================
USE and REFERENCE are handled by having separate lists and
dictionaries to remember schemas and individual objects that are USEd
and REFd. 'rename' structures are used to point to the remote object.
(This avoids the need for copying dictionaries, which enabled large
time/space savings.)
Once the rename has been processed, the rename points directly to the
final object, even if several schemas have USEd one another.
(The old USE/REF implementation did not detect recursive refs and
failed ungracefully in the presence of certain schema errors.
Dictionary entries could not be removed while another part of the
code was traversing the dictionary.)
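
The idea that a processed rename points directly at the final object,
with recursive references detected instead of looping, can be sketched
as follows. This is an illustration under assumptions: the struct
layout, the state names and rename_resolve are hypothetical, and the
toolkit's 'rename' structure is certainly richer.

```c
#include <assert.h>
#include <stddef.h>

enum rename_state {
    RENAME_UNRESOLVED,
    RENAME_IN_PROGRESS,
    RENAME_RESOLVED,
    RENAME_FAILED
};

struct rename {
    struct rename *via;     /* another rename this one goes through, or NULL */
    void *object;           /* the final object, once known */
    enum rename_state state;
};

/*
 * Resolve `r` so that r->object is the final object, however many
 * schemas the reference passes through. Returns NULL, and marks the
 * rename failed, for a dangling or recursive reference.
 */
static void *rename_resolve(struct rename *r)
{
    assert(r != NULL);
    switch (r->state) {
    case RENAME_RESOLVED:
        return r->object;
    case RENAME_FAILED:
        return NULL;
    case RENAME_IN_PROGRESS:    /* we came back to ourselves: a cycle */
        r->state = RENAME_FAILED;
        return NULL;
    case RENAME_UNRESOLVED:
        break;
    }

    if (r->via != NULL) {
        r->state = RENAME_IN_PROGRESS;
        r->object = rename_resolve(r->via);
    }
    r->state = (r->object != NULL) ? RENAME_RESOLVED : RENAME_FAILED;
    return r->object;
}
```

After the first call, every rename on the chain holds the final pointer,
so later lookups cost a single dereference and no dictionary is copied.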
======================================================================
Enumerations are expressions which are entered into two scopes. One
scope is that of their own type definition. To adhere to the special
visibility rule placed on enumerations, they are also entered into the
immediately enclosing scope. In order to allow multiple enumeration
tags with the same name (but from different enumeration scopes), the
dictionary recognizes such overloads and marks such definitions as
"ambiguous" so that later retrievals fail with an appropriate message,
while other retrievals succeed.
Since the dictionary already knows object types, and this code is only
executed during conflicts, it is not expensive to have the dictionary
do this. However, it did require another dictionary routine
specifically for the purpose of adding enumerations to the enum-scope
to handle enumerations with the same name in the same type scope as a
real error.
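
The "ambiguous" marking can be illustrated with a toy scope. The sketch
assumes every clash is between two enumeration tags (the real dictionary
knows object types and would treat other clashes as errors). The names
enum_scope, enum_scope_add and enum_scope_lookup are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SCOPE_MAX 32

struct scope_entry {
    const char *name;
    void *object;
    int ambiguous;      /* set when two enumeration tags share the name */
};

struct enum_scope {
    struct scope_entry entries[SCOPE_MAX];
    int count;
};

static struct scope_entry *enum_scope_find(struct enum_scope *s,
                                           const char *name)
{
    int i;

    for (i = 0; i < s->count; i++)
        if (strcmp(s->entries[i].name, name) == 0)
            return &s->entries[i];
    return NULL;
}

/* Enter an enumeration tag into the enclosing scope. A clash is not an
 * error here; the existing entry is simply marked ambiguous.
 * Returns 0 on success, -1 if the scope is full. */
static int enum_scope_add(struct enum_scope *s, const char *name, void *object)
{
    struct scope_entry *e;

    assert(s != NULL && name != NULL);
    e = enum_scope_find(s, name);
    if (e != NULL) {
        e->ambiguous = 1;
        return 0;
    }
    if (s->count == SCOPE_MAX)
        return -1;
    e = &s->entries[s->count++];
    e->name = name;
    e->object = object;
    e->ambiguous = 0;
    return 0;
}

/* Retrieval fails for ambiguous names (reported through *ambiguous so the
 * caller can print a suitable message) and succeeds for all others. */
static void *enum_scope_lookup(struct enum_scope *s, const char *name,
                               int *ambiguous)
{
    struct scope_entry *e;

    assert(s != NULL && name != NULL && ambiguous != NULL);
    e = enum_scope_find(s, name);
    *ambiguous = (e != NULL && e->ambiguous);
    if (e == NULL || e->ambiguous)
        return NULL;
    return e->object;
}
```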
======================================================================
Formal parameter tags are recorded but not analyzed, since it is
possible to do all type resolution without them. Oddly, tags are not
necessary, though I suppose they could be useful for a run-time evaluator.
======================================================================
Implicit loop controls and ALIAS are handled by associating with them
a "tiny" scope of one element.
The function SCOPEget_nearest_enclosing_entity had to be invented to
extract the true referent of a SELF when you're inside of a tiny
scope.
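
A function of this kind is essentially a walk up the chain of enclosing
scopes. The following is a guess at the shape only, not the body of
SCOPEget_nearest_enclosing_entity. The scope kinds, the struct and the
name nearest_enclosing_entity are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum scope_kind { SCOPE_SCHEMA, SCOPE_ENTITY, SCOPE_FUNCTION, SCOPE_TINY };

struct scope {
    enum scope_kind kind;
    struct scope *enclosing;    /* NULL at the outermost scope */
};

/* Starting from `s` (typically a tiny scope created for a loop control
 * or ALIAS), climb outward until an entity scope is found.
 * Returns NULL if `s` is not nested inside any entity. */
static struct scope *nearest_enclosing_entity(struct scope *s)
{
    assert(s != NULL);
    for (; s != NULL; s = s->enclosing) {
        if (s->kind == SCOPE_ENTITY)
            return s;
    }
    return NULL;
}
```

Tiny scopes may nest (a QUERY inside an ALIAS, say), which is why a loop
is needed instead of a single step to the parent.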
======================================================================
Local variables are handled the same way at the schema level that they
are at the entity level or any other scope. I only mention this
because the previous implementation did not support locals.
======================================================================
Classes of object types can be represented as bit strings (see
express_basic.h). This enables efficient handling of things like the
-p flag. More importantly, it can be helpful to give search functions
hints, such as when searching for a type (which normally includes
entities as well). For example, this provides a way of figuring out
the type when given the (legal) attribute declaration of:
A1: A1;
It is not sufficient to merely start searching at a superscope since
types can be defined within the current scopes. The important thing
is to ignore attributes. This and the business of allowing duplicate
enumerations are exceptions to the rule of only allowing one
definition with the same name in a single scope.
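
To make the "A1: A1" case concrete, here is a small sketch of a search
that takes a class mask as a hint. The bit values, the struct and
find_by_class are hypothetical. They only illustrate how ignoring
attributes lets the type named A1 be found in the same scope as the
attribute named A1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical class bits; the real ones live in express_basic.h. */
enum {
    OBJ_ATTRIBUTE = 1 << 0,
    OBJ_TYPE      = 1 << 1,
    OBJ_ENTITY    = 1 << 2
};

struct named {
    const char *name;
    int class_bit;      /* exactly one of the OBJ_* bits */
};

/* Return the index of the first object called `name` whose class is in
 * the `wanted` mask, or -1 if there is none. */
static int find_by_class(const struct named *objs, int n,
                         const char *name, int wanted)
{
    int i;

    assert(objs != NULL && name != NULL);
    for (i = 0; i < n; i++) {
        if ((objs[i].class_bit & wanted) != 0 &&
            strcmp(objs[i].name, name) == 0)
            return i;
    }
    return -1;
}
```

Resolving the type of the attribute would pass OBJ_TYPE | OBJ_ENTITY as
the hint, so the attribute itself is skipped.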
======================================================================
CONSTANTs are represented by attributes but with the flag.constant bit
on. Unlike normal attributes, these can be found in non-entity scopes.
======================================================================
Always code as if the person who will maintain your code is a
sadistic, psychopathic maniac who knows where you live.
- David Olsen
Writing documentation actually improves code. The reason is
that it is usually easier to clean up a crock than have to
explain it. - G. Steele.