-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathExpr.hs
255 lines (226 loc) · 15.7 KB
/
Expr.hs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
module Expr where
import Data.Maybe
import qualified Data.Map as M
import qualified Data.Set as S
import qualified Data.List as L
type Symbol = [Int]
--type Name = Symbol
type Name = String -- temporary until symbols have a string mapping
garbageName = ""
type Binding val = (Name, val)
type Env val = M.Map Name val
--data Env val = Env [[Binding val]] -- constraint satisfiers are bindings
envEmpty = M.empty
envInsert env key val = M.insert key val env
envInserts = L.foldl' $ \env' (key, val) -> envInsert env' key val
envLookup env key dfault = fromMaybe dfault $ M.lookup key env
envLookupFail env key msg = envLookup env key (error msg)
data Typing = Typing Expr EEnv
{-data Variance = Invariant | Covariant | Contravariant-}
{-data Constr = Constr { constrTag :: Symbol,-}
{---constrVar :: [Variance],-}
{-constrTy :: Typing }-}
{-
type ascription vs. term "ascription" (need a better word (could also use in place of type ascription): specification? description?)
SpecifyType (:, <:), SpecifyTerm (:>)
in this dependent type, the constructor (->) is passed A, and some type determined by B(x), where x is a term of type A
x:A -> B(x)
using the same notation in this term this seems weird, because it looks like we pass x, but assert its type is A, rather than passing A and naming a sub-term of term A
Constr x:A B(x)
(:) used in two ways; instead, use another operator to note the term name in a dependent type? maybe <: vs. :> is appropriate?
-}
-- these can be in terms of dependent constructors
{-data Pi = Pi Name Expr Expr-}
{-data Sigma = Sigma Name Expr Expr-}
-- data Sigma = Sigma [Binding Expr] (Maybe Expr) -- Sigma x:A B(x)
-- dependent: Tuple; how to access elements? project? destruct?
-- non-dependent: Record, Variant (sum of tuples)
-- optimizations: Contiguous Array, Sparse Map for field lookup by name
-- non-deterministic computing, logic vars, term inference, etc.
-- additive (normal) sets: {} is bottom, {x} is x, {x, y} is x or y
-- subtractive (inverse) sets: {} is top, {x} is anything but x
-- dispatch: path or instruction sequences to describe how to unpackage a (possibly dependent) product while accessing the case/records by label
-- are conditions needed? at some point you only want to conditionally inspect...
-- inspect (dispatch on (next N? conditionally?) field), skip (ignore the first N fields? or skip to field N?), apply/use (pass the next N fields as args to the target function/destructor)
-- alternative dispatch (much simpler): products (tagged or otherwise) are splittable types; tagged products are just products with an extra symbol at the head
-- split at N: (a, b, ...N, y, z) becomes left: (a, b, ...N) and right: (y, z), where a singleton tuple (x) is the same as x
-- split takes an index, a 2-arg (left, right) receiver function, and a product to split
--
-- story for algebraic types
-- an algebraic definition defines a dependent product of the form: [(ThisTagNamespace :> tag) Payload(tag)]
-- thus, the memory representation of algebraic data looks like:
-- addr-A: [tag][addr-B] (2-element contiguous memory)
-- addr-B: Payload(tag)
-- tags and ptr to payload are separate from payload itself, and could likely be held flat in registers rather than heap
-- tag could also (sometimes? often?) be encoded directly in ptr's unused bits
-- this decoupling seems to make it easier to erase unnecessary information from runtime representation without needing multiple representations (tagged vs. tagless) for each data type
-- it also allows for a nice implementation of splitting, operating via simple pointer arithmetic, giving an efficient story for random access indexing and slices
--
-- maybe slices are a better than splits? actually, if you allow singleton tuples, add a deref/pop/access/select/index operation to unpack the first element, and add a join/concat operation, you can explain slices, and functional slice updates (that may even change size); join operator could also allow for (inefficient?) construction of large arrays
--
-- maybe a left/right pair of operators is nicer than split? maybe split should return a pair rather than passing the left/right results to a receiver function?
--
-- records: functional update, projection/retraction(specify what remains)/deletion(specify what to remove)/elide/omit, extension/join, field access/retrieve/select/dispatch
-- project, and omit sound good for the shrinking case; join is probably fine for extension via asymmetrically combining two records
--
-- support types that are both tuples and records? record-like behavior for chosen tuples exposed via typeclass?
--
-- next questions:
-- scripts, modules, procedures...
-- module linking and completion/record-reification as separate steps?
-- modules whose available bindings only include external ones (via wrapping in a procedure and passing it externally evaluated field values?)
-- modules whose available bindings only include internal/linked ones
-- environments and variable referencing
-- modules to explain letrec
-- unlike previously thought, defining and linking modules should probably be effect-less
--
-- two recursively linked modules produce a new module with two environment threads to extend in the future; values from each original module close over different environments, and the recursively referenced names that are linked end up referenced in both environments; optimization could potentially rewrite the mutually recursive code to obtain a single environment thread?
-- records and modules both respond to the same linking interfaces, but only completed modules have fields to access like records
--
-- barriers to name pollution in newly-defined modules; onion-like layers; expand-in-env, import/require/parameterize should be innermost?
-- this-env (expands? seems tough) produces the current environment as a record
-- expand-in-env (for loading modules from file, see below); linking is not enough to take care of base language injection? may not be able to even define anything in a module
-- parameter list to shield from names in expansion env
-- expand-in-env won't work for inlined inner modules directly, since they will have been expanded before expand-in-env has an env to use
-- special module expanders should be defined as syntax (in terms of expand-in-env), available when expanding the outer module, to ensure the special expander is used for the inner module
-- for instance, something with a name like expand-in-kernel-env (probably the most common choice when not expanding in the current env)
-- kernel module containing the primitives everything else is defined with; bootstrapping entails running a script with this module as the environment, and building everything else
--
-- how the hell does all this fit nicely into exprs/terms? seems like you need non-expr values
-- ffi and other opaque functions must at least list their types, including effects, allowing for abstract interpretation; and they can be represented as nullary constructor terms
--
-- maybe modules should specify let/letrec (possibly several, possibly alternating) sections for their definitions; this explains records as modules
--
-- another explanation of plain records: let, with a body extracting this-env, projected to only show the let-binders as fields; all record ops are now module ops
--
-- types and security: how to avoid sealed type constituent information from leaking over the wire?
-- for instance, refined types such that you know a function accepts or returns only a subset of the full set of a sealed type's constructors
-- maybe it's enough to prevent refined and flow types from being sent into unsafe hands (over the wire, to a separate vat, whatever)
--
-- if mutually-recursive linking of procedures can be explained, then have modules be procedures, add letrec directly, and records become simple? but procedures and records have the same linking operation? hmm...
-- alternative story for recursive module linking: parametric modules are procedures, built in letrec builds a (possibly cyclic wrt indirect references) data dependency graph, pre-allocates destination space for computed values, and provides addresses to the uninitialized values
-- basically, letrec provides non-read-access of its result value to its bindings before computation begins so the binding definitions can reference the result as long as they don't read the result's data directly; and this is really only good for referencing the addresses of the entire result, or any of its components (assuming the relevant type info is available)
-- or, just route mutually-linked module contents to each other within a letrec as delayed thunks; hopefully most of these thunks can later be optimized away
type ESet = S.Set Expr
type EEnv = Env Expr
type Proc = ([Name], Expr) -- params (some implicit?); body expression to evaluate
type Mod = ([Name], [Binding Expr], DepGraph) -- params: handle positional, keyword, and implicit; list of definitions; definition deps
type DepGraph = [(Name, [Name], [Name])] -- a, bs, cs; a is directly dependent on bs, and indirectly dependent on cs
-- DepGraph is needed for module type, but what about module value? maybe simply 'undefined' is fine for a module value
--type CodesA annot = [(Code, annot)]
--type Codes = CodesA ()
type Codes = [Code]
type CProc = ([Name], Codes, Name) -- params, body, return cont name
data Value = ValProc CProc EEnv
| ValModule Mod EEnv
| ValRecord EEnv
-- todo: an efficient array-like tuple?
-- todo: efficient sets
| ValSetPositive ESet
| ValSetNegative ESet
deriving (Show)
-- todo: use a finally-tagless approach
-- continuation env, value env, local value stack, local continuation, return continuation label, annotation
--data ExecState0 cann sann = ExecState (Env (ExecState0 cann sann)) EEnv [Expr] (CodesA cann) Name sann
data ExecState0 annot = ExecState { stateConts :: Env (ExecState0 annot),
stateVals :: EEnv, stateLocals :: [Expr],
stateCodes :: Codes, stateContLab :: Name,
stateAnnot :: annot }
deriving (Show)
type ExecState = ExecState0 ()
-- todo: due to CProc and CModule, this will need to be recursively annotated
data Code = Halt
| Return
| CVar Name
| CLit Value
| CApply Int
| CAbstract CProc
| CTuple Int
deriving (Show)
initExecState codes = ExecState envEmpty envEmpty [] codes garbageName
procToCProc (names, body) contLabel = (names, exprToCodes body ++ [Return], contLabel)
argsToCodes args = concat [exprToCodes arg | arg <- args]
exprToCodes (ExprA expr _) = case expr of
Var name -> [CVar name]
Literal val -> [CLit val]
Apply proc args -> cargs ++ cproc ++ [CApply $ length args]
where cproc = exprToCodes proc
cargs = argsToCodes args
Abstract proc -> [CAbstract $ procToCProc proc "todo"]
Tuple args -> argsToCodes args ++ [CTuple $ length args]
cstep es@(ExecState conts vals locals (code : codes) label annot) = case code of
Halt -> es -- stay at the same state intentionally
Return -> push cont result -- todo: got data-lens?
where [result] = locals -- assert single value in locals stack
cont = envLookupFail conts label "Return: missing label" -- todo: really need string formatting
[] = codes -- assert
CVar name -> push next val
where val = envLookupFail vals name "CVar: missing name"
CLit val -> push next $ ExprA (Literal val) () -- todo: expr annotations need borrowing from Code
CApply nargs -> let ExprA (Literal (ValProc (pnames, pcodes, plabel) penv)) _ = head locals
surplus = nargs - length pnames
nargs' = min nargs $ length pnames
(args, rest) = splitAt nargs' $ tail locals
pnames' = drop nargs' pnames
penv' = envInserts penv $ zip pnames args
cproc' = ExprA (Literal (ValProc (pnames', pcodes, plabel) penv')) ()
codes' = if surplus > 0 then CApply surplus : codes else codes
cont = case codes' of -- can also tail call with [Halt] ... maybe substitute empty code list for Halt and Return; Halt as a cont label
[Return] -> envLookupFail conts label "Tail Call: missing label"
_ -> next { stateLocals = rest, stateCodes = codes' }
pconts = envInsert conts plabel cont in
if surplus < 0 then push next cproc'
else ExecState pconts penv' [] pcodes plabel annot -- todo: annotation
CAbstract cproc -> push next $ ExprA (Literal $ ValProc cproc vals) ()
{-CTuple nargs ->-}
where next = es { stateCodes = codes }
push state val = state { stateLocals = val : stateLocals state }
cexec = iterate cstep -- todo: find the halting state
ceval codes = cexec $ initExecState codes ()
eval expr = ceval $ exprToCodes expr ++ [Halt]
mkExpr expr = ExprA expr ()
idcomb = mkExpr $ Abstract (["id"], mkExpr $ Var "id")
testid = mkExpr $ Apply idcomb [idcomb]
ucomb = mkExpr $ Abstract (["u"], mkExpr $ Apply varu [varu])
where varu = mkExpr $ Var "u"
ycomb = mkExpr $ Abstract (["f"], mkExpr $ Apply ucomb [mkExpr $ Abstract (["h"], mkExpr $ Apply (mkExpr $ Var "f") [mkExpr $ Apply ucomb [mkExpr $ Var "h"]])])
testUid = mkExpr $ Apply ucomb [idcomb]
testUU = mkExpr $ Apply ucomb [ucomb]
--testY =
test = eval testUid !! 400
type Expr = ExprA () -- todo: temporary to avoid having to parameterize everything while still sketching things out
data ExprA annot = ExprA (Expr0 (ExprA annot)) annot -- annotated expressions
deriving (Show)
data Expr0 expr = Var Name
| Literal Value -- literally reference a more efficient value representation; have this even though pure values should all be representable syntactically
| Apply expr [expr]
-- | ApplyImplicit expr [expr]
| Abstract Proc
| Record [(expr, expr)] -- module constructor that supports non-symbol field names (as type-level values?); need for this might just be temporary
| Module Mod
| ModuleAccess expr expr expr -- 'default' expr can be 'undefined' to induce type error for accessing missing fields
| ModuleProject expr expr -- module; list of names; new module keeping only the listed names
| ModuleOmit expr expr -- module; list of names; new module removing the listed names
| ModuleImport [expr] expr -- add the fields of the listed modules to the target module, creating a new module
| ModuleLink [expr] -- asymmetric mutually-recursive linking of multiple modules
| Tuple [expr] -- dependent constructor, ie: Constr x:A y:B | F(x, y)
| TupleAccess expr expr
| TupleSplit expr expr -- see rambling above
| TupleConcat [expr]
| SetPositive [expr] -- empty is bottom/void/undefined
| SetNegative [expr] -- empty is top/any
| TypeAscribe expr expr -- term; type
| TypeStructural expr -- Denote a structural type: the type of a tuple is a tuple of the element types; without this mark, the type of such a type would not clearly be 'Set'
| TypeSet -- the type of types; temporary: throw this away after adding nominal types
{-| TypeSeal expr expr -- Hide the structural type of value behind a nominal type; sealer and value to seal; resulting type depends on the sealer-}
{-| TypeUnseal expr expr-}
-- | build variant (as an algebra?)
deriving (Show)
-- definitions:
-- bindings
-- variants (finite and otherwise)
-- classes
-- instances
-- specifying defaults/implicits
-- modules
-- linking