One personal preference is that a scripting syntax be somewhat ‘C-like’, which might argue for simply embedding a straight C implementation, although I think that makes some compromises.
counter :: int
for counter in <1, 10-counter> (
print(counter)
print(" ")
)
Using backfor to count backwards is an odd choice. Why not overload for?
backfor counter in <1, 9> print(counter, " ")
This is confusing to me. Maybe I'm misunderstanding the design principles, but the syntax seems unintuitive.
The example I gave was strange and I’ll have to change it. Not sure what I was trying to show there. The basic syntax is just:
for counter in <1, 5> print(counter)
backfor counter in <1, 5> print(counter)
It’s not overloaded because ‘for’ is basically a macro, expanding to ‘iterate, increment counter, break on counter > 5’ where the ‘>’ is hard-coded. If ‘for’ were a fundamental operator then yes, there would be a step option and it would be factored into the exit condition.
You’ve got me thinking, there’s probably a way to overload it even as a macro.. hmmm…
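For readers who think in C, the expansion described above can be sketched roughly as follows. This is only an illustration of the idea, not Cicada's actual macro: the names for_up and for_down are made up, and it assumes that backfor over <1, 5> starts at 5 and counts down to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "for counter in <first, last>".
   Visited values are written to out[]; the count is returned. */
size_t for_up(int first, int last, int *out, size_t cap)
{
    size_t n = 0;
    for (int counter = first; ; counter++) {   /* iterate, increment counter */
        if (counter > last) break;             /* exit test hard-coded as '>' */
        assert(n < cap);
        out[n++] = counter;                    /* loop body */
    }
    return n;
}

/* Hypothetical stand-in for "backfor counter in <first, last>".
   Assumed to run from last down to first; the exit test flips to '<'. */
size_t for_down(int first, int last, int *out, size_t cap)
{
    size_t n = 0;
    for (int counter = last; ; counter--) {    /* iterate, decrement counter */
        if (counter < first) break;            /* exit test hard-coded as '<' */
        assert(n < cap);
        out[n++] = counter;                    /* loop body */
    }
    return n;
}
```

Because the comparison is baked into each expansion, a single overloaded form would have to pick the comparison from the sign of a step, which is the extra factor mentioned above.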
Are you doing this at runtime (reference counting or similar), or have you found a way to make the static analysis tractable by restricting what aliasing patterns are allowed?
The 250kB size is impressive for a language with inheritance and N-dimensional arrays. For comparison, Lua's VM is around 200-300kB and doesn't include some of those features. What did you have to leave out to hit that size? I assume no JIT, but what about things like regex, IO libraries, etc?
Also - calling back into C functions from the script is a key feature for embeddability. How do you handle type marshalling between the script's type system and C's? Do you expose a C API where I register callbacks with type signatures, or is there reflection/dynamic typing on the boundary?
Aliases are strongly-typed which helps avoid some issues. Memory mods come with the territory: if ‘a’ and ‘b’ point to the same array and ‘a’ resizes that array, then the array behind ‘b’ gets resized too. The one tricky situation is when ‘a’ and ‘b’ each reference a range of elements, not the whole array, because a resize of ‘a’ would force a resize of the width of ‘b’. Resizing in this case is usually not allowed.
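A rough C sketch of the whole-array case, to make the resize propagation concrete. The Array and Alias types and both function names are hypothetical and are not how Cicada stores members internally; the only point is that both aliases go through one shared descriptor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared storage: one descriptor per array. */
typedef struct { double *data; size_t len; } Array;

/* Hypothetical alias: 'a' and 'b' each hold a pointer to the same Array. */
typedef struct { Array *target; } Alias;

/* Resize the array behind an alias; new elements are zero-filled.
   Returns 0 on success, -1 if the allocation fails. */
int alias_resize(Alias a, size_t new_len)
{
    assert(a.target != NULL && new_len > 0);
    double *p = realloc(a.target->data, new_len * sizeof *p);
    if (p == NULL) return -1;                 /* old storage is left intact */
    for (size_t i = a.target->len; i < new_len; i++) p[i] = 0.0;
    a.target->data = p;
    a.target->len = new_len;
    return 0;
}

/* Length as seen through any alias of the same array. */
size_t alias_len(Alias a)
{
    assert(a.target != NULL);
    return a.target->len;
}
```

Since neither alias caches the data pointer or the length, a resize through ‘a’ is visible through ‘b’ straight away. Aliases to sub-ranges would need extra bookkeeping, which fits with resizing usually being disallowed there.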
Garbage collection is indeed done (poorly) by reference counting, and also (very well) by a tracing function that Cicada’s command line script runs after every command.
You’re exactly right, the library is lean because I figure it’s easy to add a C function interface for any capability you want. There’s a bit of personal bias as to what I did include - for example all the basic calculator functions are in, right down to atan(), but no regex. Basic IO (save, load, input, print) is included.
Type marshaling — the Cicada int/float types are defined by cicada.h and can be changed! You just have to use the same types in your C code.
When you run Cicada you pass a list of C functions paired with their Cicada names: { "myCfunction", &myCfunction }. Then, in Cicada, $myCfunction() runs the callback.
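The name/pointer pairing can be pictured with a small generic C table like the one below. Everything in it (callback_fn, callback_entry, find_callback and the two sample callbacks) is a made-up illustration of the pattern, not the types or functions declared in cicada.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback signature: argument count plus untyped pointers. */
typedef int (*callback_fn)(size_t num_args, void **args);

/* One table row: the script-side name paired with the C function. */
typedef struct { const char *name; callback_fn fn; } callback_entry;

static int count_args(size_t num_args, void **args)
{
    (void)args;                     /* unused in this sample */
    return (int)num_args;
}

static int always_zero(size_t num_args, void **args)
{
    (void)num_args;
    (void)args;
    return 0;
}

/* The list the host program would hand to the interpreter. */
static const callback_entry callbacks[] = {
    { "countArgs",  &count_args  },
    { "alwaysZero", &always_zero },
};

/* Look a callback up by its script-side name; NULL if not registered. */
callback_fn find_callback(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof callbacks / sizeof callbacks[0]; i++)
        if (strcmp(callbacks[i].name, name) == 0)
            return callbacks[i].fn;
    return NULL;
}
```

An interpreter built along these lines would resolve a call such as $countArgs() by looking up the name and invoking the returned pointer.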
Thanks for the questions! This is exactly the sort of feedback that helps me learn more about the landscape..
The aliasing semantics you describe (resizes propagating through aliases) is an interesting choice. It's closer to how references work in languages like Python than to the "borrow checker" approach Rust takes. Probably more intuitive for users coming from dynamic languages, even if it means some operations need runtime checks.
The hybrid GC approach (reference counting + periodic tracing) is pragmatic. Reference counting handles the common case cheaply, and the tracing pass catches cycles. That's similar to how CPython handles it.
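A toy C sketch of why the two mechanisms complement each other. It is a generic illustration, not Cicada's or CPython's collector: the Obj layout, the fixed-size heap and both function names are invented, and each object holds at most one outgoing reference to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 8

typedef struct Obj {
    int refcount;        /* incoming references, roots included */
    struct Obj *ref;     /* at most one outgoing reference, for brevity */
    bool live;           /* still allocated? */
    bool marked;         /* scratch flag for the tracing pass */
} Obj;

static Obj heap[MAX_OBJS];   /* toy heap, zero-initialised */

/* Reference counting: drop one reference and free on reaching zero,
   then drop the freed object's own outgoing reference in turn. */
void release(Obj *o)
{
    while (o != NULL && o->live) {
        assert(o->refcount > 0);
        if (--o->refcount > 0) return;
        Obj *next = o->ref;
        o->live = false;
        o->ref = NULL;
        o = next;
    }
}

/* Tracing: mark everything reachable from the roots, free the rest.
   Returns the number of objects freed. */
size_t trace_collect(Obj **roots, size_t num_roots)
{
    size_t freed = 0;
    for (size_t i = 0; i < MAX_OBJS; i++) heap[i].marked = false;
    for (size_t r = 0; r < num_roots; r++)
        for (Obj *o = roots[r]; o != NULL && !o->marked; o = o->ref)
            o->marked = true;
    for (size_t i = 0; i < MAX_OBJS; i++) {
        if (!heap[i].live || heap[i].marked) continue;
        /* a dead object may still point at a survivor: fix its count */
        if (heap[i].ref != NULL && heap[i].ref->marked) heap[i].ref->refcount--;
        heap[i].live = false;
        heap[i].ref = NULL;
        freed++;
    }
    return freed;
}
```

Two objects that reference each other never reach a count of zero once their external references are dropped, so release() alone leaks them; a later trace_collect() with no roots pointing at them frees both.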
The C registration API sounds clean - explicit pairing of names to function pointers is about as simple as it gets. Do you handle varargs on the Cicada side, or does each registered function have a fixed arity that the interpreter enforces?
Scripted functions have no set arity, and the same applies to callback C functions. Scripted functions collect their arguments inside an ‘args’ variable. Likewise, each C function takes a single ‘argsType’ argument which collects the argument pointers & type info. There are macros to help unpack them, but if you do the unpacking manually then the function can accept any number of arguments:
ccInt myCfunction(argsType args) {
    for (int a = 0; a < args.num; a++) printf("%p\n", args.p[a]);   // print each argument's pointer
    return 0;
}
So all functions are automatically variadic.
It’s good to know that these GC/etc. solutions are even used by the big languages..
Beyond NNs, my use case is to embed fast C calculations into the language to make scientific programming easier. But the inspiration was less about the use case and more about certain programming innovations which I’m sure exist elsewhere, though I’m not sure where: aliases, callable function arguments, generalized inheritance, etc.
That’s a great list — most of those languages I’ve honestly never heard of..
How does it deal with use after free? How does it deal with data races?
Memory safety can't be solved just by eliminating pointer arithmetic; more is needed to achieve it.
There’s actually no ‘free’, but in the (member -> variable data) ontology of Cicada there are indeed a few ways memory can become disused: 1) members can be removed; 2) members can be re-aliased; 3) arrays or lists can be resized. In those conditions the automated/manual collection routines will remove the disused memory, and in no case is there any dangling ‘pointer’ (member or alias) pointing to unallocated memory. Does this answer your question?
I agree that my earlier statement wasn’t quite a complete explanation.
Of course, since it interfaces with C, it’s easy to overwrite memory in the callback functions.
The name came when I was living in Seattle and missed the sounds of east coast summer..