51 points by briancr 16 hours ago | 9 comments
  • smartmic 14 hours ago
    Cool, I like these kinds of projects. When it comes to embedding a scripting language in C, there are already some excellent options: Notable ones are Janet, Guile, and Lua. Tcl is also worth considering. My personal favorite is still Janet[0]. Others?

    [0]: https://janet-lang.org/

    • forgotpwd16 14 hours ago
      Io is nice (Smalltalk/Self-like). A mostly comprehensive list: https://dbohdan.github.io/embedded-scripting-languages/
      • publicdebates 9 hours ago
        That list (or any similar list) would be so helpful if it had a health column, something that takes into account number of contributors, time since last commit, number of forks, number of commits, etc. So many projects are effectively dead but it's not obvious at first sight, and it takes 2 or 3 whole minutes to figure out. That seems short but it adds up when evaluating a project, causing people to just go to a well known solution like Lua (and why not? Lua is just fine; in fact it's great).
      • briancr 12 hours ago
        Should have replied directly, thanks! That’s a great list.
    • dualogy 6 hours ago
      AngelScript. Mature and maintained since 2003, fully typed, with C syntax. https://www.angelcode.com/angelscript/
      • briancr 4 hours ago
        Yes very C-like.. One immediate difference is that in these C-like scripting languages there’s a split between definitions and executable commands. In Cicada there are only executable commands: definitions are done using a define operator. (That’s because everything is on the heap; Cicada functions don’t have access to the stack). I personally think the latter method makes more sense for command-line interactivity, but that’s a matter of taste.
    • zem 5 hours ago
      • briancr 5 hours ago
        Yes I like this one. It’s similar and even more C-like, in that it discriminates between classes, class instances, functions, methods vs constructors, etc. (Cicada does not).
    • briancr 12 hours ago
      Thanks! I’m unfamiliar with Janet but I’ve looked into the others you listed.

      One personal preference is that a scripting syntax be somewhat ‘C-like’, which might argue for embedding straight C, although I think that makes some compromises.

  • publicdebates 11 hours ago
    The for loop is odd. Why is the word counter in there twice?

        counter :: int
    
        for counter in <1, 10-counter> (
           print(counter)
           print(" ")
        )
    
    Using backfor to count backwards is an odd choice. Why not overload for?

        backfor counter in <1, 9> print(counter, " ")
    
    This is confusing to me. Maybe I'm misunderstanding the design principles, but the syntax seems unintuitive.
    • briancr 10 hours ago
      Yeah this is why the syntax is customizable.. maybe it’s not optimal.

      The example I gave was strange and I’ll have to change it. Not sure what I was trying to show there. The basic syntax is just:

          for counter in <1, 5> print(counter)

          backfor counter in <1, 5> print(counter)

      It’s not overloaded because ‘for’ is basically a macro, expanding to ‘iterate, increment counter, break on counter > 5’ where ‘>’ is hard-coded. If ‘for’ was a fundamental operator then yes, there would be a step option and it would be factored into the exit condition.
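A rough C analogy for the macro expansion described above, not Cicada's actual implementation. `FOR_RANGE` and `BACKFOR_RANGE` are hypothetical names; each bakes in its own comparison and step direction, which is why a single macro of this shape cannot count both ways.

```c
/* Hypothetical analogy, not Cicada source: a counting loop written as a macro.
   The exit test (>) and the step (++) are hard-coded into the expansion. */
#define FOR_RANGE(counter, first, last) \
    for ((counter) = (first); !((counter) > (last)); (counter)++)

/* Counting down needs a second macro because both the test and the step flip. */
#define BACKFOR_RANGE(counter, first, last) \
    for ((counter) = (last); !((counter) < (first)); (counter)--)

/* Sums the counter values visited by FOR_RANGE so the result is easy to check. */
int sum_forward(int first, int last) {
    int counter, total = 0;
    FOR_RANGE(counter, first, last) total += counter;
    return total;
}

/* Records the visit order of BACKFOR_RANGE into out[]; returns how many
   values were written. The caller must supply a large enough buffer. */
int collect_backward(int first, int last, int *out) {
    int counter, n = 0;
    BACKFOR_RANGE(counter, first, last) out[n++] = counter;
    return n;
}
```

Folding both directions into one macro would require a step argument and an exit test that depends on its sign, which is roughly the overloading discussed here.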

      You’ve got me thinking, there’s probably a way to overload it even as a macro.. hmmm…

  • newzino 12 hours ago
    The "aliases not pointers" approach for memory safety is interesting. Curious how you handle the performance implications - traditional aliasing analysis in compilers is expensive because determining whether two aliases point to the same memory is hard.

    Are you doing this at runtime (reference counting or similar), or have you found a way to make the static analysis tractable by restricting what aliasing patterns are allowed?

    The 250kB size is impressive for a language with inheritance and N-dimensional arrays. For comparison, Lua's VM is around 200-300kB and doesn't include some of those features. What did you have to leave out to hit that size? I assume no JIT, but what about things like regex, IO libraries, etc?

    Also - calling back into C functions from the script is a key feature for embeddability. How do you handle type marshalling between the script's type system and C's? Do you expose a C API where I register callbacks with type signatures, or is there reflection/dynamic typing on the boundary?

    • briancr 11 hours ago
      Good questions! The short answer to the first is that the language is interpreted, not compiled, so optimizations are moot.

      Aliases are strongly typed, which helps avoid some issues. Memory modifications come with the territory: if ‘a’ and ‘b’ point to the same array and ‘a’ resizes that array, then the array behind ‘b’ gets resized too. The one tricky situation is when ‘a’ and ‘b’ each reference a range of elements, not the whole array, because a resize of ‘a’ would force a change in the width of ‘b’. Resizing in this case is usually not allowed.
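One way to picture the whole-array case in C, as a sketch under assumptions rather than Cicada's internals: every alias points at a shared header instead of at the data, so a resize through one alias is visible through all of them. The `Array` and `Alias` types and the function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical model, not Cicada's internals: the storage lives in one shared
   header, and every alias holds a pointer to that header, not to the data. */
typedef struct {
    double *data;
    size_t  len;
} Array;

typedef struct {
    Array *target;   /* which array this alias refers to */
} Alias;

/* Resizes the array behind an alias; newly added elements are zeroed.
   Returns 0 on success, -1 if allocation fails (array left unchanged). */
int alias_resize(Alias a, size_t new_len) {
    if (new_len == 0) {
        free(a.target->data);
        a.target->data = NULL;
        a.target->len = 0;
        return 0;
    }
    double *grown = realloc(a.target->data, new_len * sizeof *grown);
    if (grown == NULL) return -1;
    for (size_t i = a.target->len; i < new_len; i++) grown[i] = 0.0;
    a.target->data = grown;
    a.target->len = new_len;
    return 0;
}

/* Length as seen through any alias of the same array. */
size_t alias_len(Alias a) { return a.target->len; }
```

Because the data pointer is stored only in the header, no alias is left holding a stale address after the reallocation. The range-of-elements case would need extra bookkeeping that this sketch does not attempt.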

      Garbage collection is indeed done (poorly) by reference counting, and also (very well) by a tracing function that Cicada’s command line script runs after every command.

      You’re exactly right, the library is lean because I figure it’s easy to add a C function interface for any capability you want. There’s a bit of personal bias as to what I did include - for example all the basic calculator functions are in, right down to atan(), but no regex. Basic IO (save, load, input, print) is included.

      Type marshaling — the Cicada int/float types are defined by cicada.h and can be changed! You just have to use the same types in your C code.

      When you run Cicada you pass a list of C functions paired with their Cicada names: { "myCfunction", &myCfunction }. Then, in Cicada, $myCfunction() runs the callback.
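The name/function pairing can be pictured as a lookup table. The following is a minimal sketch, not Cicada's API: the `Binding` and `Callback` types, the callback signature, and `call_by_name` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; Cicada's real declarations live in its
   own headers and may differ. */
typedef int (*Callback)(int argc, void **argv);

typedef struct {
    const char *name;   /* name the script uses */
    Callback    fn;     /* C function to run */
} Binding;

static int answer(int argc, void **argv) { (void)argc; (void)argv; return 42; }
static int count_args(int argc, void **argv) { (void)argv; return argc; }

/* The list handed to the interpreter at startup, in the spirit of
   { "myCfunction", &myCfunction }. */
static const Binding bindings[] = {
    { "answer",    &answer },
    { "countArgs", &count_args },
};

/* Looks a script-side name up in the table and runs the matching callback.
   Returns -1 when no binding has that name. */
int call_by_name(const char *name, int argc, void **argv) {
    for (size_t i = 0; i < sizeof bindings / sizeof bindings[0]; i++) {
        if (strcmp(bindings[i].name, name) == 0)
            return bindings[i].fn(argc, argv);
    }
    return -1;
}
```

A linear scan is enough for a handful of callbacks; a larger table might be sorted or hashed.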

      Thanks for the questions! This is exactly the sort of feedback that helps me learn more about the landscape..

      • newzino 11 hours ago
        Thanks for the detailed response. The interpreted approach makes sense for the use case - when you're embedding a scripting layer, you usually want simplicity and portability over raw speed anyway.

        The aliasing semantics you describe (resizes propagating through aliases) is an interesting choice. It's closer to how references work in languages like Python than to the "borrow checker" approach Rust takes. Probably more intuitive for users coming from dynamic languages, even if it means some operations need runtime checks.

        The hybrid GC approach (reference counting + periodic tracing) is pragmatic. Reference counting handles the common case cheaply, and the tracing pass catches cycles. That's similar to how CPython handles it.
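To make the cycle problem concrete, here is a toy C sketch of the hybrid scheme: reference counts free acyclic garbage immediately, and a mark-and-sweep pass reclaims what the counts miss. It is an illustration under assumptions (a fixed pool, one outgoing reference per object, hypothetical function names), not Cicada's or CPython's collector.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy heap: a fixed pool of objects, each holding a reference
   count and at most one outgoing reference. */
#define POOL_SIZE 8

typedef struct Obj {
    bool in_use;
    bool marked;
    int refcount;
    struct Obj *ref;
} Obj;

static Obj pool[POOL_SIZE];

/* Allocates an object owned by the caller (refcount 1); NULL if the pool is full. */
Obj *obj_new(void) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i] = (Obj){ .in_use = true, .refcount = 1 };
            return &pool[i];
        }
    }
    return NULL;
}

/* Drops one reference; each time a count reaches zero the object is freed
   and the reference it held is dropped in turn. */
void obj_release(Obj *o) {
    while (o != NULL && --o->refcount == 0) {
        Obj *next = o->ref;
        o->in_use = false;
        o->ref = NULL;
        o = next;
    }
}

/* Points from->ref at to, adjusting both reference counts. */
void obj_set_ref(Obj *from, Obj *to) {
    Obj *old = from->ref;
    if (to != NULL) to->refcount++;
    from->ref = to;
    obj_release(old);
}

/* Tracing pass: mark everything reachable from roots, free the rest.
   Returns the number of objects freed. */
size_t collect(Obj **roots, size_t nroots) {
    size_t freed = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) pool[i].marked = false;
    for (size_t r = 0; r < nroots; r++)
        for (Obj *o = roots[r]; o != NULL && !o->marked; o = o->ref)
            o->marked = true;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        Obj *o = &pool[i];
        if (!o->in_use || o->marked) continue;
        /* A dead object may still point at a live one; give that count back. */
        if (o->ref != NULL && o->ref->marked) o->ref->refcount--;
        o->in_use = false;
        o->ref = NULL;
        freed++;
    }
    return freed;
}

/* Number of objects currently allocated. */
size_t live_count(void) {
    size_t n = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) n += pool[i].in_use ? 1 : 0;
    return n;
}
```

If two objects reference each other and the caller releases both, each count stays at 1, so only the tracing pass frees them.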

        The C registration API sounds clean - explicit pairing of names to function pointers is about as simple as it gets. Do you handle varargs on the Cicada side, or does each registered function have a fixed arity that the interpreter enforces?

        • briancr 9 hours ago
          Yes, unfortunately there are lots of runtime checks, but I always fork the time-consuming calculations into C anyway so those checks don’t really affect overall performance much.

          Scripted functions have no set arity, and the same applies to callback C functions. Scripted functions collect their arguments inside an ‘args’ variable. Likewise, each C function has a single ‘argsType’ argument which collects the argument pointers & type info. There are macros to help unpack them, but if you want to do the unpacking manually then the function can be called variadically:

              ccInt myCfunction(argsType args)
              {
                  for (int a = 0; a < args.num; a++) printf("%p\n", args.p[a]);
                  return 0;
              }

          So all functions are automatically variadic.
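A self-contained sketch of manual unpacking, for readers without the Cicada headers at hand. `DemoArgs` is a hypothetical stand-in for `argsType`: the field names `num` and `p` follow the snippet above, while the layout, the absence of type info, and the assumption that every argument is an `int` are mine.

```c
#include <stddef.h>

/* Hypothetical stand-in for the argument bundle; only the field names num
   and p come from the snippet above, the rest is assumed. */
typedef struct {
    int    num;   /* number of arguments passed */
    void **p;     /* p[a] points at the data for argument a */
} DemoArgs;

/* Accepts any number of arguments, assumes each points at an int, and
   returns their sum. A real callback would check the type info first. */
int sum_ints(DemoArgs args) {
    int total = 0;
    for (int a = 0; a < args.num; a++) total += *(int *)args.p[a];
    return total;
}
```

The same function body works for zero, one, or many arguments, which is the sense in which a single-bundle signature makes every callback variadic.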

          It’s good to know that these GC/etc. solutions are even used by the big languages..

  • codr7 9 hours ago
    Nice, the more the merrier!

    I've been working on one for Kotlin lately:

    https://gitlab.com/codr7/shik

    • briancr 7 hours ago
      Very cool! I’ve never used Kotlin..
  • briancr 13 hours ago
    Thanks for the references! Writing a language was almost an accident — I worked on a neural networks tool with a scripted interface back around 2000, before I’d ever heard of some of these other languages.. and I’ve been using/updating it ever since.

    Beyond NNs, my use case is to embed fast C calculations into the language to make scientific programming easier. But the inspiration was less about the use case and more about certain programming innovations which I’m sure exist elsewhere but I’m not sure where, like aliases, callable function arguments, generalized inheritance, etc.

    That’s a great list — most of those languages I’ve honestly never heard of..

  • nextaccountic 9 hours ago
    > Uses aliases not pointers, so it's memory-safe

    How does it deal with use after free? How does it deal with data races?

    Memory safety can't be solved by just eliminating pointer arithmetic, there's more stuff needed to achieve it

    • briancr 8 hours ago
      There’s no multithreading so race conditions don’t apply. That simplifies things quite a bit.

      There’s actually no ‘free’, but in the (member -> variable data) ontology of Cicada there are indeed a few ways memory can become disused: 1) members can be removed; 2) members can be re-aliased; 3) arrays or lists can be resized. In those conditions the automated/manual collection routines will remove the disused memory, and in no case is there any dangling ‘pointer’ (member or alias) pointing to unallocated memory. Does this answer your question?

      I agree that my earlier statement wasn’t quite a complete explanation.

      Of course, since it interfaces with C, it’s easy to overwrite memory in the callback functions.

  • tayistay 12 hours ago
    Can I call into the interpreter from multiple threads or does it use global state?
    • briancr 11 hours ago
      There’s no multithreading capability built into Cicada. So a given instance of the interpreter only has a single concurrent state, and all C callbacks share memory with that global state. Multithreading would require a C-based thread manager.
  • eps 13 hours ago
    What's the use case? Clearly, you made it with some specific use in mind, at least initially. What was it?
    • briancr 13 hours ago
      To be more specific (see my general comment), I’ve used the language in two open-source projects: 1) a chromosome conformation reconstruction tool, and 2) a fast neural network generator (back end). Re Project 2: I’m also planning to embed the language into results webpages served from the NN generator website.
  • languagehacker 12 hours ago
    I've lost count of projects called Cicada
    • publicdebates 11 hours ago
      A new one seems to pop up every year, and some every 13 or 17 years.
      • briancr 7 hours ago
        This one’s Brood VI!
    • briancr 11 hours ago
      I know, I was dismayed to find out that there’s even another scripting language called Cicada.

      The name came when I was living in Seattle and missed the sounds of east coast summer..