9
lorentz
2y

[Rust]
I have a bunch of computational steps in a Rust program, all very expensive. They all depend on each other, forming a cycle-free and rather small graph of dependencies which is not a tree. The results of each of them for a given input are likely used tens of times by the others, so I would like to cache the subresults dynamically.

How would I go about doing this, considering that caching (rightfully) requires mutable access to the cache and multiple operations often refer to the same subresult?

I can't ask SO because they'd just tell me to use another language or recalculate everything every time, fully convinced that difficult questions can only emerge from design mistakes.

Comments
  • 0
    Disclaimer: never used Rust. not going to anytime soon.
    If thread safty is not an issue bc no parallel calcs, then put everything In a huge ass mutli kloc function. then hold evertyhing there.
  • 0
    Is that a Rust-language limitation? I know you cannot hold more than one mutable borrow at the same time. But there's no way to write to and read from a global cache?
  • 1
    You can create a cache object that owns Option of all the results.

    Then you step through the graph performing the computations in topological order (this ensures that all required results are in cache).

    Each computation borrows immutable references to the cached results and the cache object is the only one allowed mutable references to the fields.

    See https://pastebin.com/YGeS4LyW
  • 0
    Can you not just cache on a node basis? If each node has an in and output it can just remember the output for a given input right?
  • 1
    @rantydev I see there's a misunderstanding. The kinds of values a given result depends on are simple; the parameters however are complex and overlaps between the subresults of computations are very common but spontaneous. Here's the file, as you can see, load_project fails due to ownership issues. You can safely ignore the dynamic IO, I can devise some special handling for that.

    https://github.com/lbfalvy/orchid/...
  • 1
    @lbfalvy I see. The problem is that you cache twice: when you build `preparsed` you're putting `loaded` in the cached closure; but `loaded.get` borrows a mutable reference to `loaded` whichnprevents the further mutable borrows required in `exports`.....

    Hmmm very interesting dilemma. I'll need some time to think about it.
  • 1
    @rantydev Thanks, I really appreciate your effort.
  • 2
    @lbfalvy very interesting problem indeed. I learned about this problem with the functional-style Cache pattern :) ty.

    Assuming that you don't want to redesign the logic to have one unique cache to cache all intermediate computations, I think that the best option is to use interior mutability and `RefCell`s to be able to get around the borrow checker and mutably borrow at runtime. It shouldn't break thread safety; but I'm not a super expert on threaded Rust :)

    https://pastebin.com/1fJ2LjND
  • 1
    @rantydev I'm new to Rust and this is a learning experience so if there's a better way to go about all of this I'm happy to restructure it, but the RefCell solution looks perfect to my untrained eyes.
  • 0
    I hit another, very similar wall through ownership which I decided to bypass with a lot of cloning. I realized that with the current setup the cache will want to take ownership of the arguments, so anything you pass by reference needs to live at least as long as the cache. I'm not sure whether Rust is familiar with the notion of a weak map but references seem a bit too low level for it. If the number of allocations becomes a serious problem I could use a CoW pointer but I don't think it's "nicer" or solves any real or future problem besides allocations so I don't see a reason to do it in advance.
  • 0
    The RefCell will send me to dependency hell when I try to figure out how to order my calls such that the caller doesn't borrow an indirect callee. On the upside, if I do fuck it up it'll fail every time because the calling order is static, so it's easy to debug.

    I think I found the messy side of Rust; it's not really possible to represent an operation that technically mutates something but the interface ensures that the operation does not have side effects
  • 0
    I think I managed to come up with a really good design for the cache, it's in the repo. I decided to use a RefCell for the hashmap so that Cache::find takes an immutable self reference. I know this is a bit misleading but find doesn't alter any observable property of the Cache so it's still in line with the spirit of Rust's immutability guarantees. I'm also using a crate called mappable-rc for mappable refcounts, which uses entirely too much unsafe code but it seems solid enough.

    Now, how does one publish a crate?
Add Comment