6
Wisecrack
40d

I had the idea that part of the problem with NN and ML research is that we all use the same standard loss and nonlinear functions. In theory, most NN architectures are universal approximators, but there's a big gap between symbolic and numeric computation.

But some of our bigger leaps in improvement weren't just from new architectures, but from entirely new approaches to how data is transformed and how we calculate loss, for example KL divergence.

And it occurred to me that all we really need is training/test/validation data, and with the right approach we can let the system discover not just the architecture (been done before) but also the nonlinear and loss functions themselves, and see what pops out the other side.

If a network can instrument its own code, as it were, maybe it'd find new and useful nonlinear functions and losses. Networks wouldn't just specify a conv layer here or a maxpool there, but derive implementations of these all on their own.
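
A rough sketch of what "deriving an implementation" could look like, assuming a tiny stack machine. The opcode names here are made up for illustration and aren't taken from my pastebin, and brute-force enumeration stands in for whatever smarter search the real thing would use. It is only given input/output samples of ReLU and has to find a program that reproduces them:

```python
import itertools

# Hypothetical opcode set for a tiny stack machine (names are illustrative).
BINARY = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
    "MAX": lambda a, b: max(a, b),
}
OPCODES = ["X", "ZERO", "NEG", "ADD", "MUL", "MAX"]

def run(program, x):
    """Execute a program on input x; return None if it is malformed."""
    stack = []
    for op in program:
        if op == "X":            # push the node's input
            stack.append(x)
        elif op == "ZERO":       # push the constant 0
            stack.append(0.0)
        elif op == "NEG":        # negate the top of the stack
            if not stack:
                return None
            stack.append(-stack.pop())
        else:                    # binary opcodes pop two values, push one
            if len(stack) < 2:
                return None
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY[op](a, b))
    # A well-formed program leaves exactly one value behind.
    return stack[0] if len(stack) == 1 else None

def search(samples, max_len=3):
    """Return the first shortest program that reproduces every (x, y) sample."""
    for length in range(1, max_len + 1):
        for program in itertools.product(OPCODES, repeat=length):
            if all(run(program, x) == y for x, y in samples):
                return program
    return None

relu_samples = [(x, max(x, 0.0)) for x in (-2.0, -0.5, 0.0, 1.0, 3.0)]
found = search(relu_samples)
print(found)
```

For these samples the search lands on X, ZERO, MAX, which is just max(x, 0). Exhaustive search obviously won't scale past toy program lengths, which is where the substitution and selection ideas come in.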

More importantly, with a little pruning we could even use successful examples to bootstrap smaller, more efficient algorithms, all within the graph itself. We could use genetic algorithms to mix and match nodes at training time to discover what works and what doesn't, or do training, testing, and validation in batches to anneal a network in the right direction.

By generating variations of successful nodes and graphs and substituting them in, we can compare variants to minimize error (for some measure of error over accuracy and precision) and select the best ones, without having to do much point mutation within any given node. That minimizes deleterious effects, sort of like how changes in gene expression can lead to unexpected but fitness-improving results for an entire organism, while point mutations typically cause disease.
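
A minimal sketch of substitution-plus-selection, under some loud assumptions: the graph is just a straight chain of named nodes, the node library and the target function are invented for the example, and a simple keep-it-if-it's-better loop stands in for a full genetic algorithm. The point is only that each variation swaps in a whole working node rather than mutating a node's internals:

```python
import random

# Hypothetical node library: each entry is a whole, already-working node.
LIBRARY = {
    "identity": lambda v: v,
    "double":   lambda v: 2.0 * v,
    "inc":      lambda v: v + 1.0,
    "relu":     lambda v: max(v, 0.0),
    "square":   lambda v: v * v,
    "negate":   lambda v: -v,
}

def run_graph(graph, x):
    """Feed x through the chain of nodes named in graph."""
    for name in graph:
        x = LIBRARY[name](x)
    return x

def error(graph, data):
    """Mean squared error of the graph over (x, y) pairs."""
    return sum((run_graph(graph, x) - y) ** 2 for x, y in data) / len(data)

def substitute(graph, rng):
    """Copy the graph and replace one whole node with a library node."""
    child = list(graph)
    child[rng.randrange(len(child))] = rng.choice(sorted(LIBRARY))
    return child

def evolve(graph, validation, generations=200, seed=0):
    """Keep a substitution only when it lowers validation error."""
    rng = random.Random(seed)
    best, best_err = list(graph), error(graph, validation)
    for _ in range(generations):
        child = substitute(best, rng)
        child_err = error(child, validation)
        if child_err < best_err:
            best, best_err = child, child_err
    return best, best_err

# Invented target for the example: relu(2x + 1).
target = lambda x: max(2.0 * x + 1.0, 0.0)
validation = [(x, target(x)) for x in (-3.0, -1.0, 0.0, 0.5, 2.0)]
start = ["identity", "identity", "identity"]
best, best_err = evolve(start, validation)
print(best, best_err)
```

Because a child is only accepted when it strictly improves, the error can never get worse than the starting graph, but a greedy loop like this can still get stuck. Keeping a population of variants instead of a single best graph would be the obvious next step.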

It might seem, just on intuition, like this wouldn't work out of the gate, but I think the benefit of working through node substitutions or entire subgraph substitutions is that we can check test/validation loss before training is even complete.
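
One way to picture checking validation loss before training is done is successive halving: train every variant a little, rank them on validation loss, drop the worse half, repeat. This is a sketch, not what my interpreter does. The variants, the target, and the one-weight model w * f(x) are all invented for the example, and plain gradient descent is swapped in just to have something to train:

```python
# Hypothetical variants of one node; only "square" matches the target's shape.
VARIANTS = {
    "identity": lambda x: x,
    "square":   lambda x: x * x,
    "abs":      lambda x: abs(x),
    "cube":     lambda x: x ** 3,
}

def target(x):
    return 3.0 * x * x

train = [(x, target(x)) for x in (-2.0, -1.0, 0.5, 1.0, 2.0)]
validation = [(x, target(x)) for x in (-1.5, 0.25, 1.5)]

def loss(name, w, data):
    """Mean squared error of the model w * f(x) for the named variant."""
    f = VARIANTS[name]
    return sum((w * f(x) - y) ** 2 for x, y in data) / len(data)

def train_steps(name, w, steps, lr=0.01):
    """Run a few gradient descent steps on the single weight w."""
    f = VARIANTS[name]
    for _ in range(steps):
        grad = sum(2.0 * (w * f(x) - y) * f(x) for x, y in train) / len(train)
        w -= lr * grad
    return w

def successive_halving(steps_per_round=5):
    """Train all survivors briefly, then keep the better half on validation."""
    weights = {name: 0.0 for name in VARIANTS}
    while len(weights) > 1:
        for name in weights:
            weights[name] = train_steps(name, weights[name], steps_per_round)
        ranked = sorted(weights, key=lambda n: loss(n, weights[n], validation))
        survivors = ranked[: max(1, len(ranked) // 2)]
        weights = {name: weights[name] for name in survivors}
    (winner, w), = weights.items()
    return winner, w

winner, w = successive_halving()
print(winner, w)
```

None of the variants is fully trained when it gets cut, which is the whole idea: bad substitutions get thrown out after a handful of steps instead of a full training run.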

If we train a network to compute a known loss, we can even have it evaluate the networks themselves, and run variations on our network's loss node to find better losses at training time. At some point we could let nodes refer to these same loss-calculation graphs within themselves, switching between them dynamically via variation and substitution.

I could even envision probabilistic lists of jump addresses, or mappings of value ranges to jump addresses, or await()-style opcodes on some nodes that, when encountered, queue up ticks from the upstream nodes whose calculations the await()ed node relies on, to do things like emergent convolution.
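
A small sketch of the two jump ideas, assuming node addresses are plain integers. The class names are hypothetical and not from the pastebin; the only library calls are bisect.bisect_right and random.Random.choices from the standard library:

```python
import bisect
import random

class RangeJump:
    """Maps a node's output value to the address of the next node."""

    def __init__(self, boundaries, addresses):
        # boundaries must be sorted; N boundaries split the line into N + 1 ranges.
        assert len(addresses) == len(boundaries) + 1
        self.boundaries = boundaries
        self.addresses = addresses

    def next_address(self, value):
        # bisect_right finds which range the value falls into.
        return self.addresses[bisect.bisect_right(self.boundaries, value)]

class ProbabilisticJump:
    """Picks the next node address at random, weighted by the given weights."""

    def __init__(self, addresses, weights, seed=None):
        self.addresses = addresses
        self.weights = weights
        self.rng = random.Random(seed)

    def next_address(self):
        return self.rng.choices(self.addresses, weights=self.weights, k=1)[0]

# Values below 0 go to node 10, values in [0, 1) to node 20, the rest to node 30.
by_range = RangeJump([0.0, 1.0], [10, 20, 30])
print(by_range.next_address(-0.5), by_range.next_address(0.5), by_range.next_address(5.0))

# Jump to node 10 most of the time, node 20 occasionally.
by_chance = ProbabilisticJump([10, 20], weights=[0.9, 0.1], seed=0)
print(by_chance.next_address())
```

Both tables are just data, so they could be varied and substituted the same way as any other node.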

I've written all the classes and started on the interpreter itself; just a few things need fleshing out now.

Here's my shitty little partial sketch of the opcodes and ideas.
https://pastebin.com/5yDTaApS

I think I'll teach it to do convolution, color recognition, maybe try MNIST, or teach it step by step how to do sequence masking and prediction, dunno yet.

Comments
  • 2
    The only catch with this approach, I suspect, is that any graph trained this way would struggle with out-of-distribution data.

    Might be interesting to train a graph of nodes, for any specific problem, that simply classifies false positives/false negatives, and treat it as a sort of discriminator to get around the issue. Use that as a loss function for running variations on the original graph, then refine the original until it finds a better internal loss function.

    And then simply dogfood the discriminator back into the graph (or even automate the dogfooding while generating variations of the discriminator), continually, splitting new data into train/test/validation as it arrives, and generating variations.

    Who needs a gradient for local minima, when you can just keep trying till you get it right?

    Probably really sensitive to hyperparameters (like whether or not to do depth first search) if my intuition is correct.
  • 0
    @jestdotty lol, doubt, but it's fun to try new approaches.
  • 0
    @jestdotty how's your project going btw?

    You were talking about threads. Did you consider doing something higher level like spawning new processes and letting the OS do thread allocation for you?
  • 1
    @jestdotty glassbox and xgboost do something like that, using a grid search over hyperparameters.

    Catch is, it exists outside the core algorithms, more as infrastructure.

    I was thinking more modular, where the hyperparameter search is part of the network itself, so finding more efficient graphs is equivalent to finding more efficient discriminators and vice-versa.
  • 0
    @jestdotty a little over my head, so I assume you catch the errors and then retry?

    Just curious, but are you using an OS-agnostic layer over the thread API?

    How does a thread 'await()' an input?

    Is it just polling all the way down, or does it wait for system interrupts or OS events?
  • 0
    @jestdotty ah, now I feel like the dumb one.

    nice.

    what library and language are you using, by the way? Thought I saw something about JavaScript. I assume you're running on Node then?