Comments
-
@atheist I beat you to it :P I'm looking at polynomial regression as we speak :P hehe, since you can approximate sin and cos as power series it would kind of make sense, no?
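A minimal sketch of the power-series idea, in plain Python. The helper name `sin_taylor` and the choice of 8 terms are assumptions for illustration, not something from the thread. It sums the first few terms of the Maclaurin series for sin and measures the worst error on a grid over [-pi, pi].

```python
import math

def sin_taylor(x, terms=8):
    """Approximate sin(x) with the first `terms` terms of its Maclaurin series."""
    total = 0.0
    for k in range(terms):
        # k-th term: (-1)^k * x^(2k+1) / (2k+1)!
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total

# Worst-case error on a 201-point grid over [-pi, pi].
xs = [-math.pi + i * (2 * math.pi / 200) for i in range(201)]
max_err = max(abs(sin_taylor(x) - math.sin(x)) for x in xs)
print(max_err)
```

The polynomial only behaves on a bounded range; far outside [-pi, pi] a truncated series drifts away from sin quickly, which is the same limitation a polynomial regression would have.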
-
@atheist isn't that interesting, your sf answer doesn't use an activation function.
-
Hazarth91464y: Just a tiny correction: ReLU isn't a layer type, it's an activation function. All outputs eventually go through (or should go through) an activation function like ReLU, atan or sigmoid (or any other).
That being said, yeah, with a network large enough you can approximate pretty much anything (any continuous function on a bounded range, at least). Neural networks are literally just doing curve fitting on some unknown function that's only observed as your training set points. However, some functions are harder to fit than others and start requiring disproportionately large networks to still approximate the function well.
Also, an important thing to consider is that activation functions play a *significant* role in your network's performance. See, the weights and biases are nothing but multiplication and addition, which really just *scales and moves* the activation function :) What the network is trying to figure out is how to bend the whole series of activation functions to fit your training data -
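A small sketch of the "scales and moves" point, assuming a single sigmoid neuron in plain Python. The function names and the values of `w` and `b` are made up for illustration. The sigmoid crosses 0.5 where `w * x + b = 0`, so `b` slides the curve along x and `w` controls how steep it is.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """One neuron: the activation applied to w * x + b."""
    return sigmoid(w * x + b)

# Hypothetical weight and bias, chosen only to show the effect.
w, b = 4.0, -8.0
midpoint = -b / w  # the x where the curve crosses 0.5

# Same distance from the midpoint, but a larger w means a steeper rise.
rise_plain = neuron(0.1, 1.0, 0.0) - 0.5          # w=1, b=0, midpoint at x=0
rise_scaled = neuron(midpoint + 0.1, w, b) - 0.5  # w=4, b=-8, midpoint at x=2
print(midpoint, rise_plain, rise_scaled)
```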
Hazarth91464y: For deep learning, especially with stuff operating on binary data like images and text, you don't really want softness in your functions, which is probably why ReLUs perform so well in those cases... but when trying to approximate something much smoother, using a smooth sigmoid function can sometimes increase learning performance because the network can work with it.
Just imagine you want to teach the Sin() function to a single perceptron with a single input and single output:
In -> Ac(W|B) -> Out
where Ac - activation function
W - weight
B - Bias
If your activation function is something linear like ReLU it will never ever be able to fit anything similar to a Sin(); it can only produce all sorts of linear functions no matter how it changes W or B...
But now change the activation function for Sin or Cos and even as a human you can see that you can just set B = 0, W = 1 (sin) or B = -pi/2, W = 1 (cos), since cos(x - pi/2) = sin(x), and the perceptron performs at 100% -
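A sketch of the single-perceptron argument in plain Python. The `perceptron` helper and the small grid search are assumptions for illustration. With sin as the activation, W = 1 and B = 0 reproduce sin exactly; with cos as the activation, W = 1 and B = -pi/2 work because cos(x - pi/2) = sin(x). A single ReLU unit can never output a negative value, so it is off by almost 1 wherever sin is near -1.

```python
import math

def relu(z):
    return max(0.0, z)

def perceptron(x, w, b, activation):
    """In -> Ac(W * x + B) -> Out"""
    return activation(w * x + b)

xs = [i * 0.1 for i in range(-50, 51)]  # inputs from -5.0 to 5.0

def max_error(w, b, activation):
    return max(abs(perceptron(x, w, b, activation) - math.sin(x)) for x in xs)

err_sin = max_error(1.0, 0.0, math.sin)
err_cos = max_error(1.0, -math.pi / 2, math.cos)

# Coarse grid search over W and B for a single ReLU unit.
grid = [i * 0.25 for i in range(-8, 9)]
best_relu_err = min(max_error(w, b, relu) for w in grid for b in grid)
print(err_sin, err_cos, best_relu_err)
```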
Hazarth91464y: My point ultimately is, your networks will perform the best when they are built correctly to attack the problem at hand. Network architecture matters a lot, and indeed some of the state-of-the-art results are only achieved by changing architectures, not by adding anything too fancy or new to the paradigm.
You have to understand how networks work, what your inputs are, what the problem of transforming the input to the output means and how to best equip your network to deal with it... People are pretty bad at all of this, and most of it falls under experimentation and hyperparameter tuning to really see what behaves how -
Hazarth91464y: @atheist true, but for the sake of argument it might as well be in the case of a single perceptron.
But you're right, it can still be used to learn Sin with a sufficiently big network! -
@atheist haha I had calc II only a few years ago and I don't remember half of it lol, I never use it for anything. I was thinking you could approximate sin with Simpson's rule! lmao
-
@atheist honestly I look at it and just see a narrowed range: it's 0 to +inf and it discards half its inputs if they're negative.
one of you said 'with a large enough network', I thought adding a lot of layers and neurons could actually make a nn not function. am I incorrect ?
and where relu is concerned, as I said I'm going off examples that were likely meant for mnist, which didn't narrow its values at the ends using sigmoid like mine does, but placed a ReLU activation function between all other layers.
I've read one should do this.
as for all the other layers other than linear i don't see how they work or what they'd be used for yet. -
@Hazarth so in a sense between all the layers is really where that good tasty number magic is occurring ? even though you have linear layers applying a linear function to the inputs to create its output ?
so what you're saying is that the activation functions not being linear should be enough to twist and bend shit around to fit something that a straight-line regression model can't ?
i need a fuzzy logic way of thinking about this ! -
@atheist @hazarth wouldn't it make sense to leave relu out altogether since having insanely varying values to plug into sigmoid would allow you to get more combinations of values from all the layers and allow your network to train for more possible values ? instead of smashing values between layers into [-1,1] ? or in the case of relu [0,inf] ?
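On leaving the activations out altogether: a sketch of why linear layers stacked with nothing between them collapse into one linear layer, so the extra layers add no expressive power. This is plain Python with tiny hand-picked matrices; every name and number here is made up for illustration.

```python
def linear(W, b, x):
    """Affine layer W @ x + b, with W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

# Two stacked layers, 2 -> 3 -> 1, with no activation in between.
W1, b1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]], [0.5, -0.5, 1.0]
W2, b2 = [[2.0, -1.0, 0.5]], [0.25]

def stacked(x):
    return linear(W2, b2, linear(W1, b1, x))

# The single equivalent layer: W = W2 @ W1 and b = W2 @ b1 + b2.
W = [[sum(W2[i][k] * W1[k][j] for k in range(3)) for j in range(2)] for i in range(1)]
b = [sum(W2[i][k] * b1[k] for k in range(3)) + b2[i] for i in range(1)]

def collapsed(x):
    return linear(W, b, x)

print(stacked([0.3, 0.7]), collapsed([0.3, 0.7]))
```

A final sigmoid on top of such a stack still only sees one linear combination of the inputs, which is why a nonlinearity between the layers is what lets the network bend.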
-
@atheist I'm using hardtanh atm because its output range is between -1 and 1, and in my case I took a smaller portion of my original project and moved away from telling something how to move, instead trying to fit output values directly to input (velocity, angle) vs output (projectile landing, velocity and time and max height), so some of these will be negative values.
with some architecture changes I am happy to announce it spits out landing distances that are completely random and vary every 1000 training batches at 20 epochs per. -
@atheist I also think I reached my first ReLU lock as well, so that led to a redesign.
Let me see if I can understand this. Basically this is a series with no exponents ? In a really kind of roundabout way ? -
Hazarth91464y: @AvatarOfKaine
To your first comment. Mostly yes, a lot of the basic tasty stuff happens on layers. Literally a lot of tiny pieces working together to put functions together in a way that starts approaching your model.
When you think about it, backprop is computing a lot of partial derivatives, so in layman's terms, it's trying to figure out how much each weight is responsible for the activated version of the next node and tweak it slightly in the direction that's closer to the desired output. If you plotted each perceptron's output on a graph as a function, you'd see that each perceptron is just your activation function at that node being moved around and rescaled.
The sum of all the perceptron operations is your output at the end. It's logical when you think about it with the simplest possible model.
And then there's part two, which is the overall architecture: think about autoencoders, or the U-Net architecture, which even has hidden "metadata" passed around (skip connections) -
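A sketch of that backprop description for the simplest possible model: one sigmoid neuron, one training point, squared error. It is plain Python; the learning rate, the data point and the function names are assumptions for illustration. The chain rule gives each parameter's share of the error, and each step nudges the parameters slightly toward the target.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b):
    return sigmoid(w * x + b)

def loss(x, target, w, b):
    return 0.5 * (forward(x, w, b) - target) ** 2

def gradients(x, target, w, b):
    """Chain rule: dloss/dw = dloss/dout * dout/dz * dz/dw (and likewise for b)."""
    out = forward(x, w, b)
    dloss_dout = out - target
    dout_dz = out * (1.0 - out)   # derivative of the sigmoid
    return dloss_dout * dout_dz * x, dloss_dout * dout_dz

x, target = 1.5, 0.8              # hypothetical single training point
w, b = 0.1, 0.0                   # hypothetical starting parameters
initial_loss = loss(x, target, w, b)

for _ in range(200):
    dw, db = gradients(x, target, w, b)
    w -= 0.5 * dw                 # small step against the gradient
    b -= 0.5 * db

final_loss = loss(x, target, w, b)
print(initial_loss, final_loss)
```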
Hazarth91464y: @AvatarOfKaine
Ofc there's a good chance you don't need anything fancy for the problem you're solving.
But it's always good to keep in mind how your network design affects the output.
Regarding ReLUs, it turns out that they perform really well with larger networks and proper deep learning, which is probably why they are so frequently used. They are so simple, fast and unspecific that you can compensate for their simplicity with larger networks, and you can build pretty complex shapes and functions with them as long as your network resolution is high enough. They are really generic and allow you to model pretty much anything. In the end it's easier to use a lot of straight-ish lines to model a curve than to use a bunch of S shapes to model a line... But you specifically mentioned calculating an angle of a turret, so I'm thinking that maybe a few sin/cos activated nodes might help with that part... Not sure though, I'm still not sure about your inputs, it should probably be simplified further hmm -
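A sketch of the "lots of straight-ish lines" idea: a one-hidden-layer ReLU network whose weights are set by hand (not trained) so that it linearly interpolates sin between evenly spaced knots. It is plain Python, and the helper names are hypothetical. More hidden units means shorter line segments and a smaller error.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_interpolant(f, lo, hi, n_units):
    """Hand-set weights for a 1-hidden-layer ReLU net that linearly interpolates f."""
    knots = [lo + i * (hi - lo) / n_units for i in range(n_units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / (knots[i + 1] - knots[i])
              for i in range(n_units)]
    # Output weight of unit i is the change in slope at knot i.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    bias = f(lo)

    def net(x):
        # Hidden unit i computes relu(x - knots[i]); the output is a weighted sum.
        return bias + sum(c * relu(x - k) for c, k in zip(coeffs, knots))

    return net

def max_error(net, lo, hi, samples=1000):
    xs = [lo + i * (hi - lo) / samples for i in range(samples + 1)]
    return max(abs(net(x) - math.sin(x)) for x in xs)

coarse = build_relu_interpolant(math.sin, 0.0, 2 * math.pi, 8)
fine = build_relu_interpolant(math.sin, 0.0, 2 * math.pi, 64)
err_coarse = max_error(coarse, 0.0, 2 * math.pi)
err_fine = max_error(fine, 0.0, 2 * math.pi)
print(err_coarse, err_fine)
```

A trained network will not land on exactly these weights, but the construction shows that the capacity is there.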
@Hazarth In the case in question the angle solution is already calculated and the problem ends up being linear. The problem is when I include a velocity coefficient (n) from which I can later retrieve the proper angular velocity at inference time (V{max}*n). That seems to really slow down training and I wonder if it will ever reach the desired results.
But it only has one conditional, which is simply that after subtracting one angle theta from the other, if the distance is greater than pi then the opposing measure should be followed in the opposite direction.
All this gets calculated into clean output, and the angle in question is already calculated and affects the training data,
not so much included in it. -
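The "greater than pi, go the other way" conditional can be sketched as a signed shortest-rotation helper. This is plain Python; `shortest_delta` is a hypothetical name and the thread does not show the actual implementation.

```python
import math

def shortest_delta(current, target):
    """Signed rotation in (-pi, pi] that takes `current` to `target`."""
    delta = (target - current) % (2 * math.pi)  # wrapped into [0, 2*pi)
    if delta > math.pi:
        delta -= 2 * math.pi                    # shorter to go the other way round
    return delta

print(shortest_delta(0.1, 2 * math.pi - 0.1))   # small negative turn instead of almost a full circle
```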
@Hazarth
you have the following for outputs:
self.FireNow = float(firenow)
self.RotateClockWise = float(clockwise)
self.RotateCClockWise = float(cclockwise)
self.MoveUp = float(up)
self.Down = float(down)
self.OutOfRange = float(oor)
self.vspeed = float(vspeed)
self.hspeed = float(hspeed) -
@Hazarth And the following for inputs:
pol.R/r.Max,
pol.Theta/math.pi/2,
thetadelta/math.pi/2,
phidelta/math.pi/2,
targetdistance/r.Max,
self.TurretOrientation.Theta / math.pi/2,
self.TurretOrientation.Phi / math.pi/2,
self.MuzzleVelocity/
ProjectilePath.maxCannonMuzzleVelocity,
self.MaxTrackSpeed.Theta/ ProjectilePath.maxRotationalVelocity,
self.MaxTrackSpeed.Phi/ ProjectilePath.maxRotationalVelocity,
self.SplashRadius/r.Max]).float() -
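One thing that may be worth checking in the inputs above is how Python reads `x / math.pi / 2`. Division is left-associative, so it divides by 2*pi rather than by pi/2. Whether that is the intended scaling is not stated in the thread; the snippet below only shows the difference.

```python
import math

theta = math.pi                   # an angle at the edge of [-pi, pi]

as_written = theta / math.pi / 2  # (theta / pi) / 2, i.e. theta / (2 * pi)
other = theta / (math.pi / 2)     # theta divided by a quarter turn

print(as_written, other)
```

As written, an angle in [-pi, pi] maps to [-0.5, 0.5].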
pol is the polar coordinates of the target.
so there are some comparative and linear relationships here which would suggest that once the math is done you could somehow create some kind of crazy covariance between the inputs and outputs.
the velocities SHOULD be shaped to the maximum velocity of theta and phi max by simple inference.
but i know from the last time we did this if i remove those and just have the turret move at a constant rate and not be adaptive that this model WILL train.
the problem I have is that presently the velocity component seems to be what is screwing up training.
but yes, its all pretty much linear... if you specified those nice large curly things that indicated where differences in output variables occur related to input conditions and ranges. -
Hazarth91464y: What happens if you drop the maxXYZ inputs? You could just clamp() the outputs yourself instead of confusing the network.
Honestly I'd simplify it to just targetPosition, turretPosition, currentRotation.
And the outputs to just "turnLeft, turnRight, fire", one-hot encoded.
It would be like pressing the left/right/fire buttons on a keyboard, so the max speeds are enforced by the actual logic rather than by the network.
Also, how do you train it? Is this unsupervised training or do you pre-generate some training data?
With unsupervised (reward-driven) training in this case your NN really doesn't need to know the max speeds of anything. It's an implicit property of the simulation that it will figure out: the bullet will only get there so fast, and you only reward the turret when it does hit anyway. -
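A sketch of the "buttons" idea, where the simulation rather than the network owns the speed limit. It is plain Python; the constant, the function names and the sign convention (left = positive rotation) are all assumptions for illustration.

```python
MAX_TURN_PER_STEP = 0.05  # radians per tick; hypothetical limit owned by the simulation

def clamp(value, lo, hi):
    return max(lo, min(hi, value))

def apply_action(rotation, outputs):
    """outputs: (turn_left, turn_right, fire) scores; the highest score is the pressed button."""
    names = ("turn_left", "turn_right", "fire")
    action = names[max(range(3), key=lambda i: outputs[i])]
    if action == "turn_left":
        rotation += MAX_TURN_PER_STEP
    elif action == "turn_right":
        rotation -= MAX_TURN_PER_STEP
    return rotation, action == "fire"

print(apply_action(0.0, (0.9, 0.1, 0.2)))
print(clamp(3.0, -1.0, 1.0))  # the alternative: let the network output a speed, then clamp it
```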
@Hazarth
To train: I generate random targets and then calculate the moves that should be made to reach it.
the reasoning behind including max speeds is the idea that you can have multiple types of turrets. but yes at this point they are constant values and can probably be omitted, splash radius presently is pretty standard too atm, having pretty much a constant value, but would be nice to include as well.
i don't imagine these values being omitted would speed training but its worth a shot.
the network only has a 22 neuron hidden layer. is that enough ? -
@Hazarth anyway then i have two modes of supervised learning.
in one it follows the garbage outputted by the nn in its post mini-batch state, and if it hits something WAY off, like something that freezes it moving at all, it trains until one of the parameters satisfies itself.
otherwise the other way is it always moves the turret to the calculated input.
you'd think the first way would be best, but I think it has a tendency to overfit
the second seems to create 'grooves' so to speak that the turret falls into and follows when it hits a point that is heavily trained.
when I previously trained this it resulted in a turret occasionally doing a 360 degree turn, and suddenly homing in on the target and firing.
the velocities would be nice however. the aforementioned behavior moves the turret at a constant speed, limiting precision.
in reality this speed is pretty simple to calculate so you'd think could be trained really easy. -
@Hazarth it's merely:
t = (Theta_Pos - Target_Theta) /theta_max_vel
if (t > 1): theta_max_vel
else Theta_Pos - Target_Theta -
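One way the rule above could look as runnable code, sketched in plain Python with both turning directions handled. The function names and the sign convention (`target - position`) are assumptions, and angle wraparound is deliberately left out. The coefficient `n` plays the role of the velocity coefficient mentioned earlier, so the actual step is `theta_max_vel * n`.

```python
def velocity_coefficient(theta_pos, target_theta, theta_max_vel):
    """n in [-1, 1]: the fraction of the max angular velocity to use this tick."""
    delta = target_theta - theta_pos
    return max(-1.0, min(1.0, delta / theta_max_vel))

def step(theta_pos, target_theta, theta_max_vel):
    """Move toward the target, at most theta_max_vel per tick."""
    n = velocity_coefficient(theta_pos, target_theta, theta_max_vel)
    return theta_pos + theta_max_vel * n

print(step(0.0, 1.0, 0.1))   # far away: full speed
print(step(0.0, 0.05, 0.1))  # close: lands on the target
```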
@Hazarth but yes if i eliminate those values it trains fast and works.
I just don't like that i can't incorporate precision into it or model parameters. I'd like to have an idea of how fast I can train it WITH those parameters.
you can see why this is important
imagine that the theta/phi corresponded to the strength of an electrical signal sent to a voltage controller on a motor for example.
kind of a good POC if it works. -
@Hazarth otherwise could simply be interpreted as "HURRY HURRY" OR 'JUSTTTTTT A LITTLE BIT MORE....'
-
HELL maybe making it mean that would make more sense where a human hand is concerned.
-
Hazarth91464y: @AvatarOfKaine you might need a much bigger network to train *with* those parameters... not because it's difficult, but because it probably makes no sense to the network, especially if those values aren't capped to 0-1 and the angles are. It's probably not helping if the values are static all the time either; the network can't learn from a static number. Does your training involve randomly changing the max speeds to see if it can generalize in such an environment? Also, if you haven't already, you should normalize the max speed parameters to 0-1 values, maybe 0 is stuck and 1 is say 2 rad/s rotation or something. For linear projectile speeds cap it to 0-1 where 1 is some set value, like 50 px/s...
If you already have the values clamped to [0, 1] then try changing those values in the training set too, and if you already tried that as well then I'm not really sure how much you can do... -
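A sketch of that suggestion: vary the max speed per training sample and feed it in normalized to 0-1. It is plain Python; the sample layout and names are hypothetical, and the 2 rad/s ceiling is only the example value from the comment.

```python
import math
import random

MAX_ROTATION_SPEED = 2.0  # rad/s that maps to 1.0 (the "2 rad/s" example from the comment)

def normalize(value, max_value):
    """Scale value into [0, 1], clamping anything outside the range."""
    return max(0.0, min(1.0, value / max_value))

def make_sample(rng):
    """One hypothetical training input: [normalized angle, normalized max speed]."""
    target_angle = rng.uniform(-math.pi, math.pi)
    max_speed = rng.uniform(0.0, MAX_ROTATION_SPEED)  # varied per sample, not constant
    return [
        (target_angle + math.pi) / (2 * math.pi),     # maps [-pi, pi] to [0, 1]
        normalize(max_speed, MAX_ROTATION_SPEED),
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [make_sample(rng) for _ in range(1000)]
print(samples[0])
```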
@atheist I'm trying something I think should work after our previous discussions.
Sigmoid will produce results from 0 to 1, but it has to receive input that will allow this.
So for the regression test to just feel around for a function
f(x) = x^2 + y^2 + 2x^3 + 2y^3
I made:
self.linear_relu_stack = nn.Sequential(
    nn.Linear(2, 1024),
    nn.Linear(1024, 1024),
    nn.Linear(1024, 1),
    nn.Sigmoid()
)
note the lack of activation functions before sigmoid.
The value literally by the time it reaches sigmoid has to be -6 to 6
I don't honestly even think the initial tanh is necessary because my inputs are between 0 and 1
This should work. its 2 inputs, 1 output and it should be fitting to a rigid mathematical definition.
if the function never trains the weights and biases to be above 1 and most of them to be beneath it, it should work and it allows for positive and negative parameters for sigmoid right ?
so when i plot it -
the big 3d plane should show values intersecting with it after random inference. the question is, do i have enough neurons and layers ?
1 input
1 hidden
1 output
2048 connections (1024 * 2 inputs) from input to hidden, 1024 connections to the output layer, 3072 total weights and biases.
should be enough, right ? especially since it's not learning unpatterned data, it's learning data with a specific mathematical relationship... however it is a multivariable relationship with terms that have an exponent between 2 and 3 -
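A quick way to sanity-check the size arithmetic, sketched in plain Python (the helper name is made up). Each fully connected layer has `n_in * n_out` weights plus `n_out` biases. The 3072 figure matches the weights alone of a 2 -> 1024 -> 1 network, while the Sequential listed above also contains a 1024 -> 1024 layer, which dominates the count.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

one_hidden = count_parameters([2, 1024, 1])         # input, one hidden layer, output
as_listed = count_parameters([2, 1024, 1024, 1])    # the three nn.Linear layers as written
weights_only = 2 * 1024 + 1024 * 1                  # the 3072 from the comment

print(one_hidden, as_listed, weights_only)
```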
@atheist therein is a good example of a design question being posed. between how many possible outputs for regression and how many neurons, not how many input and how many output since the data being modeled could potentially be down to the number of digits of precision between the two inputs and the one output.
2048+1024 different connections.
that should be sufficient for estimation eh ?
if sin can be approximated with the same number within a finite input domain.
and i'm limiting this input range from 0 to 100, so 0/100 = 0 to 100/100 = 1
however the precision of results could be up to the maximum precision of float32 in pytorch.
so there could be numbers like
0.12227773339463637373 -
@atheist @hazarth to backtrack and understand layer design I have to understand the precision problem with real number approximation.
the earlier example I gave works when I remove the velocity parameters and the like and just give the muzzle velocity, angle values and radius of the target and make the move speed a constant 0.1 degrees on each axis, -180 to 180 degrees on the theta, -90 to 90 on phi.
but that doesn't quite solve the problem
the output responses can be the following
move up = 0.75 to 1.0
move down = 0.75 to 1.0
move clockwise = 0.75 to 1.0
move counterclockwise = 0.75 to 1.0
fire = 0.75 to 1.0
out of range = 0.75 to 1.0
so a total set of 2^6 or 64 output values interpreted BROADLY from 0.5 to <0.75 being false and 0.75 to 1.0 being true.
so.. small network is fine.
approx 20 neurons for the hidden layer
the implied training will occur for certain values depending more on their greater/less than or equal status as opposed to any real numerical calculation. -
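A sketch of how those six outputs could be decoded with the 0.75 cutoff described above. It is plain Python; the output names and their order are assumptions based on the list in the comment.

```python
OUTPUT_NAMES = ("move_up", "move_down", "clockwise",
                "counterclockwise", "fire", "out_of_range")
THRESHOLD = 0.75  # at or above counts as true, below counts as false

def decode(outputs):
    """Turn six network outputs in [0, 1] into named booleans."""
    return {name: value >= THRESHOLD for name, value in zip(OUTPUT_NAMES, outputs)}

decoded = decode([0.9, 0.6, 0.75, 0.5, 0.8, 0.1])
combinations = 2 ** len(OUTPUT_NAMES)  # distinct true/false patterns
print(decoded, combinations)
```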
@Hazarth
"you might need a much bigger network to train *with* those parameters... but not because it's difficult but because it probably makes n"
yes I am thinking that so the question now is HOW BIG ? which is what the more recent comments I am making are posing. :) right there with ya. -
!devrant
is there any way of scrolling through my notifications ?
@-red kind of +1'd every last post I've made LOL he likes me he really likes me lol -
@atheist jesus how long must it have taken to train an object classifier to spit out bounding boxes !
-
@atheist yeah mostly I'm curious just what its capabilities are
Define dsp again -
This is mainly academic the only uses I would have are actual image recognition and handwriting recognition
-
@atheist what’s best for object detection ? Like not classifying but detecting boundaries in an image
Personally I would think multi angle camera arrays would have squashed that application -
@atheist "Define a boundary. Do you want to separate different people? Do you want to separate different parts of the body?" honestly that sounds more like classification you're describing. I would simply want to detect objects as they appear before the brain figures them out, meaning this is part of this and this is separate from that, even if you have the occasional optical effect which makes that difficult till you get closer or see something move.
-
@atheist you know what angers me is I want to read all this and did previously wish to read all this all the things i learned are alive in my mind but if these fucks get there way we'll all be back to square one. doesn't this ever depress you ?
its one series of actions which render decent peoples objectives inert simultaneously rolling back all the trouble a bunch of jackasses made during this time period which sparked a series of conflicts across the country.
some days I wish the 'preferred pronoun' people had been carted off into the boonies by a bunch of perverse rednecks and shot.
extremism is bad enough without retarded extremism being added in. -
@atheist btw I spent a good number of years as an application dev.
what subfield involves all of this if I wanted a nice heaping helping of relevant theory and best practices ?
data science ? -
@atheist oh speaking of which. I fixed my shit code and at 10k training mini batches, here is the result. yay.
-
@atheist so finally evidence of training. yay.
two variable, one output, polynomial function with up to a power of 3 exponent and coefficients in two of the terms summed, on a domain of 0 to 100 randomly trained and with lotsa decimal thingies with the plots being 100 inferenced output values :) -
@atheist lmao honestly I'm not a big fan of Python's graphics and image display capabilities.
opencv works for example but it feels clunky, you know ? -
@atheist sounds like my tkinter project :P which visualizes a turret as a dot with a dashed yellow line to indicate where its pointing, its firing range as a big red circle and a black line and blue circle to indicate the landing area and the light green circle to indicate the firing radius at the current phi.
-
why do i seem to remember some kind of major unexpected shift in values after some time training....
-
so anyway, I am convinced if i redesigned my other network i could make it perform as it should with more neurons.
this has 1 hidden layer of 1024. -
Has there been any research into neural networks that can use attention to pick between activation functions for specific tasks/problems?
-
@atheist unhappy development.
at 54000 mini-batches the results changed from a curve which loosely fit the equation plane, to what you see now.
could this be due to network size being too small ? -
@Wisecrack I'm trying to remember what I saw regarding human supervised selection at various points, where the network trained and occasionally humans provided input of some sort.
See, while I understand that in essence, given a vector (parameters), you modify weights and biases minimally, these get passed through a set of dropoff-style layers like ReLU, and in the end each layer leading to an output basically sums up to a value that goes through sigmoid and equals the desired value once trained... I don't see how this could cover all bases when parts of the math used to calculate the output are trigonometric and polynomial. I mean not complex math ! Really basic things in my case, but a polar-from-cartesian coordinate conversion, angle and leg size, etc. all go into determining that a target equals a landing zone and, if not, how to move things to it.
Is there something I'm missing where you kind of model the math? Because at best sin and cos could be a power series.