data science

Ranter

123abc

251

Comments

2

devTea

21682

5y

.
1

electrineer

28364

5y

,
1

Ranchonyx

10406

5y

;
1

Ranchonyx

10406

5y

Apart from that, what in the unholy name of C# is that?
Looks cool and all but wha-
2

cho-uc

1845

5y

Your accuracy for the 2nd model is 84%.
If those are the real-life data you're using, then the result is pretty good in my opinion.
I don't know the nature of your data or why you chose KNN regression.
Maybe compare it with other models to see if you can squeeze out more accuracy.
2

123abc

251

5y

@cho-uc thanks!

those are real data. yea i was also told that knn regression isnt normally used with volatile data.

ill probably do svm regression or LDA next. idk. do you have any model to recommend thats good with volatility?

thanks again :)
2

cho-uc

1845

5y

@cuburt
I just try everything, from Polynomial Regression, Lasso and Ridge to ElasticNet, and pick the best one.
If you have time and power, you can even try a neural net for regression.
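
A minimal sketch of that "try several and pick the best" loop, assuming scikit-learn is what's being used. The data is synthetic stand-in data (one noisy feature with a weak negative trend), and the `alpha`, `degree` and split values are placeholder assumptions, not tuned settings from this thread. Every model is fitted on the training split and scored (R^2) on the held-out split.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): not the poster's real data set.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = -0.5 * X[:, 0] + rng.normal(0.0, 0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate regressors; hyperparameters are illustrative placeholders.
candidates = {
    "knn": KNeighborsRegressor(),
    "polynomial (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
    "elastic net": ElasticNet(alpha=0.01, l1_ratio=0.5),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)                       # fit on training data only
    results[name] = model.score(X_test, y_test)       # R^2 on held-out data

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} test R^2 = {score:.3f}")

best = max(results, key=results.get)
```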

@NoMad is more of an expert on this.
3

NoMad

13491

5y

Disclaimer: am not an expert, it's a hit or miss with me.

First off, why are you plotting prediction dots (green)? What are you trying to see? The prediction line is basically what you need.

Your model is meh accurate. I don't get why knn, but maybe that could work. But you may need more data points to get a better answer.

What are you using (method/function) to get that correlation? Correlation without a point of reference doesn't have much meaning. But all in all, that number most likely says there is a small-ish to moderate negative correlation, which your prediction line kinda supports. (or so I think, correct me -anybody- if I'm wrong)
2

NoMad

13491

5y

Also maybe play with k (currently at 20 neighbors) and see if your results are any different. I'd suggest 3, 4, 5, 9, 10 instead. I don't think a higher k necessarily gives you a better answer (your data is a tad too cluttered to see any definite patterns here).
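
A rough sketch of that k sweep, assuming scikit-learn's `KNeighborsRegressor` (named later in this thread). The data is synthetic stand-in data, and `score` is scikit-learn's default for regressors, which is R^2 rather than a classification accuracy.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (assumption): one noisy feature, weak negative trend.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = -0.5 * X[:, 0] + rng.normal(0.0, 0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

scores = {}
for k in (3, 4, 5, 9, 10, 20):
    # Fit on the training split only, then score on the held-out split.
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)

for k, score in scores.items():
    print(f"k={k:2d}  test R^2 = {score:.3f}")

best_k = max(scores, key=scores.get)
```

Picking k directly from test scores leaks a little information from the test set into the model choice; cross-validating on the training split is a common alternative.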
1

123abc

251

5y

@NoMad now that i think about it, idrk why i plotted those green dots lol. i based the variable selection from a correlation matrix
1

123abc

251

5y

@NoMad yep, i already did that. i made a line graph for that as well
3

NoMad

13491

5y

@cuburt wait, why is accuracy on training set soooo different from test set??? Like, accuracy on training set should not be on all the set at once... I'm confused about the accuracy on training set here tbh.
2

NoMad

13491

5y

Like, you train the model, regardless of accuracy, and then the accuracy on test set is your measure. While on training, your model is still learning. Accuracy has no meaning for it.
2

NoMad

13491

5y

And on your test, k's 2,3 and 14 work too.
0

123abc

251

5y

@NoMad oh, maybe because sets are being fitted separately and scored separately
1

NoMad

13491

5y

@cuburt accuracy on training has no meaning afaik ¯\_(ツ)_/¯
Unless you mini-batch the jobs, then yes your loss/accuracy are relevant to overfitting.

My best suggestion can be use SVM instead. Use RBF and play with kernels until you get a proper "Test" result.
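
A minimal sketch of the SVM idea, assuming scikit-learn's `SVR` with default `C` and `gamma`. The data is synthetic stand-in data, and the list of kernels is only an illustration of "playing with kernels"; each model is fitted on the training split and scored on the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): not the data discussed in this thread.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = -0.5 * X[:, 0] + rng.normal(0.0, 0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2
)

svr_scores = {}
for kernel in ("rbf", "linear", "poly"):
    model = SVR(kernel=kernel).fit(X_train, y_train)   # train split only
    svr_scores[kernel] = model.score(X_test, y_test)   # R^2 on the test split

for kernel, score in svr_scores.items():
    print(f"kernel={kernel:6s}  test R^2 = {score:.3f}")
```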
1

123abc

251

5y

@cho-uc @NoMad

Thank you so much! I'll keep those in mind
2

NoMad

13491

5y

@cuburt actually, one more thing. Try k's 2 and 3, and compare the test results to this one. I'd be interested to know how it goes.
1

123abc

251

5y

@NoMad so i should fit the train and score the test?

the reason i did that is so i can compare their accuracy, that way ill be able to tell if its overfitting
2

NoMad

13491

5y

@cuburt well, are you batching the jobs? If yes, then yeah you can compare accuracy/loss.
And yes, bingo, ding ding ding, correct, hooray! fit the train and score the test.
It's better if you don't train/fit on the test set as well, or you are actually doing magic on the model. (except if you're doing life-long learning, which tbh I personally have problems wrapping my head around, so I'll leave you to it)
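
"Fit the train and score the test" could look roughly like this, assuming scikit-learn and synthetic stand-in data. The model only ever sees the training split in `fit`; the train score is printed next to the test score purely as an overfitting check.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (assumption): one noisy feature, weak negative trend.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = -0.5 * X[:, 0] + rng.normal(0.0, 0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = KNeighborsRegressor(n_neighbors=20)
model.fit(X_train, y_train)                    # fit on the training split only

train_score = model.score(X_train, y_train)    # R^2 on data the model has seen
test_score = model.score(X_test, y_test)       # R^2 on held-out data
print(f"train R^2 = {train_score:.3f}, test R^2 = {test_score:.3f}")
```

A train score far above the test score hints at overfitting. Fitting a second model on the test split and scoring it there says nothing about generalization.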
1

123abc

251

5y

@NoMad

k=3. i fitted only the training set and scored the test set. its gotten low. LOL
2

NoMad

13491

5y

@cuburt accuracy went near zero, but error rate stayed the same 🤔 Interesting indeed
1

NoMad

13491

5y

Can you get the same accuracy graph over the number of neighbors, on the test set again, without fitting on the test set? That may change now.
1

NoMad

13491

5y

Just to point out something, look at PctWTP at around 1. The variation is more than half of the range for Slippage in training set. I mean, maybe this is why your model is confused. It finds neighborhoods with very high variations.
0

123abc

251

5y

@NoMad
1

NoMad

13491

5y

@cuburt interesting. What accuracy would a k=25 give you?
1

123abc

251

5y

@NoMad sometimes it goes .22
2

NoMad

13491

5y

@cuburt 😂😂😂 I'm legit confused on how accuracy could go below 0 😂😂😂
1

123abc

251

5y

@NoMad im not sure if its accuracy, its the model score or something
1

NoMad

13491

5y

Because, 0 accuracy means none of the examples were predicted right. Accuracy below zero means what? They were predicted right, but not? 🤔 Like, it can't mean reverse, unless you're not getting accuracy but the distance of something.
1

123abc

251

5y

@NoMad here i used rmse.
3

NoMad

13491

5y

Assuming you're using `sklearn.neighbors.KNeighborsRegressor`, I guess this is the score you're getting:
https://scikit-learn.org/stable/...

which does say "The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0."
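
To see how that score can drop below zero, here is a small pure-Python sketch of R^2 = 1 - SS_res / SS_tot on made-up numbers. The helper name `r_squared` is hypothetical and is not scikit-learn's implementation.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # squared error of the mean
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]                      # made-up targets

perfect = r_squared(y_true, y_true)                # exact predictions
constant = r_squared(y_true, [2.5] * 4)            # always predict the mean of y
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])    # predictions worse than the mean

print(perfect, constant, worse)  # 1.0 0.0 -3.0
```

So a negative score just means the predictions have a larger squared error than always guessing the mean of y would.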
2

NoMad

13491

5y

@cuburt that looks nice-er. Still, 30 neighbors is a shitton. Like, I don't know, maybe I'm wrong. If your model responds to it, maybe that's how it should go. Can you find out how many test points it gets wrong, with a reasonable margin? That could give you a better measure of accuracy.
2

NoMad

13491

5y

Actually, another idea could be to find another feature. (X in your case is only PctWTP. Maybe add another feature that could be linked to how these neighborhoods are defined, AKA make the model 3D or more.)
Ofc, it'd take time so it's up to you if you want to go down that rabbit hole.
2

NoMad

13491

5y

On second thought, ignore that accuracy measure. RMSE is your error rate in this case.
0

123abc

251

5y

@NoMad is that what multivariate model is? because thats where im supposed to be heading
2

NoMad

13491

5y

@cuburt I think so. Because this is showing you that PctWTP is not the only factor in estimating the Slippage.
1

NoMad

13491

5y

Tho, iirc MANOVA (which deals with multivariate stuff) deals more with variation. 🤔 At this point, I'm even confusing myself, so let's say idk. ¯\_(ツ)_/¯
1

NoMad

13491

5y

One last request (cuz curiosity totally didn't kill the cat), can you show me how your test graph looks using k=25 or 30 or 40 or whatever you like?
(titled KNN regression, slippage vs PctWTP)
0

123abc

251

5y

@NoMad k=30. as the previous line graph showed, accuracy also increased (although accuracy doesnt really make much sense to me now, its not even the r squared which i thought it was)
2

NoMad

13491

5y

@cuburt if you are actually curious, what you can do next is to plot two other lines: your prediction line +/- the RMSE. Then count how many data points fall between these two lines. Then you can say your model has a variation of RMSE, and an error rate of {number of points that didn't fall between the two lines, divided by the total number of samples}. That could be a much better model than just the line of prediction (and is practically what SVM does, much nicer).
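
A pure-Python sketch of that band check on made-up numbers. The helper names `rmse` and `outside_band_rate` are hypothetical; in practice `y_pred` would be the model's predictions on the test set.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def outside_band_rate(y_true, y_pred, half_width):
    """Fraction of points lying outside [prediction - half_width, prediction + half_width]."""
    outside = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) > half_width)
    return outside / len(y_true)


# Made-up targets and predictions (assumption): the last point is a clear miss.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 1.5, 3.0, 4.0, 8.0]

err = rmse(y_true, y_pred)                       # sqrt(1.9), about 1.38
rate = outside_band_rate(y_true, y_pred, err)    # 1 of 5 points is outside the band

print(f"RMSE = {err:.3f}, share outside prediction +/- RMSE = {rate:.2f}")
```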
1

123abc

251

5y

@NoMad okay, ill def keep that in mind. im actually planning to do SVM regression next

thank you so much! it makes much more sense to me now
0

Wisecrack

9328

5y

@pythonInRelay I keep on seeing this, what is this from?
