Comments
-
p100sch15005y @hack that would be too easy. Now let's try guessing whether it contains a website, web app, server/client, mobile or desktop application. That would be impressive.
-
3picName7295y For those interested, I'm doing a bit of pre-processing on the crawled commits, then a TF-IDF vectorization without stemming and without stop-word removal, and finally a random forest classification on around 5000 data points (~1min).
Suggestions for improvement are welcome 😊 -
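To make the vectorization step above concrete, here is a minimal pure-Python sketch of TF-IDF without stemming and without stop-word removal. It is not the poster's code: the whitespace tokenizer, the smoothed IDF formula, the L2 normalisation and the sample commit messages are all assumptions for illustration. In practice a library such as scikit-learn would handle both this step and the random forest.

```python
import math
from collections import Counter


def tokenize(message):
    # Lowercase and split on whitespace only: no stemming, no stop-word removal.
    return message.lower().split()


def tfidf_vectors(messages):
    """Return (sorted vocabulary, one sparse {term: weight} dict per message)."""
    docs = [tokenize(m) for m in messages]
    n_docs = len(docs)
    # Document frequency: in how many messages does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (an assumed variant of the formula).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: tf * idf[t] for t, tf in counts.items()}
        # L2-normalise so long and short commit messages are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return sorted(df), vectors


# Hypothetical commit messages, made up for illustration.
commits = [
    "fix null pointer in the parser",
    "update cargo lock and fix the borrow checker error",
    "bump the gemfile",
]
vocab, vectors = tfidf_vectors(commits)
```

Because stop words are kept, a word like "the" stays in the vocabulary, but its weight ends up lower than that of a rarer term such as "parser", which is what lets the classifier work without an explicit stop-word list.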
endor57515y "I have no idea what I'm doing, but it's working, so I must be doing something right"
One of the most hilarious (but also frustrating) aspects of programming 🤣 -
Are you using neural nets or classic classifiers?
My first approach would be a bag of words or a string kernel to calculate the (contextual) dissimilarities between the entries. Then you can use a random projection or PCA to reduce the number of features, and finally a simple k-nearest-neighbour classifier to find the class for your entry.
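A rough sketch of that suggestion, under some assumptions: it uses a bag of words with cosine dissimilarity and a majority-vote k-nearest-neighbour classifier, and it skips the random projection / PCA step entirely to stay short. The function names and the labelled commit messages are hypothetical, made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(message):
    # Plain term counts over lowercase, whitespace-separated tokens.
    return Counter(message.lower().split())


def cosine_dissimilarity(a, b):
    # 1 - cosine similarity; 0.0 means identical direction, 1.0 means no overlap.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train, message, k=3):
    # train is a list of (bag_of_words, label) pairs.
    query = bag_of_words(message)
    nearest = sorted(train, key=lambda item: cosine_dissimilarity(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled commit messages, made up for illustration.
labelled = [
    ("fix borrow checker error in cargo build", "Rust"),
    ("update cargo lock", "Rust"),
    ("bump gemfile and fix rake task", "Ruby"),
    ("add rspec test for rake task", "Ruby"),
]
train = [(bag_of_words(text), label) for text, label in labelled]
prediction = knn_predict(train, "cargo build fails with borrow error", k=3)
```

With 53 languages and thousands of commits, the vocabulary gets large, which is where the dimensionality reduction mentioned above would come in before the nearest-neighbour search.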
Related Rants
Currently getting into Machine Learning and working on a joke-project to identify the main programming language of GitHub repositories based on commit messages. For half of the commits, the language is predicted correctly out of 53 possible languages. Which is not too bad given the fact that I have no clue what I'm doing...
random
project
machine learning
ml
supervised learning