Comments
-
Why are data scientists being made to write libraries?
Most places I know have engineers specialized in taking models and implementing them in well engineered, performant libraries. -
Oh, this is about code in papers.
The only thing that's guaranteed to do is reproduce the results in the paper :p (and if it doesn't, that's a big problem)
Expecting anything else is too much (which is where the engineers come in).
Data science/ML code can be pretty bad, but wait till you see systems code... -
JsonBoa: And that's why journals go out of their way to encourage scientists to keep their code private.
Not explicitly, of course.
But I've seen plenty of papers get retracted because someone found the GitHub with the source code and noticed that *it does not match what was described in the paper*, sometimes in some pretty serious ways. -
@JsonBoa Private code means no reproducibility. Worthless crap because anyone can make up some diagrams of whatever. That's not even science to begin with.
The encouragement isn't to keep shit secret, it's to actually do real science or fuck off. -
While genuinely agreeing for the most part... I'm also a major data junkie who's building research analytics with a massive self-engineered model, and I'm a chronic over-engineerer. But I know myself.
-
@Fast-Nop I know Fortran... but I don't think I've written anything resembling Fortran since I was like 10... totally gonna double check that now.
-
@awesomeest I had the misfortune of meeting Fortran code at uni back in the 90s, and afterwards of seeing comparable crap again whenever I encountered anything written e.g. by a physicist.
As in, it did work, did amazing things, but it also made me long for eye bleach. -
@Fast-Nop oh, my encounter was totally, misguidedly, self-inflicted.
My father, the epitome of that saying 'personality of a wet dish rag', went to college/uni for 7 full years, all passing grades, 3 schools, 0 degrees of any kind: he got kicked out of each after 2+ yrs of refusing to declare a major. He would brag about never reading a book as an adult and knowing Fortran. 7/8 yr old, highly unsupervised me (already knew/performed basic web dev) figured there must be something to this Fortran thing... learned it... wasn't practical back then, certainly not now, and realised my father didn't actually know shit.
It's now one of many entries on my list of advanced skills/knowledge that are extremely practical... in specific post apocalyptic conditions. -
@awesomeest I mean, Fortran used to have one big thing actually going for it, making it much faster than C before C99.
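That's because of C's type-based aliasing rules: the compiler may only assume that pointers to different types don't alias. If everything is of the same type, which is typical in math operations (float / double), it has to assume every store may have changed what the other pointers point to, and you get tons of pointless reloads.
Fortran never had that issue because it doesn't allow aliased arguments in the first place (at least not ones that get written to), and C99 fixed that with the "restrict" keyword. -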
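For anyone who hasn't run into this, here is a minimal sketch of the aliasing point in C. The function names `scale_may_alias` and `scale_no_alias` are made up for illustration, and whether the compiler actually drops the reloads depends on the compiler and optimization level, so treat it as a sketch and not a benchmark.

```c
#include <stddef.h>

/* No restrict: out, in and scale all point to float, so the compiler
   has to assume they may overlap. After every store to out[i] it must
   be prepared to reload *scale and in[i + 1] from memory. */
void scale_may_alias(float *out, const float *in, const float *scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * *scale;
}

/* C99 restrict: the caller promises the three pointers do not overlap,
   so the compiler is free to keep *scale in a register and vectorize.
   Calling this with overlapping pointers is undefined behavior. */
void scale_no_alias(float *restrict out, const float *restrict in,
                    const float *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * *scale;
}
```

With separate buffers both versions give the same result. The difference shows up when the pointers really do overlap: calling `scale_may_alias(buf, buf, &buf[0], 3)` on `{2, 3, 4}` is legal and yields `{4, 12, 16}`, because the first store changes the scale factor used by the later iterations. That is exactly the case the compiler has to guard against when `restrict` is missing.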
@Fast-Nop I learned C, and some basic binary, around the same time, after realising that just because I could accurately modify mobos (physically, in ways not at all intended by the OEM) and connect hardware I desoldered from others, didn't mean it'd actually work in a computer unless it had proper drivers. That said, I'm well aware of your example.
I wasn't exactly on trend with current advancements in programming back then (I'm still basically a code dinosaur at 31). I wrote workarounds for lack of restrict... while restrict existed. Even shortly after learning about it, I still had little practical exp with it, so I just used my, likely over-engineered, methods.
Frankly, I appreciate the context reminders... I didn't realise just how long I've been a chronic over-engineerer (though totally aware I am), before this thread of thought.
It's definitely a double-sided blade. -
@Fast-Nop nowadays I'd be inept without aliasing in general... especially with my current efforts to write/structure things for others, who don't live in my polyglot, autistic, legacy and totally void of any basic style code, brain. (I'm literally needing to write a lexicon for some of my code and administrative notes as I'm already short on time)
I'm just grateful that no one currently needs to understand my main cache of code (notoriously bad at saving prior code... extremely)... mainly batch (incl DOS... I <3 DOS) and bash scripts I've created/modified for over 20 yrs... rotating windows install/mod so it doesn't annoy me with features I don't want, and aliased cli for annoying minutiae like converting long win/linux path variables between formats... remapping sys variables and reg edits to have just core components on c: and programs/etc on a separate drive, etc
I'm very odd and ocd when it comes to anything data architecture relevant... & historically alias scripts on scripts like CSS
Related Rants
Data Scientists/Researchers
Stop building libraries.
You can't build libraries.
You're not software engineers.
Write your script as plainly as possible.
Why?
Cus for every fucking paper that has code associated with it, unless it's from Meta or Google, I'm having to edit to make shit work.
Stop over-engineering shit.
Write your model and fuck off.
rant
ai
data science
software engineering
stop over-engineered shit
python