devRant - A fun community for developers to connect over code, tech & life as a programmer

One of my minions (erm, I mean, "a valued junior member of my team") asked to be assigned to tasks more "data science related".
Regardless of the very last-decade-sounding request, I tried to explain to the Jr that there is more to "data science" than distilling custom LLMs and downloading PyTorch models. There are several entire fields of study. And those are all sciences. In this context, science equals math.
But they said they were not scared of math.
I've seen them using their phones to calculate freaking tips. If you can't do 15% of a lunch bill in your head, hypothesis tests might be a bit more than challenging.
But, ok then. Here we go.

So I had them do some semi-supervised clustering. On a database as raw as dirt, but barely 5 GB, with few dimensions, and covering subjects with easily available experts.
Even better, we had hundreds of manually classified training and test cases.
The Jr came back a month later with some convoluted mess of convolutional networks; just the serialized weights of the poor thing were about as large as the database itself.
And when I tried it on some other manually classified test datasets... Freaking 41% error rate, for something that should be a slam dunk. Little better than a coin toss.
One month of their time wasted on an overfitted unusable mess.

I had to reassign the task to someone else, more experienced, last Friday. On Monday they came back with an iterative KNN approach giving error rates for several values of K... some of them with less than 15% error on the test dataset.
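
For anyone wondering what a K sweep like that looks like, here is a minimal sketch. It is not the team's actual code: the data is synthetic (two made-up Gaussian blobs), the helper names are hypothetical, and "iterative" is assumed to mean simply looping over candidate values of K and scoring each one on held-out test cases.

```python
import math
import random
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k training rows closest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train, test, k):
    """Fraction of held-out rows that KNN misclassifies for this k."""
    wrong = sum(
        1 for features, label in test
        if knn_predict(train, features, k) != label
    )
    return wrong / len(test)


def make_blobs(n_per_class, rng):
    """Synthetic stand-in data: two 2-D Gaussian blobs (an assumption)."""
    centers = {"a": (0.0, 0.0), "b": (4.0, 4.0)}
    return [
        ((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label)
        for label, (cx, cy) in centers.items()
        for _ in range(n_per_class)
    ]


rng = random.Random(0)          # fixed seed so runs are repeatable
train = make_blobs(100, rng)    # plays the role of the labeled training cases
test = make_blobs(50, rng)      # plays the role of the labeled test cases

# Odd values of k avoid tied votes when there are only two classes.
errors = {k: error_rate(train, test, k) for k in (1, 3, 5, 7, 9, 15)}
for k, err in errors.items():
    print(f"k={k:2d}  test error={err:.1%}")
```

The point is less the algorithm than the habit: a table of test error per K on data the model never saw makes an overfitted model obvious in an afternoon instead of a month.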

WTF are schools teaching and calling "data science" nowadays?!?!?
I reeeeally need to watch those juniors more closely. Maybe ask for mid-sprint demos. But those are soooo boring and waste so much time for people who know what they are doing...
Does anyone have a better idea to prevent this type of off-track deviation? Without being a total bore, that is.

And... should I start asking people "gotcha" data analysis questions before giving them free rein on this type of task? Or is it an asshole boss move? I would hate someone giving me a pop quiz before letting me work... But I've got no other ideas.