Comments
-
Cool! But change the name "heuristic translation machine learning". Do we want a second HTML? ;p
-
Machine Learning is roughly about pattern matching.
The patterns come from known data.
This seems a bit different from looking at what *unknown* data might do or consist of.
And yes, 'file' does this already without needing terabytes of storage, dozens of CPUs and gigabytes of RAM.
Although it only identifies known formats, of course.
-
@Yamakuzure The data in question is a bitstream, not a bytestream. I have a source of data where it is not known how the bits are encoded, or in some cases whether they even form complete bytes. I intend to search for possible characters in the Latin character set (the stream is old, like 40 to 50 years old) and possibly other data types. I have no confidence in the data being complete, even for individual bytes. Yes, for data made up of whole bytes in an unknown format, other tools are available.
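To make the idea of searching an unaligned bitstream for Latin characters concrete, here is a minimal sketch. It is not the code being developed, only one way such a search could start, under assumptions the comment does not state: characters are ASCII-like, frames are a fixed 7 or 8 bits wide, and bits arrive most significant first. The names `to_bits`, `frames`, `printable_score` and `best_alignment` are made up for illustration.

```python
import string

# Assumption: "Latin characters" means printable ASCII code points.
PRINTABLE = set(string.printable.encode("ascii"))


def to_bits(data: bytes) -> list[int]:
    """Expand bytes into a flat list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def frames(bits: list[int], offset: int, width: int) -> list[int]:
    """Cut the bitstream into fixed-width integers starting at a bit offset.

    A trailing partial frame is dropped, since the data may be incomplete.
    """
    values = []
    for start in range(offset, len(bits) - width + 1, width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        values.append(value)
    return values


def printable_score(values: list[int]) -> float:
    """Fraction of frames that decode to a printable ASCII character."""
    if not values:
        return 0.0
    return sum(v in PRINTABLE for v in values) / len(values)


def best_alignment(bits: list[int], widths=(7, 8)):
    """Try every bit offset for each frame width and return the best guess.

    Returns (score, offset, width) for the highest-scoring combination.
    """
    candidates = [
        (printable_score(frames(bits, offset, width)), offset, width)
        for width in widths
        for offset in range(width)
    ]
    return max(candidates)


if __name__ == "__main__":
    # Three stray bits in front of byte-aligned text simulate a lost alignment.
    stream = [1, 0, 1] + to_bits(b"HELLO WORLD")
    score, offset, width = best_alignment(stream)
    print(f"score={score:.2f} offset={offset} width={width}")
```

A high score only says that one alignment looks more text-like than the others. Real data of that age may use a different character code or variable-width fields, so this is a starting heuristic and not a format detector.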
Related Rants
I had a splash of inspiration. I would like to develop a method for analyzing unknown bitstreams of data. The method would determine the format of the data using trial-and-error machine learning algorithms. This would make it possible to work out the data types, byte formats and meanings in streams of data. It could be useful in data forensics. I would call the method heuristic translation machine learning. I am currently developing code that does this. It will be fun to learn about reinforcement learning algorithms.