4

Trying a custom eMbEdDeR today using OpenAI and Pinecone on some documents of financial data. Let's see if it's smart enough to answer even the most basic questions, or if it just fumbles completely.

My bias is to be highly skeptical.

Will report back with results.

Comments
  • 1
    Ok, first pothole: Pinecone vector uploads don't work. i go to look into it and find the docs might as well be in Chinese:

    https://docs.pinecone.io/docs/...

    haha, and everything i read essentially boils down to "well, just brute force it: try every combination and pick the one that works best!"

    seems like we've really made a lot of progress in ML 🙄
  • 1
    PineconeError: undefined

    welcome to hell
  • 2
    "The session with the highest return is the session on Tuesday, March 14, 2023, with a return of 0.24%."

    yeah, i don't think so buddy

    sigh... just about what I expected... absolute garbage
  • 1
    it's somewhat decent if you ask exactly the right questions, but it hallucinates a bit and gets stuff wrong... need to research how i can fine-tune it
  • 1
    @fullstackcircus "it hallucinates a bit"

    Welcome to modern ML
  • 2
    @atheist BuT tHeY tOLd mE iT wAs A TriLlIoN dOLLaR iNduStRy!!!!
  • 0
    I had an idea: instead of trying to "find" the answers itself, the model would write the CODE to find them (a SQL query or similar), and the output of THAT would be the specific data for the answer (a second LLM step could then summarize the results)

    in my experience it never hurts to mix deterministic steps in with AI models
  • 1
    How does that even work? I mean using Pinecone with OpenAI. Are you using chat completions?
  • 1
    @hack saw this on Hacker News, it's a framework where you can just use TypeScript to connect all the pieces: https://github.com/axilla-io/ax/...

    you use Pinecone to store your documents in vector form and OpenAI's embedder to actually make sense of them (from a machine's perspective)

    but like i said, the model's responses themselves aren't working too well on the first try
  • 0
    Run OCR on the docs to get the text, feed it into MongoDB (or another document-based database), and just query it manually.

    No need for a fancy algorithm that just does the same job.
  • 0
    @max19931 ... why would you use OCR when 1. you already have the literal text in file form

    2. you could put the texts in a full text search in Postgres

    the power of a chat based way to query data is that you can have a moving context window and some intelligence on top of it to dig deeper or show related things. you can't quite do that with a full text search / regex based query
  • 0
    @fullstackcircus i thought you had scanned financial records on paper.

    But if the data is already extracted, it would simply be a query like on a database. Very old school, a relational database structure.
  • 0
    @max19931 a financial scrub is not going to be able to write / query "SELECT * FROM sessions ORDER BY "return" DESC LIMIT 1;"

    thus the whole point of embedding / RAG / chat based applications. it enables the non-programmer / non-engineer to do programming / engineering type things
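For anyone following along, here is a rough sketch of the "let the model write the query" idea from earlier in the thread, using the same kind of query as the last comment. Everything here is an assumption for illustration: the `sessions` table, its toy rows, and `fake_llm_write_sql`, which is a hard-coded stand-in for a real model call. The point is only the shape: the model produces SQL, and a deterministic step checks and runs it.

```python
import sqlite3

# Toy rows, made up purely for illustration (not real financial data).
ROWS = [("mon", 0.5), ("tue", 0.25), ("wed", 1.75)]


def build_db():
    """Create an in-memory SQLite database with a hypothetical sessions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE sessions (day TEXT, "return" REAL)')
    conn.executemany("INSERT INTO sessions VALUES (?, ?)", ROWS)
    return conn


def fake_llm_write_sql(question):
    """Stand-in for the model call (assumption).

    A real app would send the question plus the table schema to an LLM
    and get SQL text back. Here the answer is hard-coded.
    """
    return 'SELECT * FROM sessions ORDER BY "return" DESC LIMIT 1;'


def is_read_only(sql):
    """Crude guard: accept only a single statement that starts with SELECT."""
    body = sql.strip().rstrip(";")
    return body.lower().startswith("select") and ";" not in body


def answer(conn, question):
    """Deterministic step: validate the generated SQL, then run it."""
    sql = fake_llm_write_sql(question)
    if not is_read_only(sql):
        raise ValueError("refusing to run non-SELECT SQL")
    return conn.execute(sql).fetchall()


conn = build_db()
best = answer(conn, "Which session had the highest return?")
print(best)
```

The rows that come back are exact, so a second model call (not shown) only has to phrase them, not compute them. The `is_read_only` check is deliberately simplistic; a real setup would more likely use a read-only database user.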
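And since the "how does Pinecone + OpenAI even work" question came up: a toy sketch of the retrieval half, with no external services. `toy_embed` (plain word counts) is a stand-in for an OpenAI embedding model, and `ToyVectorStore` is a stand-in for Pinecone. Neither reflects those products' real APIs, and the two documents are invented. It only shows the flow: embed documents, store the vectors, embed the question, return the nearest document as context for the chat model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedder (assumption): bag-of-words counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical, not Pinecone's API)."""

    def __init__(self):
        self.items = []  # list of (doc_id, text, vector)

    def upsert(self, doc_id, text):
        self.items.append((doc_id, text, toy_embed(text)))

    def query(self, question, top_k=1):
        q = toy_embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[2]), reverse=True)
        return [(doc_id, text) for doc_id, text, _ in ranked[:top_k]]


store = ToyVectorStore()
store.upsert("a", "quarterly revenue grew compared to last year")
store.upsert("b", "the office moved to a new building")

hits = store.query("how did revenue change this quarter", top_k=1)

# The retrieved text would be pasted into the chat prompt as context.
prompt = "Answer using only this context:\n" + hits[0][1]
print(prompt)
```

Word counts only match on exact tokens, which is exactly the weakness real embeddings are supposed to fix, so treat this as a picture of the plumbing rather than of answer quality.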