Cloudflare AutoRAG first impressions

A long time back, I built a Retrieval Augmented Generation (RAG) app that contained all publicly available data on Warren Buffett and Charlie Munger. I could ask it questions and it would reply the way they might at their shareholder meetings. I thought I should test Cloudflare's AutoRAG product using that dataset and see what results it gets.

Some raw notes from my testing:

Some vibe-check questions I asked:

Answers were good, but retrieval wasn't. A lot of improvement can and should be made to retrieval. Then again, I feel some retrieval techniques are domain and dataset dependent, so maybe that's not possible for a generalised product like this?

Where would I use this?

What I would like to see before considering this beyond prototyping:

Overall, I think this product fits well with their current serverless direction. They have been building products that pick sane defaults, abstract away messy details, and allow their customers to ship ridiculously fast, as long as they relinquish control.

Their AI Gateway products look a lot more promising, and so does their Vectorize product. Not sure if I want to review them right now.

A side note:

Six months back, I designed some 15 prompts, of which ChatGPT failed 7, whereas the RAG I built failed only 2. I reran those same prompts, and ChatGPT failed only 2. Incredible progress in these 6 months. I suspect niche RAG will become less and less valuable. This is especially true if the dataset is smallish and easily publicly available. Proprietary datasets that aren't publicly available will still benefit from RAG. I would say RAG is selectively facing extinction.
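
To put rough numbers on that comparison, here is a small Python sketch that turns the failure counts into failure rates. It assumes the prompt total is exactly 15 (the text only says "some 15"), and the dictionary labels are illustrative names, not anything from the original test harness.

```python
# Failure counts taken from the side note above.
# Assumption: exactly 15 prompts; the post says "some 15", so treat the rates as approximate.
TOTAL_PROMPTS = 15

failures = {
    "ChatGPT (six months ago)": 7,
    "Custom RAG (six months ago)": 2,
    "ChatGPT (rerun)": 2,
}


def failure_rate(failed: int, total: int = TOTAL_PROMPTS) -> float:
    """Return the fraction of prompts that failed."""
    return failed / total


# Compute the rate for each run and print it as a rounded percentage.
rates = {name: failure_rate(count) for name, count in failures.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} failed")
```

Under that assumption, ChatGPT's failure rate drops from roughly 47% to roughly 13%, which matches what the custom RAG managed six months earlier.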