After running into only paid tools or overly complicated setups for turning web pages into structured data for LLMs, I got tired of it. I wanted a free, open source solution to convert websites to Markdown, so I built Mojo (for NotebookLM, or any RAG-like solution).
Hi, thanks for your comments (it's in the plan). Since Mojo is early-stage software, there are still things that need to be integrated. However, Mojo is not a mass crawler (you have to specify exactly what to crawl), so even if I add robots.txt support (which is planned), evil users can still just bypass it. I expect Mojo to be used by technical (non-evil) folks.
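For anyone wondering what a robots.txt check could look like, here is a minimal sketch using Python's standard library. This is not Mojo's code and not a planned design. The `is_allowed` helper, the `mojo` user-agent string, and the example rules are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; not part of Mojo.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules: everything is allowed except paths under /private/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    # "mojo" is an assumed user-agent name, not necessarily what Mojo sends.
    print(is_allowed(EXAMPLE_ROBOTS, "mojo", "https://example.com/docs/"))
    print(is_allowed(EXAMPLE_ROBOTS, "mojo", "https://example.com/private/page"))
```

A check like this only helps cooperative users, since anyone can patch it out or use a different tool, which is the point made above.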
Mojo is extremely fast, supports proxy rotation, and is MIT licensed -> https://github.com/malvads/mojo