Swiss Army Llama: Embeddings, Completions, Grammars, etc. with FastAPI (github.com)
2 likes by eigenvalue over 1 year ago | 3 comments
This project originally started out with a focus on easily generating embeddings from Llama2 and other llama_cpp (gguf) models and storing them in a database, all exposed via a convenient REST API (hence the not-very-catchy original name "llama_embeddings_fastapi_service" when I submitted it a few weeks ago). But since then, I've added a lot more functionality:

1) A new endpoint for generating text completions, including support for custom grammars (like JSON).

2) Get all the embeddings for an entire document. It can be any kind of document (plaintext, PDF, doc/docx, etc.), and it will do OCR on PDFs and images.

3) Submit an audio file (wav/mp3) and it uses Whisper to transcribe it into text, then gets the embeddings for the text (after combining the transcription segments into complete sentences).

4) Integrates with my vector similarity library (pip install fast_vector_similarity) to provide an "advanced" semantic search endpoint. This uses a two-step process: first it uses FAISS to quickly narrow down the set of stored embeddings using cosine similarity, then it uses the vector similarity library to compute a bunch of more sophisticated (and computationally intensive) measures for the final ranking.

5) An endpoint to automatically generate a BNF grammar definition from a sample JSON file or string, or from a Pydantic data model definition. Grammar files can be used directly with llama_cpp for constrained sampling, an incredibly useful thing when making applications. Also includes code for automatically validating grammar files.

6) An endpoint to view the application logs in a nice web view with helpful coloring, with the ability to download the logs or copy them to the clipboard.

7) An endpoint to add a new model file by supplying the URL to the model (e.g., a Hugging Face URL). Previously, you had to manually edit a function in the code to add a new model.
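To make the grammar-constrained completion idea concrete, here is a minimal client-side sketch. The endpoint path, port, and field names below are illustrative assumptions, not the project's actual API; the Swagger page of a running instance documents the real routes and request schemas.

```python
import json

# A llama.cpp-style GBNF grammar constraining output to a tiny JSON object.
# Illustrative only; the project's grammar-generation endpoint can produce
# grammars like this from a sample JSON or a Pydantic model.
JSON_GRAMMAR = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9 ]* "\""
ws     ::= [ \t\n]*
'''

def build_completion_request(prompt: str, grammar: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a hypothetical text-completion endpoint.
    Field names here are assumptions; check the Swagger docs for the real ones."""
    return {
        "input_prompt": prompt,
        "grammar_file_string": grammar,
        "number_of_tokens_to_generate": max_tokens,
    }

payload = build_completion_request("Return the user's name as JSON:", JSON_GRAMMAR)
# To actually call a running instance you would POST this, e.g.:
#   requests.post("http://localhost:8089/get_text_completions", json=payload)
print(json.dumps(payload, indent=2))
```

Because the grammar forbids any token sequence that isn't valid under `root`, the model cannot produce malformed JSON, which is what makes constrained sampling so handy in applications.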
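The two-step semantic search in item 4 can be sketched as follows. Plain NumPy stands in for FAISS in the shortlisting step, and Spearman rank correlation stands in for the more sophisticated measures in fast_vector_similarity, so the sketch is self-contained; the structure (cheap cosine narrowing, then an expensive rerank over the shortlist only) is the point.

```python
import numpy as np

def cosine_shortlist(query: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Step 1: cheap cosine-similarity narrowing over the whole corpus
    (the service uses FAISS for this; NumPy stands in here)."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(sims)[::-1][:k]

def spearman_rerank(query: np.ndarray, corpus: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Step 2: a costlier measure computed on the shortlist only. Spearman
    rank correlation stands in for the library's fancier measures."""
    def ranks(v: np.ndarray) -> np.ndarray:
        r = np.empty_like(v)
        r[np.argsort(v)] = np.arange(len(v))
        return r
    qr = ranks(query)
    scores = np.array([np.corrcoef(qr, ranks(corpus[i]))[0, 1] for i in idx])
    return idx[np.argsort(scores)[::-1]]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
query = corpus[42] + 0.01 * rng.normal(size=64)  # near-duplicate of row 42
shortlist = cosine_shortlist(query, corpus, k=20)
final = spearman_rerank(query, corpus, shortlist)
print(final[:5])  # the near-duplicate row 42 ranks first
```

Running the expensive measure only on the k shortlisted vectors instead of all stored embeddings is what keeps the endpoint fast at scale.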
As a result of all these additions, I changed the project name to Swiss Army Llama to reflect the new project goal: to be a one-stop shop for all your local LLM needs, so you can easily integrate this technology into your programming projects. As I think of more useful endpoints to add (I constantly get new feature ideas from my own separate projects: whenever I want to do something that isn't covered yet, I add a new endpoint or option), I will continue growing the scope of the project. So let me know if there is some functionality that you think would be generally useful, or at least extremely useful for you!

A big part of what makes this project useful to me is the FastAPI backbone. Nothing beats a simple REST API with a well-documented Swagger page for ease and familiarity, especially for developers who aren't familiar with LLMs. You can set this up in one minute on a fresh box using the Docker TLDR commands, come back in 15 minutes, and it's all set up with the downloaded models I include, ready to do inference or get embeddings. It also lets you distribute the various pieces of your application across different machines connected over the internet.
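That distributed-pieces point boils down to every capability being an HTTP call. Here is a stdlib-only sketch of a client hitting a remote box; the endpoint path, port, hostname, and field names are assumptions for illustration, not the project's actual API.

```python
import json
import urllib.request

def embedding_request(base_url: str, text: str, model: str) -> urllib.request.Request:
    """Build (but don't send) a POST to a hypothetical embeddings endpoint on a
    remote Swiss Army Llama box. Route and field names are illustrative; the
    Swagger page of a running instance documents the real ones."""
    body = json.dumps({"text": text, "llm_model_name": model}).encode()
    return urllib.request.Request(
        f"{base_url}/get_embedding_vector_for_string",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# e.g., a GPU machine elsewhere on the network serves the models,
# while this lightweight client runs anywhere:
req = embedding_request("http://gpu-box.internal:8089", "hello world", "llama2_7b")
print(req.full_url, req.get_method())
# Sending it is then one line: urllib.request.urlopen(req)
```

Since it's just REST, the same call works whether the model server is on localhost, another machine in your rack, or across the internet.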





© 2023 GPT Road. All Rights Reserved.