Biblexica: An Auditable, Scalable AI Translation of the Bible
We set out to translate the Bible from source texts using AI, in a process that was auditable, scalable, and fully "vibe coded." This post outlines the personal and project motivation, process, and results. We encourage others to adapt our results, try translating new texts with AI, and explore the Bible in more detail. These translations are free to use and share; copyright is retained by Biblexica. If there is enough interest, we may open-source the code, prompts, and translation APIs.
If you'd like to contribute to the project, please see "Contributions" at the bottom.
Goals
Biblexica's Bible aims to achieve the following:
- Proximity to Source Truth: translate verse-by-verse from the original Hebrew and Greek texts.
- Fully Scalable and Auditable: each verse is a file; verses scaffold into chapters, chapters into books, books into testaments, testaments into the Bible.
- Configurable: different readers prefer different registers. The Message reads differently from the KJV. You may want a translation tuned to a particular scholar or tradition. In Catholicism this can read as anti-dogma (sorry, Catholics!), but in evangelical Christianity this is often how churches form. That is why we ship both "Poetic" and "Optimal" translations, even if the poetic can feel far out compared with conventional versions.
- Adaptable Process: the pipeline should adapt to other texts, not just the Bible.
How We Translated the Bible
We started with fully open source data:
- Hebrew Bible: OSHB WLC (Open Scriptures Hebrew Bible) with lemma and morphology data (CC BY 4.0 for morphology; WLC text is public domain).
- Greek New Testament: Westcott-Hort text with morphological parsing (public domain).
- Septuagint: Swete LXX is present as a witness source; the current pipeline keeps the LXX as a witness source, but it is not used in every verse translation.
We cleaned and supplemented the source texts using LLMs. The data were normalized into data/index/verses.jsonl, so every verse had a clean packet: source text, tokens, and connective signals. Intertext seed links (OT/NT quotes and allusions) were also available for harmonization. Biblical note: our MVP did not tackle certain longstanding spiritual and textual questions (Matthew 17:31, Acts 8:37).
We followed a "laddering" process for translation: verse, chapter, book, testament, Bible.
At the bottom of the ladder, we translated every verse, using the three primary sources as inputs and outputting a JSON file for structure. Each verse includes footnotes and controversies (if applicable). From each verse packet, we began to build the Bible:
- Chapter review: verses are pulled together; critiques, footnotes, and conflicts are harmonized.
- Chapter finalizer: an editing prompt merges footnotes and makes decisions based on a reviewer committee of biblical scholars.
- Book harmonization: targeted overrides and directives for consistent voice and lexemes.
- Testament and Bible harmonization: higher-level passes (we use Gemini here for its longer context window) reconcile global consistency and thematic cohesion.
- Post-processing: a glossary builder promotes lexeme preferences; audit and metrics scripts quantify density, n-grams, and TTR.
- Final Text Builder: compiles final outputs for distribution.
The end results are pretty good, but not perfect. Some editorial notes made it into the final editions and need scrubbing. There may be issues with favorite verses and plain mistakes. We look forward to hearing about these and refining the process.
For reference, the translation runs through the verses typically took 2-3 days using a custom-built CLI runner and OpenRouter. We used free preview models to keep costs low. If there is serious interest or sponsorship, we would gladly repeat on a model of your choice.
Known Issues (So Far)
- Poetic translation choices can drift into problematic interpretations or outright heresy. Some may remain in the current body.
- LLM translations tend toward "moralism" vs. "revelation as truth." This is not a problem for non-believers, but it is a barrier for the deeply devout to trust the process.
- Hallucinations and cleanup are challenging and strain the context window. We expect editorial artifacts scattered throughout that need cleanup -- a remnant of pushing the context window.
The Website
The website was vibe coded and deployed to Cloudflare for simplicity and speed. It is a bit buggy, but fine. OpenRouter currently uses a free model for search and interpretation; we'd prefer stronger models, but we will see.
Code, Prompts, and Artifacts
These are not cleaned up and ready for consumption, but they are not "secret" either. There is a repo; it is just not public.
Sponsorship, Contribution, Comments
End to end, this project took four months. It took about two weeks to get the runs up and running, managing the process the entire time on "worse" CLI models. We are interested in hearing from anyone passionate about the intersection of faith and AI technology. Reach out with comments and contributions at contribute@biblexica.com.
If you feel compelled to take more action:
- Please visit our home church: cornerstonesf.org
- Please donate to our home church: Ways to Give
- Contribute to our effort: contribute@biblexica.com
Thanks and God bless.
Postscript: Personal Thoughts on Faith and AI
Do we really need another Bible translation? It is hard to name a single text that has generated more scholarship, effort, and investigation than the Christian Bible. So why bother?
There are three reasons:
- LLMs are natural translators. Translation provides a deeper relationship with a text.
- The translation process is interesting from an LLM chaining perspective.
- Faith is more important than ever in a world where we may be keen to worship "the new gods" of technology and AI.
Translation as Embrace
I recall translating the Aeneid by Virgil from Latin in high school. The work was painstaking (declensions, tenses), and honestly most of my classmates could not be bothered to attempt it. When you translated, the result was that, with effort, you earned the dubious honor of reading an existing text that someone had already translated -- likely much better than your first attempt. Occasionally, the text in its original form revealed Easter eggs -- word choices, phrasing, prosody, and more -- that gave you a new appreciation of why a text was so revered in the first place. You had to engage slowly, deliberately.
LLMs provide this opportunity to everyone with the most important text ever created by humanity. Bible study is already a time-honored tradition in Christianity, but LLMs let you get one step closer to the original source text.
You can ask better questions, and we can collectively use the text to build better relationships with the text and, ultimately, with God.
A Useful Process
The full text of the Bible sits just inside a 1M token window enabled by Gemini. My thought was that this could allow Gemini to "crunch" the entire Bible and look at different threads and aspects in translation.
I see LLMs more as fast-food workers than "AIs," and I was curious if there was a process that could produce a usable result.
My initial take is there is, but you will be the judge.
Faith in the AI Age
A lot of the AI conversation is centered around "NGMI" -- short for "Not gonna make it." If you do not learn AI, if you do not get a lot of money, if you do not do this. My personal take is a hopeful one, and my generic take on AI doomerism is that people should touch grass -- reality has a surprising amount of detail. If millions of flesh-and-blood humans could not take over the world, why do we think some superintelligent AI can navigate it? Still, the gloom can get to you, especially during the upcoming upheaval, job loss, and conflict.
Faith is and always has been a critical part of humanity. Everyone worships, said David Foster Wallace; your choice is what to worship.
You should consider worshipping Jesus Christ, in all earnestness and seriousness. Other "gods" like money, fashion, and intellect will eat you alive.