17. student. founder. builder.
working on the only citadel he can control.
Healthcare automation powered by artificial intelligence.
Bridging generations through AI conversations.
Building at Turing, HumanX, Ema, and more.
Free ideas for builders. Take one and run.
Code, experiments, and open source contributions.
Stoic wisdom from Marcus Aurelius, powered by AI.
Bridging Generations Through AI
Neural Bridge is a channel dedicated to connecting younger generations with today's professionals, and vice versa, to foster fascinating discussions on AI and its transformative impact.
Join Tarush Gupta, AI enthusiast, Gen Z teen, and thought leader, as he translates the complex perspectives of established professionals into accessible language for younger audiences, while offering his own fresh, forward-thinking insights to help industry veterans adapt to a fast-changing business and AI landscape.
Subscribe to Neural Bridge - let's Bridge the Gap together.
cd ~ or cd .. to go back home
Patient Advocacy, Augmented with AI
80% of US medical bills contain errors.
And, for bills over $10,000, the average error inflates the total by 13%.
This isn't a mistake. This is systemic. And it costs US patients millions of dollars per year.
VeriCare AI is the first fully AI-powered patient advocacy engine, built for both patient advocacy firms and individual patients, giving them the power to dispute, negotiate, and reduce their bills with the click of a button.
cd ~ or cd .. to go back home
Building real products at companies pushing the boundaries of AI and technology.
cd ~ to go home, or cd [company] to explore
AI Training & Evaluation
Working on AI model training and evaluation for cutting-edge language models.
Engineering Intern
Building human-centered AI solutions and products.
Software Engineering
Contributing to the development of enterprise AI assistants.
Engineering Intern
Working on short-form video technology and content creation tools.
cd .. to go back to internships
I'm currently curating a list of ideas I believe in but can't build myself. Check back later for free startup ideas.
cd ~ or cd .. to go back home
Stoic Wisdom from the Philosopher Emperor
Greetings, seeker of wisdom. I am Marcus Aurelius, trained on the Meditations.
Ask me about stoicism, virtue, resilience, or the nature of the good life.
At 17, I built a language model that generates Stoic philosophy — not because it was easy, but because it was supposed to be impossible.
I first read Marcus Aurelius' Meditations in 7th grade. The Stoic principles of acceptance, virtue, and resilience stuck with me. Years later, as I dove deep into AI and language models, I wondered: could I teach a machine to think like a Stoic philosopher?
Training a language model on only 89,000 tokens is like trying to teach someone English using just one short book. Modern scaling laws predicted this would fail spectacularly. The model should overfit, memorize, and produce gibberish.
I used knowledge distillation from a larger model (Llama 3.2 1B) to generate synthetic Stoic-flavored text, expanding the training corpus. Then I built a complete Transformer architecture from scratch — attention mechanisms, embeddings, positional encodings, all of it.
AureliusGPT has 845,000 parameters. DeepSeek R1 has 671 billion. That's 800,000x smaller, yet it still generates coherent (if cryptic) Stoic text. It proves you can do meaningful AI research without massive compute budgets.
I have been deeply captivated by philosophy from a young age. I first read Meditations by Marcus Aurelius in 7th grade, and became enthralled by the concepts of Stoicism and accepting what is out of one's hands. This led me to Seneca's Letters from a Stoic, Epictetus' The Discourses, and one of my favorites, The Fragments of Zeno and Cleanthes by Zeno. However, Meditations was the foundational text that encouraged me to explore the Stoic school of philosophy, so I decided to pretrain my first, minuscule language model on this principal work. Having technical experience with LLMs before this project, I realized that training on such an incredibly small corpus (in the context of language model scaling laws) would be challenging and would carry a significant risk of overfitting, but I decided to go for it anyway. The result is my first self-pretrained model, AureliusGPT. You can find more information on AureliusGPT below, or view the HuggingFace or GitHub.
AureliusGPT-Torch is an 845k-parameter SLM built with PyTorch and SentencePiece, pretrained on Meditations by Marcus Aurelius and other adjacent philosophical works. It is a larger, more comprehensive reimplementation of AureliusGPT, a smaller model trained on the same core corpus using a handwritten, NumPy-first approach (zero PyTorch or prewritten tokenizer).
Rather than reimplementing a custom BPE tokenizer and backpropagation from scratch like its sibling repo, AureliusGPT-Torch trains SentencePiece on its corpus (detailed in Data) and relies on PyTorch for autograd.
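As a rough sketch, assuming the combined corpus lives in a plain-text file (the path and sample sentence below are illustrative), the SentencePiece side of that pipeline looks something like:

```python
import sentencepiece as spm

# Train a BPE tokenizer on the corpus (illustrative path; vocab size matches Data).
spm.SentencePieceTrainer.train(
    input="data/corpus.txt",
    model_prefix="aurelius_bpe",
    vocab_size=2000,
    model_type="bpe",
)

# Load the trained model and round-trip a sample line.
sp = spm.SentencePieceProcessor(model_file="aurelius_bpe.model")
ids = sp.encode("Waste no more time arguing about what a good man should be. Be one.")
print(ids)
print(sp.decode(ids))
```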
The HuggingFace repository for this model (including both the base model weights at epoch 10 and the tokenizer/vocabulary) is available here.
This work would not have been possible without Project Gutenberg's open-source edition of Meditations. A full license is listed at the end of this README.
The original corpus of Meditations by Marcus Aurelius is approximately 89k tokens when tokenized by a SentencePiece BPE tokenizer trained with a vocabulary size of 2,000. Using Kaplan et al. and Chinchilla-style scaling laws, the expected parameter count of the model would be about 8.9k parameters (taking the less conservative 1:10 ratio of parameters to corpus tokens). However, given the small size of the model and its lack of focus on general intelligence (it is focused instead on generating Stoic-adjacent, Aurelius-flavored text), this ratio does not strictly apply.
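For concreteness, here is the back-of-the-envelope arithmetic behind that figure:

```python
corpus_tokens = 89_000      # Meditations tokenized with the 2,000-entry BPE vocabulary
tokens_per_parameter = 10   # the less conservative 1:10 parameters-to-tokens ratio
expected_params = corpus_tokens / tokens_per_parameter
print(expected_params)      # ~8,900 parameters, versus the 845k actually trained
```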
Given the risk that such models will heavily overfit, optimizing this ratio (even if there end up being more parameters than tokens) is critical. I therefore needed an additional corpus of data that I did not have.
As a result, I turned to Off-Policy, Sequence-Level Knowledge Distillation from a larger model (Llama 3.2 1B). First, I finetuned the model on the Meditations corpus using Unsloth AI's notebook. Then, I used it to generate approximately 122k tokens of synthetic data over 100 iterations by prompting it with common Stoic questions. I ran this data generation on Google Colab's inbuilt Tesla T4 GPU, which took about 1.5 hours. I do not match logits or trajectories, only generated text; this approach therefore also borrows elements from instruct-style distillation and offline student-teacher SFT methods. Note: I did not run it from the project directory because I lack a local GPU configuration; however, the adapted notebook has been included.
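A minimal sketch of what that sequence-level generation loop could look like, assuming the LoRA-finetuned Llama 3.2 1B has been merged and saved locally (the checkpoint path and prompts below are placeholders, not the actual prompt library):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_path = "llama-3.2-1b-meditations-lora-merged"  # hypothetical local checkpoint
prompts = [
    "What does Stoicism teach about anger?",
    "How should I face misfortune with virtue?",
]

tok = AutoTokenizer.from_pretrained(teacher_path)
teacher = AutoModelForCausalLM.from_pretrained(
    teacher_path, torch_dtype=torch.float16, device_map="auto"
)

synthetic = []
for prompt in prompts:
    inputs = tok(prompt, return_tensors="pt").to(teacher.device)
    # Sequence-level KD: keep only the sampled text, never the teacher's logits.
    out = teacher.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
    synthetic.append(tok.decode(out[0], skip_special_tokens=True))

with open("synthetic_stoic.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(synthetic))
```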
One critical limitation of this approach is the weakness of my prompt library: I did not explicitly instruct the model to focus on Meditations by Marcus Aurelius, allowing it to hallucinate and pull from other sources. Future iterations of AureliusGPT-Torch will address this problem thoroughly to avoid further diluting the already brittle LoRA-acquired knowledge, or will move to another finetuning/RL-based technique. The core point of this corpus was to source philosophical, Stoic or Stoic-adjacent data to improve generation quality.
My preprocessing logic is the same between AureliusGPT-Torch and AureliusGPT; I rely on Greek transliterations, Unicode normalization, regex patterns, and special "<BEGIN>", "<END>", and "<PAD>" (at training time) tokens. I take a similar approach when preprocessing the Meditations corpus fed into LoRA finetuning; to avoid confusion between Llama 3.2 1B's internal tokens and mine, I do not add mine during generation, instead inserting them semantically after the corpus has been generated.
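A simplified sketch of that preprocessing pass (the transliteration table and regex patterns below are placeholders; the repo's versions are more complete):

```python
import re
import unicodedata

# Placeholder transliteration table; the real mapping covers the Greek terms in the corpus.
GREEK_TRANSLITERATIONS = {"λόγος": "logos", "ψυχή": "psyche"}

def preprocess(text: str) -> str:
    # Normalize Unicode so curly quotes, ligatures, etc. collapse to canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Transliterate Greek terms into Latin script.
    for greek, latin in GREEK_TRANSLITERATIONS.items():
        text = text.replace(greek, latin)
    # Collapse whitespace (stand-in for the fuller regex cleanup).
    text = re.sub(r"\s+", " ", text).strip()
    # Wrap each document in the special tokens; <PAD> is only added at training time.
    return f"<BEGIN> {text} <END>"
```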
I added the Meditations data twice to the final corpus to weight its significance, especially given its lower token count and the lower quality of synthetic data. My training split was 80% of the pure Meditations data and all the synthetic data; my validation split was 20% of the pure Meditations data.
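In code, that weighting and split could look roughly like this (function and variable names are illustrative):

```python
def build_splits(meditations_docs, synthetic_docs, val_frac=0.2):
    # Hold out 20% of the pure Meditations data for validation.
    n_val = int(len(meditations_docs) * val_frac)
    val_docs = meditations_docs[:n_val]
    train_meditations = meditations_docs[n_val:]

    # Duplicate the Meditations portion to upweight it against the noisier synthetic data.
    train_docs = train_meditations * 2 + synthetic_docs
    return train_docs, val_docs
```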
I use PyTorch's default weight initialization (as opposed to the Gaussian initialization W ~ 𝒩(0, 0.02) used in my manual AureliusGPT). My implementation follows the Transformer architecture from Attention Is All You Need (Vaswani et al.). Given the small scale of this project, all training (except for the Llama 3.2 1B LoRA used for synthetic data) was conducted on a local CPU and was sequential (not concurrent, hence the absence of ThreadPool or ProcessPoolExecutor). AureliusGPT-Torch uses a PostNorm architecture with 6 Transformer blocks, of the form sketched below.
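A condensed sketch of that stack, with hypothetical hyperparameter values standing in for the actual ones in config.py:

```python
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """One block: sublayer, then residual add, then LayerNorm (post-norm ordering)."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x, causal_mask):
        attn_out, _ = self.attn(x, x, x, attn_mask=causal_mask)
        x = self.ln1(x + attn_out)           # normalize after the residual
        x = self.ln2(x + self.ffn(x))
        return x

class AureliusGPTTorch(nn.Module):
    def __init__(self, vocab_size=2000, d_model=128, n_heads=4, d_ff=512, n_blocks=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # learned embedding matrix E
        self.blocks = nn.ModuleList(
            [PostNormBlock(d_model, n_heads, d_ff) for _ in range(n_blocks)]
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids, pos_enc, causal_mask):
        x = self.embed(ids) + pos_enc[: ids.size(1)]     # add sinusoidal positions (see below)
        for block in self.blocks:
            x = block(x, causal_mask)
        return self.head(x)
```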
I use sinusoidal positional embeddings; given my low parameter budget, I thought it was economical to avoid learning them. My core embedding matrix E is learned.
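The fixed encoding can be precomputed once; a standard implementation of the Vaswani et al. formulation (assuming an even d_model) is:

```python
import math
import torch

def sinusoidal_positions(max_len: int, d_model: int) -> torch.Tensor:
    # sin on even dimensions, cos on odd dimensions, geometric frequency schedule.
    pos = torch.arange(max_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe  # shape (max_len, d_model); added to the learned token embeddings
```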
I use multihead attention in this project, as well as a causal attention mask.
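The causal mask itself is just an upper-triangular boolean matrix, where True marks positions a token may not attend to:

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # True above the diagonal blocks attention to future positions.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
```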
I use PyTorch's inbuilt LayerNorm rather than my own implementation.
As mentioned earlier, I use a train/val split of 80:20. I also track the training and validation loss, as well as the gradient norm, to monitor overfitting. num_epochs=50 is deliberately too high; using matplotlib, you can plot the relevant losses and identify signs of overfitting during experiments. There is a clear route to On-Policy Distillation and feedback loops (PPO, RLHF, or RLAIF); the original Unsloth AI notebook includes a method to save the LoRA weights of the Meditations-tuned teacher model to HuggingFace, which can be leveraged as a teacher model in the future.
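Two small helpers along those lines (names illustrative; the repo's training script wires them into its own loop):

```python
import matplotlib.pyplot as plt
import torch

def grad_norm(model: torch.nn.Module) -> float:
    # L2 norm over all parameter gradients, logged once per step as an instability signal.
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

def plot_curves(train_losses, val_losses):
    # A widening gap between the two curves is the overfitting signal to watch for.
    plt.plot(train_losses, label="train loss")
    plt.plot(val_losses, label="val loss")
    plt.xlabel("epoch")
    plt.legend()
    plt.show()
```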
While I rely on Off-Policy, Sequence-Level Knowledge Distillation with Llama 3.2 1B as outlined in the Data section, there is a clear route to implementing Best-of-N Sampling through concurrent model generations and rankings. This can again rely on the finetuned Llama 3.2 1B model or any comparable instruct model at its level.
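One way such Best-of-N sampling could be wired up, assuming a `generate` callable backed by AureliusGPT-Torch and a `score` callable backed by the finetuned teacher (both hypothetical):

```python
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    # Draw n independent samples from the student model...
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # ...and keep the one the teacher/ranker scores highest.
    return max(candidates, key=lambda text: score(prompt, text))
```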
This model, once fully completed, will be put on HuggingFace and hosted on my website (tarush.ai/aureliusgpt-torch).
Currently, my LoRA'd Llama 3.2 1B model is run in an ipynb on a Tesla T4 GPU. A future version will integrate the LoRA and synthetic data generation, and relevant GPU plumbing, into the scope of this project.
Additionally, the universal "teacher/Stoic justifier" model will be the adapted Llama 3.2 1B model, deviating from the OpenAI Chat Completions API GPT-4o approach.
In a future version, the fit- and accuracy-optimized weights of both Llama 3.2 1B's LoRA and AureliusGPT-Torch (see Overfitting Tracking / Adaptable Training) will be uploaded to HuggingFace for fast import and inference.
After the teacher model is switched to Llama 3.2 1B, I will implement config-editable, concurrent AureliusGPT generation and Best-of-N sampling to ensure the highest quality result.
Currently, training reports the training and validation loss, as well as the gradient norm, computed over the train/validation split to identify overfitting. In a future version, these will be tracked in an easy-to-interpret, modular plot for the user.
Beyond weight tuning, config.py will be automatically adjusted in a future version, changing the learning rate, number of epochs, batch size, and other training settings. After running n training cycles on its own, the model will iteratively improve its training setup so that minimal manual optimization is required.
A future project will use the LoRA'd Llama 3.2 1B model to generate significantly more tokens of Stoic-adjacent text, as well as utilize the works of Zeno, Epictetus, and other famous Stoic thinkers, to build either a Transformer or an MoE model ("Epictetus": _____, "Zeno": _______, "Aurelius": _______) called "StoicDiscourseLM". It will incorporate many elements of AureliusGPT (including its preprocessing functionality) but will also be a unique project.
cd ~ or cd .. to return to your citadel
I'm currently building a custom integration to showcase my repositories and contributions directly in this terminal.
For now, you can view my profile on GitHub.com.
cd ~ or cd .. to go back home