tarush@citadel_os — zsh
CONNECTED
Press ENTER ↵ to execute
Press TAB to skip intro

Welcome.

INITIALIZING PORTFOLIO_V3
Dark Mode
Terminal Mode
← Back to Home cd ~
Welcome to My Inner Citadel

Tarush Gupta

17. student. founder. builder.
working on the only citadel he can control.

1
Company Founded
4+
Internships
8
Public Repositories
3
LLMs Implemented
Ideas Brewing
👥
Citadel Visits
---
Thank you for exploring my digital citadel
🚀

What I'm Building

VeriCare AI

Healthcare automation powered by artificial intelligence.

Neural Bridge

Neural Bridge Podcast

Bridging generations through AI conversations.

💼

Internships

Building at Turing, HumanX, Ema, and more.

💡

Ideas Library

Free ideas for builders. Take one and run.

💻

GitHub

Code, experiments, and open source contributions.

AureliusGPT

Stoic wisdom from Marcus Aurelius, powered by AI.

Neural Bridge Podcast

Bridging Generations Through AI

76
Subscribers
21
Videos
5K
Views
9
Solo Videos
4
Podcasts

Neural Bridge is a channel dedicated to connecting younger generations with today's professionals, and vice versa, to foster fascinating discussions on AI and its transformative impact.

Join Tarush Gupta, AI enthusiast, Gen Z teen, and thought leader, as he translates the complex perspectives of established professionals into accessible language for younger audiences, while offering his own fresh, forward-thinking insights to help industry veterans adapt to a fast-changing business and AI landscape.

Subscribe to Neural Bridge - let's Bridge the Gap together.

Contact Neural Bridge →
Type cd ~ or cd .. to go back home

VeriCare AI

Patient Advocacy, Augmented with AI

80% of US medical bills contain errors.

And, for bills over $10,000, the average error inflates the total by 13%.

This isn't a mistake. This is systemic. And it costs US patients millions of dollars per year.

VeriCare AI is the first fully AI-powered patient advocacy engine, built for both patient advocacy firms and patients, giving them the power to dispute, negotiate, and reduce their bills with the click of a button.

And Patients Love It.

146 Likes
18k Impressions
21 Reposts
74 Comments
Type cd ~ or cd .. to go back home

💼 Internships

Building real products at companies pushing the boundaries of AI and technology.

Type cd ~ to go home, or cd [company] to explore
Turing 2025

AI Training & Evaluation

Working on AI model training and evaluation for cutting-edge language models.

AI Machine Learning NLP
HumanX 2024

Engineering Intern

Building human-centered AI solutions and products.

AI Product Engineering
Ema Unlimited 2024

Software Engineering

Contributing to the development of enterprise AI assistants.

AI Enterprise Automation
Proshort 2023

Engineering Intern

Working on short-form video technology and content creation tools.

Video Content Tech

💡 Open Ideas Library

🚧

Coming Soon

I'm currently curating a list of ideas I believe in but can't build myself. Check back later for free startup ideas.

Type cd ~ or cd .. to go back home

AURELIUSGPT

Stoic Wisdom from the Philosopher Emperor

Greetings, seeker of wisdom. I am Marcus Aurelius, trained on the Meditations.

Ask me about stoicism, virtue, resilience, or the nature of the good life.

Best Results: Use 1-2 word Stoic prompts
Press TAB to cycle through suggestions
A lightweight 845k-parameter echo of Stoic texts. Coherence may vary.
Marcus Aurelius
Try this prompt: Death TAB ← Try these sample prompts for best results
Stoic Validator

Why This Matters: Training Philosophy with Mathematics

At 17, I built a language model that generates Stoic philosophy — not because it was easy, but because it was supposed to be impossible.

📖

The Origin

I first read Marcus Aurelius' Meditations in 7th grade. The Stoic principles of acceptance, virtue, and resilience stuck with me. Years later, as I dove deep into AI and language models, I wondered: could I teach a machine to think like a Stoic philosopher?

⚠️

The Challenge

Training a language model on only 89,000 tokens is like trying to teach someone English using just one short book. Modern scaling laws predicted this would fail spectacularly. The model should overfit, memorize, and produce gibberish.

💡

The Innovation

I used knowledge distillation from a larger model (Llama 3.2 1B) to generate synthetic Stoic-flavored text, expanding the training corpus. Then I built a complete Transformer architecture from scratch — attention mechanisms, embeddings, positional encodings, all of it.

🎯

Why It's Impressive

AureliusGPT has 845,000 parameters. DeepSeek R1 has 671 billion. That's 800,000x smaller, yet it still generates coherent (if cryptic) Stoic text. It proves you can do meaningful AI research without massive compute budgets.

Technical Specification

Motivations

I have been deeply captivated by philosophy from a young age. I first read Meditations by Marcus Aurelius in 7th grade and became enthralled by the concepts of Stoicism and accepting what is out of one's hands. This led me to Seneca's Letters from a Stoic, Epictetus' The Discourses, and one of my favorites, The Fragments of Zeno and Cleanthes. Meditations, however, was the foundational text that encouraged me to explore the Stoic school of philosophy, so I decided to pretrain my first, minuscule language model on this principal work. Having prior technical experience with LLMs, I knew that training on such an incredibly small corpus (in the context of language model scaling laws) would be challenging and would carry a significant risk of overfitting, but I decided to go for it anyway. The result is my first self-pretrained model, AureliusGPT. You can find more information on AureliusGPT below, or view it on HuggingFace or GitHub.

Overview

AureliusGPT-Torch is an 845k-parameter SLM built with PyTorch and SentencePiece, pretrained on Meditations by Marcus Aurelius and other adjacent philosophical works. It is a larger, more comprehensive reimplementation of AureliusGPT, a smaller model trained on the same core corpus using a handwritten, NumPy-first approach (no PyTorch and no prewritten tokenizer).

Rather than reimplementing a custom BPE tokenizer and backpropagation from scratch like its sibling repo, AureliusGPT-Torch trains a SentencePiece tokenizer on its corpus (detailed in Data) and relies on PyTorch for autograd.
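For reference, a minimal sketch of what training such a tokenizer looks like with the SentencePiece Python API is below; the file names and the choice of user-defined symbols are illustrative assumptions, not the repo's exact configuration.

# Hypothetical sketch: training a small BPE vocabulary with SentencePiece.
# File names are placeholders; vocab_size matches the 2,000 described in Data.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",                 # preprocessed Meditations + synthetic text
    model_prefix="aurelius_bpe",        # writes aurelius_bpe.model / aurelius_bpe.vocab
    vocab_size=2000,
    model_type="bpe",
    user_defined_symbols=["<BEGIN>", "<END>", "<PAD>"],
)

sp = spm.SentencePieceProcessor(model_file="aurelius_bpe.model")
ids = sp.encode("Waste no more time arguing what a good man should be. Be one.")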

The HuggingFace repository for this model (including both the base model weights at epoch 10 and the tokenizer/vocabulary) is contained here.

This work would not have been possible without Project Gutenberg's open-source edition of Meditations. A full license is listed at the end of this README.

Data

The original corpus of Meditations by Marcus Aurelius is approximately 89k tokens when tokenized by a SentencePiece BPE tokenizer trained with a vocabulary size of 2,000. Using Chinchilla scaling laws (Hoffmann et al.), the expected parameter count for the model would be 8.9k (taking the less conservative 1:10 ratio of parameters to corpus tokens). However, given the model's small size and its lack of focus on general intelligence (it aims only to generate Stoic-adjacent, Aurelius-flavored text), this ratio does not strictly apply.
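As a quick back-of-the-envelope check of those numbers (not part of the training code):

# 1:10 parameters-to-tokens ratio applied to the 89k-token Meditations corpus.
corpus_tokens = 89_000
expected_params = corpus_tokens / 10        # -> 8,900 parameters
actual_params = 845_000                     # AureliusGPT-Torch
print(actual_params / expected_params)      # ~95x over the naive budget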

Given how heavily models of this size tend to overfit, keeping the parameter-to-token ratio reasonable (even when parameters outnumber tokens) is critical. I therefore needed an additional corpus of data that I did not have access to.

As a result, I turned to off-policy, sequence-level knowledge distillation from a larger model (Llama 3.2 1B). First, I finetuned the model on the Meditations corpus using Unsloth AI's notebook. Then, I used it to generate approximately 122k tokens of synthetic data over 100 iterations, prompting it with common Stoic questions. I used Google Colab's built-in Tesla T4 GPU to run this data generation, which took about 1.5 hours. I do not match logits or trajectories, only generated text; the approach therefore also borrows from instruct-style distillation and offline student-teacher SFT methods. Note: I did not run it from the project directory because I had no local GPU configured; however, the adapted notebook has been included.
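A condensed sketch of that generation loop is below, assuming the LoRA-finetuned Llama 3.2 1B is available as a Hugging Face checkpoint; the model path, prompt list, and sampling settings are placeholders rather than the exact values from the Unsloth notebook.

# Off-policy, sequence-level generation: only the sampled text is kept,
# no teacher logits or trajectories. Paths and prompts below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/llama-3.2-1b-meditations-lora"   # hypothetical merged checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompts = ["What is virtue?", "How should I face death?", "On accepting what I cannot control."]

synthetic = []
for step in range(100):                               # ~100 iterations -> ~122k tokens
    prompt = prompts[step % len(prompts)]
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.9, top_p=0.95)
    synthetic.append(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

with open("synthetic_stoic.txt", "w") as f:
    f.write("\n".join(synthetic))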

One critical limitation of this approach is the weakness of my prompt library: I did not explicitly instruct the model to focus on Meditations by Marcus Aurelius, which allowed it to hallucinate and pull from other sources. Future iterations of AureliusGPT-Torch will address this thoroughly to avoid further eroding the brittle LoRA memory, or will move to another finetuning/RL-based technique. The core point of this corpus was to source philosophical, Stoic or Stoic-adjacent data to ensure better generation quality.

My preprocessing logic is identical between AureliusGPT-Torch and AureliusGPT; I rely on Greek transliterations, Unicode normalization, regex patterns, and special "<BEGIN>", "<END>", and "<PAD>" (at training time) tokens. I take a similar approach when preprocessing the Meditations corpus fed into the LoRA; to avoid confusion between Llama 3.2 1B's internal tokens and mine, I do not add my special tokens to its input, instead inserting them into the corpus after it has been generated.
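A condensed sketch of that preprocessing pass is shown below; the transliteration table and regex are illustrative placeholders, not the repo's exact rules.

# Illustrative preprocessing: Unicode normalization, Greek transliteration,
# whitespace cleanup, and sentinel tokens (<PAD> is handled at training time).
import re
import unicodedata

GREEK_TRANSLIT = {"φ": "ph", "θ": "th", "ψ": "ps"}   # placeholder subset of the full table

def preprocess(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)
    for greek, latin in GREEK_TRANSLIT.items():
        text = text.replace(greek, latin)
    text = re.sub(r"\s+", " ", text).strip()
    return f"<BEGIN> {text} <END>"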

I added the Meditations data twice to the final corpus to weight its significance, especially given its lower token count and the lower quality of the synthetic data. My training split was 80% of the pure Meditations data plus all of the synthetic data; my validation split was the remaining 20% of the pure Meditations data.
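That weighting and split can be summarized in a few lines; this is an illustrative sketch, and the chunking helper and file names are assumptions rather than the repo's code.

# Illustrative corpus assembly: Meditations is duplicated in training, synthetic
# data is train-only, and 20% of Meditations is held out for validation.
import random

def load_chunks(path: str, chunk_len: int = 512) -> list:
    text = open(path, encoding="utf-8").read()
    return [text[i:i + chunk_len] for i in range(0, len(text), chunk_len)]

random.seed(0)
meditations = load_chunks("meditations_clean.txt")
synthetic = load_chunks("synthetic_stoic.txt")
random.shuffle(meditations)

cut = int(0.8 * len(meditations))
train_corpus = meditations[:cut] * 2 + synthetic     # Meditations counted twice
val_corpus = meditations[cut:]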

Architecture

Overview

I use PyTorch's default weight initialization (as opposed to the Gaussian initialization W ~ 𝒩(0, 0.02) used in my manual AureliusGPT). I rely on the Transformer architecture from Attention Is All You Need (Vaswani et al.) in my implementation. Given the small scale of this project, all training (except the Llama 3.2 1B LoRA for synthetic data) was conducted on a local CPU and was sequential, not concurrent (hence the absence of ThreadPool or ProcessPoolExecutor). AureliusGPT-Torch uses a PostNorm, 6-Transformer-block architecture of the format

GPT -> {TransformerBlock -> TransformerBlock -> TransformerBlock...}x6 -> logits
TransformerBlock -> {AttentionBlock -> Add/Norm -> FFN -> Add/Norm}
AttentionBlock -> {AttentionHead + Concatenation}
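In PyTorch, one such PostNorm block can be sketched roughly as follows; the dimensions (d_model, n_heads, d_ff) are illustrative assumptions, not the repo's config.

# Rough PyTorch sketch of the PostNorm block diagrammed above.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 4, d_ff: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + attn_out)          # PostNorm: normalize after the residual add
        x = self.norm2(x + self.ffn(x))
        return x

# In the full model: token + positional embeddings -> 6 such blocks -> linear head -> logits.
blocks = nn.ModuleList([TransformerBlock() for _ in range(6)])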

Embeddings

I use sinusoidal positional embeddings; given my low parameter budget, I thought it was economical to avoid learning them. My core embedding matrix E is learned.
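As a reminder of how these fixed embeddings are computed (the standard Vaswani et al. form; a sketch, not the repo's exact code):

# Fixed sinusoidal positional embeddings: precomputed, so they cost no parameters.
import math
import torch

def sinusoidal_positions(max_len: int, d_model: int) -> torch.Tensor:
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)                 # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe        # added to the learned token embeddings E[ids]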

Attention

I use multihead attention in this project, as well as a causal attention mask.
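The causal mask itself is a small upper-triangular boolean matrix; a sketch compatible with nn.MultiheadAttention (where True marks positions a token may not attend to) is:

# Causal (look-ahead) mask: token i may only attend to positions <= i.
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# causal_mask(4):
# [[False,  True,  True,  True],
#  [False, False,  True,  True],
#  [False, False, False,  True],
#  [False, False, False, False]]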

LayerNorm

I use PyTorch's inbuilt LayerNorm rather than my own implementation.

Training

Overview

As mentioned earlier, I use an 80:20 train/val split. I also compute the training and validation loss, as well as the gradient norm, to track overfitting. num_epochs=50 is too high; you can graph the relevant losses with matplotlib and identify signs of overfitting during experiments. There is a clear route to On Policy Distillation and feedback loops (PPO, RLHF, or RLAIF); the original Unsloth AI notebook includes a method to save the LoRA weights of the Meditations-tuned teacher model to HuggingFace, which can be leveraged as a teacher model in the future.
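A simplified sketch of that tracking loop is below; model, train_loader, and val_loader are assumed to exist, and the optimizer settings are placeholders rather than the repo's config.

# Simplified sketch: track train/val loss and gradient norm per epoch, then plot.
# model, train_loader, and val_loader are assumed to be defined elsewhere.
import torch
import matplotlib.pyplot as plt

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)      # placeholder hyperparameters
loss_fn = torch.nn.CrossEntropyLoss()
history = {"train": [], "val": [], "grad_norm": []}

for epoch in range(50):                                   # num_epochs=50, likely too high
    model.train()
    train_loss, grad_norms = 0.0, []
    for x, y in train_loader:
        opt.zero_grad()
        loss = loss_fn(model(x).flatten(0, 1), y.flatten())
        loss.backward()
        norms = [p.grad.norm() for p in model.parameters() if p.grad is not None]
        grad_norms.append(torch.norm(torch.stack(norms)).item())
        opt.step()
        train_loss += loss.item()
    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(x).flatten(0, 1), y.flatten()).item() for x, y in val_loader)
    history["train"].append(train_loss / len(train_loader))
    history["val"].append(val_loss / len(val_loader))
    history["grad_norm"].append(sum(grad_norms) / len(grad_norms))

plt.plot(history["train"], label="train loss")
plt.plot(history["val"], label="val loss")                # a widening gap signals overfitting
plt.legend(); plt.show()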

Inference

While I rely on Off Policy, Sequence Level Knowledge Distillation with Llama 3.2 1B as outlined in the Data section, there is a clear route to implementing Best of N Sampling through concurrent model generations and rankings. This could again rely on the finetuned Llama 3.2 1B model or any comparable instruct model of similar capability.
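A minimal sketch of what that could look like, assuming a generate() sampler for AureliusGPT-Torch and a score_with_teacher() ranking function backed by the finetuned Llama 3.2 1B (both hypothetical names):

# Best-of-N: sample N candidates concurrently and keep the highest-ranked one.
# generate() and score_with_teacher() are hypothetical stand-ins passed in by the caller.
from concurrent.futures import ThreadPoolExecutor

def best_of_n(prompt: str, generate, score_with_teacher, n: int = 8) -> str:
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda _: generate(prompt), range(n)))
    return max(candidates, key=score_with_teacher)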

This model, once fully completed, will be put on HuggingFace and hosted on my website (tarush.ai/aureliusgpt-torch).

Future Work on AureliusGPT

Synthetic Data Generation / Teacher Model

Currently, my LoRA'd Llama 3.2 1B model runs in a notebook on a Tesla T4 GPU. A future version will bring the LoRA, the synthetic data generation, and the relevant GPU plumbing into the scope of this project.

Additionally, the universal "teacher/Stoic justifier" model will be the adapted Llama 3.2 1B model, deviating from the OpenAI Chat Completions API GPT-4o approach.

Model Visibility on HuggingFace

In a future version, the fit- and accuracy-optimized (see Overfitting Tracking / Adaptable Training) LoRA weights for Llama 3.2 1B and the AureliusGPT-Torch weights will be loaded onto HuggingFace for fast import and inference.

Best-of-N Sampling

After the teacher model is switched to Llama 3.2 1B, I will implement config-editable, concurrent AureliusGPT generations with best-of-N sampling to select the highest quality result.

Overfitting Tracking / Adaptable Training

Currently, training reports the training and validation loss, as well as the gradient norm over the train/val split, to identify overfitting. In a future version, this will be surfaced in an easy-to-interpret, modular plot to make training easier to monitor.

Beyond weight tuning, config.py will be adjusted automatically in a future version, changing the learning rate, number of epochs, batch size, and other aspects of training. After running n training cycles on its own, the model will iteratively improve its training performance so that minimal manual optimization is required.
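One possible shape for that adjustment logic, as a hypothetical sketch rather than a committed design:

# Hypothetical config-adjustment step run after each training cycle.
def adjust_config(config: dict, train_loss: float, val_loss: float) -> dict:
    if val_loss > 1.2 * train_loss:                  # widening gap -> likely overfitting
        config["learning_rate"] *= 0.5
        config["num_epochs"] = max(5, config["num_epochs"] - 5)
    else:
        config["batch_size"] = min(128, config["batch_size"] * 2)
    return config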

Model Upscale

A future project will use the LoRA'd Llama 3.2 1B model to generate significantly more tokens of Stoic-adjacent text, and will also draw on the works of Zeno, Epictetus, and other famous Stoic thinkers, to build either a Transformer or an MoE model ("Epictetus": _____, "Zeno": _______, "Aurelius": _______) called "StoicDiscourseLM". This will incorporate many elements of AureliusGPT (including its preprocessing functionality) but will also be a unique project.

Type cd ~ or cd .. to return to your citadel

💻 GitHub

🚧

Coming Soon

I'm currently building a custom integration to showcase my repositories and contributions directly in this terminal.

For now, you can view my profile on GitHub.com.

Visit Profile →
Type cd ~ or cd .. to go back home
cd podcast
cd vericare
cd aureliusgpt
cd internships
cd ideas
cd github
cd ~
clear
tarush@citadel:~$
try: cd podcast TAB