17. student. founder. builder.
working on the only citadel he can control.
Healthcare automation powered by artificial intelligence.
Bridging generations through AI conversations.
Building at Turing, HumanX, Ema, and more.
Free ideas for builders. Take one and run.
Code, experiments, and open source contributions.
Stoic wisdom from Marcus Aurelius, powered by AI.
Bridging Generations Through AI
Neural Bridge is a channel dedicated to connecting younger generations with today's professionals, and vice versa, to foster fascinating discussions on AI and its transformative impact.
Join Tarush Gupta, AI enthusiast, Gen Z teen, and thought leader, as he translates the complex perspectives of established professionals into accessible language for younger audiences, while offering his own fresh, forward-thinking insights to help industry veterans adapt to a fast-changing business and AI landscape.
Subscribe to Neural Bridge - let's Bridge the Gap together.
cd ~ or cd .. to go back home
Patient Advocacy, Augmented with AI
80% of US medical bills contain errors.
And, for bills over $10,000, the average error inflates the total by 13%.
This isn't a mistake. This is systemic. And it costs US patients millions of dollars per year.
VeriCare AI is the first fully AI-powered patient advocacy engine, built for both patient advocacy firms and individual patients, giving them the power to dispute, negotiate, and reduce their bills with the click of a button.
cd ~ or cd .. to go back home
Building real products at companies pushing the boundaries of AI and technology.
cd ~ to go home, or cd [company] to explore
AI Training & Evaluation
Working on AI model training and evaluation for cutting-edge language models.
Engineering Intern
Building human-centered AI solutions and products.
Software Engineering
Contributing to the development of enterprise AI assistants.
Engineering Intern
Working on short-form video technology and content creation tools.
cd .. to go back to internships
I'm currently curating a list of ideas I believe in but can't build myself. Check back later for free startup ideas.
cd ~ or cd .. to go back home
Stoic Wisdom from the Philosopher Emperor
Greetings, seeker of wisdom. I am Marcus Aurelius, trained on the Meditations.
Ask me about stoicism, virtue, resilience, or the nature of the good life.
At 17, I built a language model that generates Stoic philosophy — not because it was easy, but because it was supposed to be impossible.
I first read Marcus Aurelius' Meditations in 7th grade. The Stoic principles of acceptance, virtue, and resilience stuck with me. Years later, as I dove deep into AI and language models, I wondered: could I teach a machine to think like a Stoic philosopher?
Training a language model on only 89,000 tokens is like trying to teach someone English using just one short book. Modern scaling laws predicted this would fail spectacularly. The model should overfit, memorize, and produce gibberish.
I used knowledge distillation from a larger model (Llama 3.2 1B) to generate synthetic Stoic-flavored text, expanding the training corpus. Then I built a complete Transformer architecture from scratch — attention mechanisms, embeddings, positional encodings, all of it.
AureliusGPT has 845,000 parameters. DeepSeek R1 has 671 billion. That's 800,000x smaller, yet it still generates coherent (if cryptic) Stoic text. It proves you can do meaningful AI research without massive compute budgets.
I have been deeply captivated by philosophy from a young age. I first read Meditations by Marcus Aurelius in 7th grade, and became enthralled by the concepts of Stoicism and accepting what is out of one's hands. This led me to Seneca's Letters from a Stoic, Epictetus' The Discourses, and one of my favorites, The Fragments of Zeno and Cleanthes by Zeno. However, Meditations was the foundational text that encouraged me to explore the Stoic school of philosophy, so I decided to pretrain my first, minuscule language model on this principal work. Having technical experience with LLMs before this project, I realized that training on such an incredibly small corpus (in the context of language model scaling laws) would be challenging and would carry a significant risk of overfitting, but I decided to go for it anyway. The result is my first self-pretrained model, AureliusGPT. You can find more information on AureliusGPT below, or view the HuggingFace or GitHub.
AureliusGPT-Torch is an 845k-parameter SLM built with PyTorch and SentencePiece, pretrained on Meditations by Marcus Aurelius and other adjacent philosophical works. It is a larger, more comprehensive reimplementation of AureliusGPT, a smaller model trained on the same core corpus using a handwritten, NumPy-first approach (zero PyTorch or prewritten tokenizer).
Rather than reimplementing a custom BPE tokenizer and backpropagation from scratch like its sibling repo, AureliusGPT-Torch trains SentencePiece on its corpus (detailed in Data) and relies on PyTorch for autograd.
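As a rough sketch, assuming the combined corpus lives in a plain-text file (the path and sample sentence below are illustrative), the SentencePiece side of that pipeline looks something like:

```python
import sentencepiece as spm

# Train a BPE tokenizer on the corpus (illustrative path; vocab size matches Data).
spm.SentencePieceTrainer.train(
    input="data/corpus.txt",
    model_prefix="aurelius_bpe",
    vocab_size=2000,
    model_type="bpe",
)

# Load the trained model and round-trip a sample line.
sp = spm.SentencePieceProcessor(model_file="aurelius_bpe.model")
ids = sp.encode("Waste no more time arguing about what a good man should be. Be one.")
print(ids)
print(sp.decode(ids))
```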
The HuggingFace repository for this model (including both the base model weights at epoch 10 and the tokenizer/vocabulary) is available here.
This work would not have been possible without Project Gutenberg's open-source edition of Meditations. A full license is listed at the end of this README.
The original corpus of Meditations by Marcus Aurelius is approximately 89k tokens when tokenized by a SentencePiece BPE tokenizer trained with a vocabulary size of 2,000. Using Kaplan et al. and Chinchilla-style scaling laws, the expected parameter count of the model would be about 8.9k parameters (taking the less conservative 1:10 ratio of parameters to corpus tokens). However, given the small size of the model and its lack of focus on general intelligence (it is focused instead on generating Stoic-adjacent, Aurelius-flavored text), this ratio does not strictly apply.
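For concreteness, here is the back-of-the-envelope arithmetic behind that figure:

```python
corpus_tokens = 89_000      # Meditations tokenized with the 2,000-entry BPE vocabulary
tokens_per_parameter = 10   # the less conservative 1:10 parameters-to-tokens ratio
expected_params = corpus_tokens / tokens_per_parameter
print(expected_params)      # ~8,900 parameters, versus the 845k actually trained
```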
Given the risk that such models will heavily overfit, optimizing this ratio (even if there end up being more parameters than tokens) is critical. I therefore needed an additional corpus of data that I did not have.
As a result, I turned to Off-Policy, Sequence-Level Knowledge Distillation from a larger model (Llama 3.2 1B). First, I finetuned the model on the Meditations corpus using Unsloth AI's notebook. Then, I used it to generate approximately 122k tokens of synthetic data over 100 iterations by prompting it with common Stoic questions. I ran this data generation on Google Colab's inbuilt Tesla T4 GPU, which took about 1.5 hours. I do not match logits or trajectories, only generated text; this approach therefore also borrows elements from instruct-style distillation and offline student-teacher SFT methods. Note: I did not run it from the project directory because I lack a local GPU configuration; however, the adapted notebook has been included.
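A minimal sketch of what that sequence-level generation loop could look like, assuming the LoRA-finetuned Llama 3.2 1B has been merged and saved locally (the checkpoint path and prompts below are placeholders, not the actual prompt library):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_path = "llama-3.2-1b-meditations-lora-merged"  # hypothetical local checkpoint
prompts = [
    "What does Stoicism teach about anger?",
    "How should I face misfortune with virtue?",
]

tok = AutoTokenizer.from_pretrained(teacher_path)
teacher = AutoModelForCausalLM.from_pretrained(
    teacher_path, torch_dtype=torch.float16, device_map="auto"
)

synthetic = []
for prompt in prompts:
    inputs = tok(prompt, return_tensors="pt").to(teacher.device)
    # Sequence-level KD: keep only the sampled text, never the teacher's logits.
    out = teacher.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
    synthetic.append(tok.decode(out[0], skip_special_tokens=True))

with open("synthetic_stoic.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(synthetic))
```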
One critical limitation of this approach is the weakness of my prompt library: I did not explicitly instruct the model to focus on Meditations by Marcus Aurelius, allowing it to hallucinate and pull from other sources. Future iterations of AureliusGPT-Torch will address this problem thoroughly to avoid further diluting the already brittle LoRA-acquired knowledge, or will move to another finetuning/RL-based technique. The core point of this corpus was to source philosophical, Stoic or Stoic-adjacent data to improve generation quality.
My preprocessing logic is the same between AureliusGPT-Torch and AureliusGPT; I rely on Greek transliterations, Unicode normalization, regex patterns, and special "<BEGIN>", "<END>", and "<PAD>" (at training time) tokens. I take a similar approach when preprocessing the Meditations corpus fed into LoRA finetuning; to avoid confusion between Llama 3.2 1B's internal tokens and mine, I do not add mine during generation, instead inserting them semantically after the corpus has been generated.
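A simplified sketch of that preprocessing pass (the transliteration table and regex patterns below are placeholders; the repo's versions are more complete):

```python
import re
import unicodedata

# Placeholder transliteration table; the real mapping covers the Greek terms in the corpus.
GREEK_TRANSLITERATIONS = {"λόγος": "logos", "ψυχή": "psyche"}

def preprocess(text: str) -> str:
    # Normalize Unicode so curly quotes, ligatures, etc. collapse to canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Transliterate Greek terms into Latin script.
    for greek, latin in GREEK_TRANSLITERATIONS.items():
        text = text.replace(greek, latin)
    # Collapse whitespace (stand-in for the fuller regex cleanup).
    text = re.sub(r"\s+", " ", text).strip()
    # Wrap each document in the special tokens; <PAD> is only added at training time.
    return f"<BEGIN> {text} <END>"
```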
I added the Meditations data twice to the final corpus to weight its significance, especially given its lower token count and the lower quality of synthetic data. My training split was 80% of the pure Meditations data and all the synthetic data; my validation split was 20% of the pure Meditations data.
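In code, that weighting and split could look roughly like this (function and variable names are illustrative):

```python
def build_splits(meditations_docs, synthetic_docs, val_frac=0.2):
    # Hold out 20% of the pure Meditations data for validation.
    n_val = int(len(meditations_docs) * val_frac)
    val_docs = meditations_docs[:n_val]
    train_meditations = meditations_docs[n_val:]

    # Duplicate the Meditations portion to upweight it against the noisier synthetic data.
    train_docs = train_meditations * 2 + synthetic_docs
    return train_docs, val_docs
```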
I use PyTorch's default weight initialization (as opposed to the Gaussian initialization W ~ 𝒩(0, 0.02) used in my manual AureliusGPT). My implementation follows the Transformer architecture from Attention Is All You Need (Vaswani et al.). Given the small scale of this project, all training (except for the Llama 3.2 1B LoRA used for synthetic data) was conducted on a local CPU and was sequential (not concurrent, hence the absence of ThreadPool or ProcessPoolExecutor). AureliusGPT-Torch uses a PostNorm architecture with 6 Transformer blocks, of the form sketched below.
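A condensed sketch of that stack, with hypothetical hyperparameter values standing in for the actual ones in config.py:

```python
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """One block: sublayer, then residual add, then LayerNorm (post-norm ordering)."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x, causal_mask):
        attn_out, _ = self.attn(x, x, x, attn_mask=causal_mask)
        x = self.ln1(x + attn_out)           # normalize after the residual
        x = self.ln2(x + self.ffn(x))
        return x

class AureliusGPTTorch(nn.Module):
    def __init__(self, vocab_size=2000, d_model=128, n_heads=4, d_ff=512, n_blocks=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # learned embedding matrix E
        self.blocks = nn.ModuleList(
            [PostNormBlock(d_model, n_heads, d_ff) for _ in range(n_blocks)]
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids, pos_enc, causal_mask):
        x = self.embed(ids) + pos_enc[: ids.size(1)]     # add sinusoidal positions (see below)
        for block in self.blocks:
            x = block(x, causal_mask)
        return self.head(x)
```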
I use sinusoidal positional embeddings; given my low parameter budget, I thought it was economical to avoid learning them. My core embedding matrix E is learned.
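The fixed encoding can be precomputed once; a standard implementation of the Vaswani et al. formulation (assuming an even d_model) is:

```python
import math
import torch

def sinusoidal_positions(max_len: int, d_model: int) -> torch.Tensor:
    # sin on even dimensions, cos on odd dimensions, geometric frequency schedule.
    pos = torch.arange(max_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe  # shape (max_len, d_model); added to the learned token embeddings
```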
I use multihead attention in this project, as well as a causal attention mask.
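The causal mask itself is just an upper-triangular boolean matrix, where True marks positions a token may not attend to:

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # True above the diagonal blocks attention to future positions.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
```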
I use PyTorch's inbuilt LayerNorm rather than my own implementation.
As mentioned earlier, I use a train/val split of 80:20. I also track the training and validation loss, as well as the gradient norm, to monitor overfitting. num_epochs=50 is deliberately too high; using matplotlib, you can plot the relevant losses and identify signs of overfitting during experiments. There is a clear route to On-Policy Distillation and feedback loops (PPO, RLHF, or RLAIF); the original Unsloth AI notebook includes a method to save the LoRA weights of the Meditations-tuned teacher model to HuggingFace, which can be leveraged as a teacher model in the future.
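Two small helpers along those lines (names illustrative; the repo's training script wires them into its own loop):

```python
import matplotlib.pyplot as plt
import torch

def grad_norm(model: torch.nn.Module) -> float:
    # L2 norm over all parameter gradients, logged once per step as an instability signal.
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

def plot_curves(train_losses, val_losses):
    # A widening gap between the two curves is the overfitting signal to watch for.
    plt.plot(train_losses, label="train loss")
    plt.plot(val_losses, label="val loss")
    plt.xlabel("epoch")
    plt.legend()
    plt.show()
```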
While I rely on Off-Policy, Sequence-Level Knowledge Distillation with Llama 3.2 1B as outlined in the Data section, there is a clear route to implementing Best-of-N Sampling through concurrent model generations and rankings. This can again rely on the finetuned Llama 3.2 1B model or any comparable instruct model at its level.
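One way such Best-of-N sampling could be wired up, assuming a `generate` callable backed by AureliusGPT-Torch and a `score` callable backed by the finetuned teacher (both hypothetical):

```python
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    # Draw n independent samples from the student model...
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # ...and keep the one the teacher/ranker scores highest.
    return max(candidates, key=lambda text: score(prompt, text))
```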
This model, once fully completed, will be put on HuggingFace and hosted on my website (tarush.ai/aureliusgpt-torch).
Currently, my LoRA'd Llama 3.2 1B model is run in an ipynb on a Tesla T4 GPU. A future version will integrate the LoRA and synthetic data generation, and relevant GPU plumbing, into the scope of this project.
Additionally, the universal "teacher/Stoic justifier" model will be the adapted Llama 3.2 1B model, deviating from the OpenAI Chat Completions API GPT-4o approach.
In a future version, the fit- and accuracy-optimized weights of both Llama 3.2 1B's LoRA and AureliusGPT-Torch (see Overfitting Tracking / Adaptable Training) will be uploaded to HuggingFace for fast import and inference.
After the teacher model is switched to Llama 3.2 1B, I will implement config-editable, concurrent AureliusGPT generation and Best-of-N sampling to ensure the highest quality result.
Currently, training reports the training and validation loss, as well as the gradient norm, computed over the train/validation split to identify overfitting. In a future version, these will be tracked in an easy-to-interpret, modular plot for the user.
Beyond weight tuning, config.py will be automatically adjusted in a future version, changing the learning rate, number of epochs, batch size, and other training settings. After running n training cycles on its own, the model will iteratively improve its training setup so that minimal manual optimization is required.
A future project will use the LoRA'd Llama 3.2 1B model to generate significantly more tokens of Stoic-adjacent text, as well as utilize the works of Zeno, Epictetus, and other famous Stoic thinkers, to build either a Transformer or an MoE model ("Epictetus": _____, "Zeno": _______, "Aurelius": _______) called "StoicDiscourseLM". It will incorporate many elements of AureliusGPT (including its preprocessing functionality) but will also be a unique project.
cd ~ or cd .. to return to your citadel
I'm currently building a custom integration to showcase my repositories and contributions directly in this terminal.
For now, you can view my profile on GitHub.com.
cd ~ or cd .. to go back home