How to Train Your Own AI

Step 1. Source your data

Surprisingly, this is the easiest step. You can legally download the entirety of Wikipedia via a single link. Also, many people before you have embarked on this endeavour, so repositories exist with compressed files of various other websites (with varying degrees of legality).
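If you want to start with Wikipedia, here is a minimal sketch of how the download might look in Python (it assumes you have the requests library installed and plenty of disk space; check dumps.wikimedia.org for the current file name, as the exact dump names change over time):

```python
import requests

# Standard location of the latest English Wikipedia article dump (tens of GB compressed).
# Check https://dumps.wikimedia.org/enwiki/latest/ for the exact current file name.
DUMP_URL = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"

def download_wikipedia(dest_path: str = "enwiki-latest.xml.bz2") -> None:
    """Stream the dump to disk so it never has to fit in memory."""
    with requests.get(DUMP_URL, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                f.write(chunk)

if __name__ == "__main__":
    download_wikipedia()
```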

Recently, in a landmark copyright case, Anthropic agreed to pay a reported $1.5B USD settlement to authors after a US district judge found that downloading pirated copies of their books could not be excused as fair use (though training on legally purchased copies could). This is a significant precedent, as all the major AI companies have trained on copyrighted works and rely on doing so to make their base models better.

Step 2. Code your model

Achieve this in two parts: a word-embedding model and the transformer model itself. The first feeds into the second. Importantly, you need to decide two things:

1. How good your model is at human language (the word-embedding dimension)

2. How much you can write to your AI at once (the context length)

Remember, attention compute scales quadratically with context length, which means the better and longer-context you want your model to be, the more time and compute required.
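As a rough illustration of where those two choices show up in code, here is a minimal decoder-style model sketch in PyTorch. The sizes below are illustrative assumptions, not recommendations:

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=50_000, d_model=512, context_len=1024,
                 n_heads=8, n_layers=6):
        super().__init__()
        # Decision 1: the embedding dimension controls how richly each token is represented.
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Decision 2: the context length caps how many tokens the model can attend over.
        self.pos_emb = nn.Embedding(context_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)
        self.context_len = context_len

    def forward(self, tokens):  # tokens: (batch, seq_len) tensor of token ids
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: attention cost here is quadratic in seq_len.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # logits over the vocabulary for every position
```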

Step 3. Source investment

For a competitive AI model, you’ll need a minimum of around $1B USD in funding. One of the cheapest competitive models, DeepSeek, was reported to cost about $5.5M USD, and that is the compute cost for a single training run. Typically, you conduct hundreds of training runs to tune your parameters and iron out bugs. Then come the staffing costs; note: AI teams are not lean.
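To get a feel for where the single-run figure comes from, here is a back-of-the-envelope sketch using the common approximation that training takes roughly 6 × parameters × tokens FLOPs. The GPU throughput and hourly price below are assumptions for illustration only, not quotes:

```python
# Rough training-cost estimate: ~6 * N * D FLOPs for N parameters and D training tokens.
# All numbers below are illustrative assumptions.

def training_cost_usd(n_params: float, n_tokens: float,
                      gpu_flops_per_s: float = 4e14,   # assumed effective throughput per GPU
                      gpu_hour_usd: float = 2.0) -> float:
    flops = 6 * n_params * n_tokens
    gpu_seconds = flops / gpu_flops_per_s
    return gpu_seconds / 3600 * gpu_hour_usd

# e.g. a 70B-parameter model trained on 2 trillion tokens
print(f"${training_cost_usd(70e9, 2e12):,.0f}")  # one training run, before any re-runs
```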

Step 4. Model training

Firstly, you need to train the models you built in step two. Things will break constantly, but once you’ve refined them enough you should have an excellent large language model that can autocomplete your sentences (but not yet answer your questions).
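At its core, pre-training is a next-token-prediction loop. A stripped-down sketch, assuming the TinyLM model from step two and a `batches` iterable of token-id tensors (both assumptions for illustration):

```python
import torch
import torch.nn.functional as F

model = TinyLM()                       # the sketch model from step two (assumed defined)
optim = torch.optim.AdamW(model.parameters(), lr=3e-4)

for tokens in batches:                 # batches: iterable of (batch, seq_len) token ids (assumed)
    logits = model(tokens[:, :-1])     # predict every next token from its prefix
    targets = tokens[:, 1:]            # the "label" is just the text shifted by one token
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
```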

Step 5. Fine-tuning

The next stage is to give it life: you need to buy hundreds of thousands of human/AI conversations written by humans (or hire people to write your own). A popular method is to get an existing AI to generate these conversations; however, this means your AI’s ability will be capped by the AI used to generate them. Also, the terms of service of the state-of-the-art model providers generally prohibit using their outputs to train competing models.
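However you source them, the conversations are typically flattened into a single token sequence, with the training loss applied only to the assistant’s replies. A rough sketch of that formatting (the chat template below is a made-up example, not any provider’s actual format):

```python
def format_conversation(turns):
    """turns: list of (role, text) pairs, e.g. [("user", "..."), ("assistant", "...")]."""
    text, train_on = "", []            # train_on marks which spans contribute to the loss
    for role, content in turns:
        chunk = f"<|{role}|>\n{content}\n"   # hypothetical chat template
        start = len(text)
        text += chunk
        if role == "assistant":        # only learn to produce the assistant's words
            train_on.append((start, len(text)))
    return text, train_on

example = [("user", "What is the capital of France?"),
           ("assistant", "The capital of France is Paris.")]
print(format_conversation(example))
```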

Step 6. Reinforcement Learning with Human Feedback (RLHF)

Finally, you need humans assessing your model’s outputs. They ask it questions and it provides multiple answers, then they pick their favourite answer (*warning* humans prefer sycophantic answers). Once your human assessors have done this enough, you train a “reward model” that learns their preferences. Now you can optimise your LLM (the “policy”) against the reward model in a loop, stopping when you achieve the desired model behaviour.
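The “learns their preferences” part usually comes down to a pairwise objective: the reward model should score the chosen answer above the rejected one. A minimal sketch of that loss, assuming a `reward_model` that maps a prompt/answer pair to a scalar score (an assumption for illustration):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected):
    """Bradley-Terry-style pairwise loss used to train the reward model."""
    r_chosen = reward_model(prompt, chosen)      # scalar score for the preferred answer
    r_rejected = reward_model(prompt, rejected)  # scalar score for the other answer
    # Push the chosen score above the rejected one; equivalently -log sigmoid(difference).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```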

If you’ve made it to this step, congratulations: you’ve trained your very first AI model!

23 September 2025
