The video compares two top AI models, OpenAI o1 and Anthropic Sonnet 3.5, on the task of creating a 3D car-parking game with physics.
🚗 Sonnet 3.5 failed the task, producing an uncontrollable car, while o1-preview generated the basic functionality of the game: a controllable car that leaves tire tracks, from a zero-shot prompt. Websim then turned this into a real game.
⚠️ However, when the task was made harder (make a 3D game in the browser), o1 also failed: the car did not move, which shows that the model has not yet reached the level of a human developer.
🤖 Overall, the video shows that o1 is a more powerful model than Sonnet 3.5, but it still has limitations and needs further improvement.
Cool use case: start the code with an expensive model, then finish it with cheaper ones (from o1 to Websim).