OpenAI o1 and the Reasoning Breakthrough
OpenAI's o1 model introduces a new paradigm: chain-of-thought reasoning at inference time. The model thinks before it answers — spending compute on reasoning rather than just pattern matching. The implications for complex analysis, coding, and financial modelling are significant.

OpenAI released o1 — a model that represents a fundamental shift in how AI systems approach complex problems. Unlike previous models that generate responses token by token in a single forward pass, o1 uses chain-of-thought reasoning at inference time — spending additional compute to "think" through problems before producing an answer.
The results are dramatic. On competitive programming problems from Codeforces, o1 ranks in the 89th percentile; on the AIME mathematics qualifier, it solves roughly 83% of problems where GPT-4o managed about 13%. On GPQA, a benchmark of PhD-level science questions, it exceeds the accuracy of human experts. On complex coding challenges, it solves problems that previous models could not even attempt. The model does not just know more. It reasons better.
What Changed
The key innovation is not in the model's training but in its inference process. Previous models generate responses in a single pass — reading the prompt and producing output sequentially, with no opportunity to reconsider, backtrack, or explore alternative approaches. o1 generates an internal chain of thought — a sequence of reasoning steps that the model works through before producing its final answer.
This is analogous to the difference between a student who writes the first answer that comes to mind and a student who works through the problem on scratch paper before writing their final answer. The additional reasoning time produces dramatically better results on problems that require multi-step logic, mathematical reasoning, or careful analysis of complex scenarios.
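o1's internal chain of thought is not publicly specified, but the core intuition — spend more inference compute to get a better answer — can be illustrated with a published technique called self-consistency: sample several independent reasoning attempts and take a majority vote. The sketch below uses a hypothetical noisy solver as a stand-in for a language model that answers correctly with some probability; it is a toy simulation, not a real API call.

```python
import random
from collections import Counter

def noisy_solver(correct_answer: int, p: float, rng: random.Random) -> int:
    """Stand-in for a model: returns the right answer with probability p,
    otherwise one of a few plausible wrong ones."""
    if rng.random() < p:
        return correct_answer
    return correct_answer + rng.choice([-2, -1, 1, 2])

def single_attempt_accuracy(trials: int, p: float, rng: random.Random) -> float:
    """Accuracy of writing down the first answer that comes to mind."""
    correct = sum(noisy_solver(42, p, rng) == 42 for _ in range(trials))
    return correct / trials

def majority_vote_accuracy(trials: int, p: float, n_samples: int,
                           rng: random.Random) -> float:
    """Self-consistency: sample n_samples answers, keep the most common."""
    correct = 0
    for _ in range(trials):
        votes = Counter(noisy_solver(42, p, rng) for _ in range(n_samples))
        if votes.most_common(1)[0][0] == 42:
            correct += 1
    return correct / trials

rng = random.Random(0)
print(f"one attempt:    {single_attempt_accuracy(10_000, 0.6, rng):.2f}")
print(f"majority of 11: {majority_vote_accuracy(10_000, 0.6, 11, rng):.2f}")
```

Even with a solver that is right only 60% of the time, voting across eleven attempts pushes accuracy much higher — the "scratch paper" student wins not by being smarter per step, but by spending more steps.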
The Implications for Finance
The implications for financial analysis and decision-making are significant. Financial problems are often multi-step reasoning problems — evaluating a company requires analysing financial statements, assessing competitive dynamics, modelling future scenarios, and weighing risks. Previous AI models could assist with individual steps but struggled with the end-to-end reasoning chain. o1's improved reasoning capability makes it meaningfully more useful for complex financial analysis.
I have been testing o1 on regulatory analysis — a domain that requires reading complex legal text, identifying relevant provisions, reasoning about their application to specific scenarios, and synthesising conclusions across multiple jurisdictions. The improvement over GPT-4 is substantial. The model catches nuances that previous models missed, identifies edge cases that I had not considered, and produces analyses that require less human correction.
The Broader Trajectory
o1 represents a new scaling axis for AI capability. Previous improvements came primarily from training larger models on more data. o1 demonstrates that capability can also be improved by spending more compute at inference time — allowing the model to reason more carefully about each problem. This opens a new dimension of improvement that can be combined with traditional scaling.
The implication is that the AI capability curve — which was already steeper than most people expected — may accelerate further. Models that can reason at inference time can be made more capable simply by allowing them more time to think. The tradeoff is cost and latency — o1 is slower and more expensive than GPT-4. But for high-value tasks where accuracy matters more than speed, the tradeoff is clearly worthwhile.
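The shape of that "more thinking time, better answers" curve can be made concrete with a little probability. In a toy model where each independent reasoning attempt is correct with probability p and we take a majority vote over n attempts (n odd, so no ties), the chance the majority is right is a binomial tail that climbs toward 1 as n — the inference compute — grows. These numbers describe the toy model, not o1 itself.

```python
from math import comb

def majority_correct(p: float, n: int) -> float:
    """Probability that a majority of n independent attempts (each correct
    with probability p) agrees on the right answer. Assumes n is odd."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Accuracy improves as we spend more compute (more attempts), at the cost
# of proportionally higher latency and price per query.
for n in (1, 3, 11, 51):
    print(f"n={n:2d} attempts -> accuracy {majority_correct(0.6, n):.3f}")
```

The diminishing returns are visible too: each doubling of compute buys a smaller accuracy gain, which is exactly why the cost-versus-accuracy tradeoff matters for deciding which tasks justify a slower, more expensive reasoning model.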
My View
o1 is the most important AI development since GPT-4. Not because it is a better chatbot — it is not optimised for casual conversation. But because it demonstrates that AI systems can reason about complex problems in ways that were previously the exclusive domain of human experts. The gap between AI capability and human expertise is narrowing faster than anyone predicted — and o1 suggests that the narrowing will accelerate.
The shift from pattern matching to reasoning is the most important transition in AI since the transformer architecture. o1 is the first model that genuinely thinks about problems rather than just predicting the next token. The implications for every field that requires complex reasoning — including finance, law, and science — are profound.