GPT-4 Arrives and the AI Race Accelerates

OpenAI released GPT-4 on March 14th, and the capability jump is staggering. The model can reason about images: analysing charts, diagrams, and photographs. According to OpenAI, it scores around the 90th percentile on a simulated bar exam (GPT-3.5 scored around the 10th). It also passes the AP Biology, AP Chemistry, and AP Calculus exams. And in my own testing over the past 48 hours, it has handled complex multi-step reasoning tasks that GPT-3.5 could not even attempt.

I have been using OpenAI's models for two years now. Each iteration has been meaningfully better than the last. But GPT-4 is not an incremental improvement. It is a step change — the difference between a tool that is occasionally useful and a tool that is consistently reliable for serious professional work.

What Changed

The improvements are not just in raw capability but in reliability. GPT-3.5 was impressive but inconsistent — it would produce brilliant analysis one moment and confident nonsense the next. GPT-4 is dramatically more consistent. It follows complex instructions more faithfully. It hallucinates less frequently. And it handles nuance — qualifications, caveats, uncertainty — in a way that makes its outputs genuinely useful for professional decision-making.

The multimodal capability, the ability to process images alongside text, opens entirely new use cases, although image input is still in limited preview at launch rather than generally available. Analysing financial charts. Reading and interpreting legal documents with complex formatting. Processing screenshots of user interfaces. Evaluating architectural diagrams. The model can now engage with the visual information that constitutes a significant portion of professional work.

The Competitive Landscape

GPT-4's launch has intensified the AI race. Google has unveiled Bard (based on its LaMDA model) and is integrating AI across its products. Anthropic launched Claude, positioning it as a safer, more controllable alternative. Meta released LLaMA, a family of models whose weights are available to researchers under a non-commercial licence, which lets them build on near-frontier capabilities. And a wave of startups (Cohere, AI21, Mistral) are building specialised models for enterprise use cases.

The competition is healthy. It is driving rapid improvement, reducing costs, and expanding access. But it is also creating pressure to move fast — to release capabilities before their implications are fully understood, to prioritise market position over safety research, and to deploy models in high-stakes domains before the reliability is sufficient.

The Implications I Am Watching

Three implications are particularly relevant to the themes I write about.

AI and financial services. GPT-4's reliability makes it suitable for professional financial work — analysis, compliance, due diligence, and client communication. The firms that integrate these capabilities will be dramatically more productive. The regulatory implications — around liability, disclosure, and the use of AI in regulated activities — are still being worked out.

AI and crypto. The intersection is becoming more concrete. AI-powered smart contract auditing tools are improving rapidly. AI agents that can interact with DeFi protocols are being prototyped. And the question of how to pay AI agents for services — which requires programmable, permissionless payment rails — is becoming practically relevant.

AI regulation. The EU is developing the AI Act. The US is debating executive orders and legislative frameworks. China has already implemented AI regulations. The regulatory landscape for AI is forming in real time, and it will shape how the technology is developed and deployed for decades.

My View

GPT-4 confirms what I have believed since the private beta: large language models are the most transformative technology since the internet. The capability curve is steeper than I expected. The competitive dynamics are more intense. And the implications — for finance, for crypto, for every knowledge-intensive profession — are more immediate than most people realise.

GPT-4 is not the destination. It is a waypoint on that curve. The professionals and organisations that are building with these tools today will define the next decade. The ones that are waiting will be defined by it.