Artificial Intelligence Can Now ‘Reason’ With Images — Because Apparently, Your Printer Still Can’t


Let me paint you a picture. Not a digital one, because OpenAI already beat me to that — and apparently, their new software can reason through images now. Not "recognize," not "describe," not "label with a healthy margin of error," but reason. That’s right, ChatGPT’s new multimodal brainchild has leveled up from “Dora the Explorer” to “Sherlock Holmes with a GPU.” Or so they want us to believe.

Welcome to 2025, where the hottest AI news is that OpenAI dropped something called o3 and o4-mini — and no, that’s not a sequel to “Ocean’s Eleven,” although George Clooney reasoning his way through a diamond heist would arguably be more entertaining. These two new models are allegedly capable of thinking — sorry, reasoning — through both text and images.

Let’s dig into this, because “AI reasoning with images” is a phrase that deserves the same skeptical squint you give to someone who says they’re “between jobs” but spends 10 hours a day in a VR headset.

First, Let’s Talk Names: o3 and o4-mini? Really?

Why do all cutting-edge AI tools sound like IKEA furniture or Apple Watch accessories? Was “Skynet Junior” already taken? We’re told o3 is the real deal and o4-mini is its smaller sibling — because if there’s one thing we know about AI, it’s that giving it a diminutive name totally stops it from becoming sentient and stealing your crypto.

These models apparently “spend a significant amount of time thinking about a question before answering.” In AI circles, that’s what we call “not crashing immediately.” OpenAI wants applause for this, as though your toaster suddenly taking five seconds to decide how brown you like your bread is a technological miracle instead of a minor inconvenience.

But hey — let’s not be haters. Let’s pretend, just for a minute, that the machine’s “thinking” is a feature and not just the fan spinning up because it’s trying to read a JPEG.

Yes, It “Reasons With Images.” But What Does That Even Mean?

So, what does “reasoning with images” entail? According to OpenAI, these tools can “manipulate, crop, and transform images in service of the task you want to do.”

Oh wow. So basically, Photoshop but with delusions of grandeur?

OpenAI researchers — who I assume have completely stopped blinking by now — assure us that these systems can look at sketches, posters, diagrams, and graphs and “reason” through them. Because when I look at a pie chart, what I really need is a $20-a-month chatbot to explain it to me like I’m an intern who just woke up from a nap under their standing desk.

Let’s be honest — AI “reasoning” is just a fancy way of saying it can now hallucinate in both words and pictures. You thought it was cute when ChatGPT made up fake sources in a college paper? Wait till it draws you a diagram that defies the laws of physics and insists it’s “98.7% confident.”
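
For the morbidly curious, here’s what “reasoning with images” looks like from the outside: you attach a picture to your prompt and the model chews on both. Below is a minimal Python sketch using the standard OpenAI chat-style image input; I’m assuming that same request shape works for o4-mini, and the model name is whatever your account actually exposes. Treat it as illustrative, not gospel.

```python
# pip install openai
# Minimal sketch: send an image plus a question and let the model
# "reason" over both. Assumes the chat-completions image-input shape
# also applies to o4-mini -- check the current API docs for your account.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a local chart as a data URL so it can ride along in the request.
with open("pie_chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="o4-mini",  # assumption: swap in whatever model your key can access
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart actually show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The “reasoning” happens entirely server-side, so your contribution is the image, the question, and the invoice.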

The Real Star Here: Codex CLI, aka AI for People Who Already Know How to Code

Now let’s talk about the thing only software engineers will care about until it eventually breaks the entire internet: Codex CLI.

This tool — a sort of AI butler for coders — lets you use these image-text-reasoning models to interact with your local code. Because nothing says “2025 productivity” like turning your bug-ridden backend over to a robot that once thought Abraham Lincoln was alive and living in Portland.

The best part? OpenAI is open-sourcing this tool! Generous, right? Well, sure — if you ignore the fact that open sourcing your tools is like tossing a Rottweiler into a dog park and saying, “Good luck, everyone!”

Programmers are thrilled. Now they can finally automate the part of their job that involved, you know, thinking. This is a tool that helps AIs help you help them write code — and if you just had a small aneurysm reading that, don’t worry, it’s normal. That’s what recursion does to the human brain.

Reasoning: The AI Buzzword of the Year

Let’s pause and talk about the word “reasoning” here. The New York Times insists this system “spends more time thinking before answering.” That’s like saying your cat is “writing a novel” when it stares blankly at the wall for six hours.

AI doesn’t “reason” in the way you do when choosing between pizza or salad. It doesn’t have a moral compass or a stake in the outcome. It’s not pondering the meaning of life when you ask it about the stock market. It’s pattern-matching at scale and speed. But OpenAI knows that if it called this update “Slightly Less Shitty Guessing 3.0,” no one would click.

So “reasoning” it is — the kind of word that makes venture capitalists wet themselves and tech journalists drool into their keyboards. The reality? It’s still a glorified autocomplete engine — just now with fancier visuals and a better PR team.
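
And if “glorified autocomplete” sounds unfair, here is the whole trick in miniature: a toy next-token generator in Python. Real models learn billions of statistics instead of my hand-typed table, but the loop (look at the context, pick a likely continuation, repeat) is the same shape, which is worth remembering the next time someone says “reasoning.”

```python
import random

# A toy bigram "language model": for each word, the plausible next words
# and how often we've supposedly seen them. Real models learn billions of
# such statistics; the generation loop below is the same idea in miniature.
BIGRAMS = {
    "the":   {"model": 5, "chart": 3, "future": 2},
    "model": {"reasons": 4, "hallucinates": 6},
    "chart": {"shows": 8, "lies": 2},
}

def next_word(word: str) -> str:
    """Sample the next word in proportion to observed counts."""
    options = BIGRAMS.get(word)
    if not options:
        return "."  # dead end: end the sentence
    words = list(options)
    weights = list(options.values())
    return random.choices(words, weights=weights)[0]

def generate(start: str, max_len: int = 6) -> str:
    out = [start]
    for _ in range(max_len):
        word = next_word(out[-1])
        if word == ".":
            break
        out.append(word)
    return " ".join(out)

print(generate("the"))  # e.g. "the model hallucinates"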

Hallucinations, But Make It Visual

Of course, OpenAI couldn’t resist tossing in a reminder that the models “can still get things wrong” and “hallucinate.” Oh good. Because when a chatbot says the Civil War ended in 1982, that’s mildly concerning. But when it draws you a picture of it? That’s pure nightmare fuel.

Let’s be clear: “hallucination” is a euphemism. These systems don’t “hallucinate.” They make stuff up with the confidence of a man on his sixth IPA explaining cryptocurrency at a barbecue. And now, they’ll do it in MS Paint.

A Peek Behind the Curtain: Reinforcement Learning, Trial and Error, and a Lot of GPU Burn

The secret sauce here is reinforcement learning — that magical Hogwarts-for-AI process where the model fails a lot, learns a bit, and eventually stops suggesting that pi equals four. Apparently, OpenAI has used this process to get their models to “reason through” both text and images.

It’s like training a toddler to stack blocks — except the toddler is made of algorithms and occasionally tells you Julius Caesar invented the iPhone.

The real breakthrough? Getting the model to build on its own steps — like a digital Da Vinci sketching, then adjusting, then drawing again. The problem? Sometimes it builds a ladder to nowhere and confidently invites you to climb.
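
In fairness to the toddler analogy, trial and error really is the engine. Here is the idea in miniature: a toy epsilon-greedy bandit in Python. This is obviously not OpenAI’s pipeline, just the simplest possible illustration of an agent learning which answer pays off by failing a lot first.

```python
import random

# Toy reinforcement learning: three "answers" with hidden success rates.
# The agent tries answers, gets rewarded or not, and gradually learns
# which one to prefer. OpenAI's pipeline is vastly fancier; the principle isn't.
TRUE_REWARD_PROB = [0.2, 0.5, 0.8]  # hidden from the agent

estimates = [0.0, 0.0, 0.0]  # agent's running guess of each answer's value
counts = [0, 0, 0]
EPSILON = 0.1  # how often to explore a random answer instead of the best one

for step in range(10_000):
    if random.random() < EPSILON:
        action = random.randrange(3)              # explore: try something random
    else:
        action = estimates.index(max(estimates))  # exploit: best guess so far

    reward = 1.0 if random.random() < TRUE_REWARD_PROB[action] else 0.0

    # Update the running-average estimate for the chosen action.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print("learned values:", [round(e, 2) for e in estimates])
# After enough trial and error, the agent mostly picks action 2 --
# the digital equivalent of finally stacking the blocks.
```

The catch is the same one as the ladder to nowhere: the agent learns exactly what the reward signal rewards, and nothing more.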

But Wait, There’s a Subscription Tier for That!

Here’s where we all pretend to be shocked: the new tech is only available to people who pay. Because AI for the people is great — as long as the people have $20 a month for ChatGPT Plus or a cool $200 a month for ChatGPT Pro. Yes, $200. That’s roughly the same cost as pretending to have a social life.

So if you’re a developer who wants to mess around with o3 and o4-mini, hope you’ve got your wallet open and your employer’s credit card handy. For everyone else? Enjoy watching the demos on YouTube while your free-tier chatbot still thinks a turtle is a kind of hat.

Let’s Not Forget: OpenAI Is Still in Hot Water With the New York Times

Oh, the irony.

Right at the bottom of the NYT article comes a spicy footnote: “The New York Times has sued OpenAI and its partner Microsoft for copyright infringement.” Yes, the paper reporting this breathless AI breakthrough is also suing the company behind it. Journalism in 2025 is basically writing puff pieces about the robot that stole your lunch money.

Of course, both OpenAI and Microsoft deny the allegations, because if there's one thing AI companies do better than hallucinate, it's lawyer up.

Final Thoughts: The Future Is Here, and It’s... Still Kind of Dumb

Let’s recap. OpenAI has made a tool that:

  • Can “reason” with images (aka stare at a graph and not immediately explode)

  • Takes longer to answer (so now it feels even more like talking to a philosophy major)

  • Still hallucinates, but now in both text and visuals

  • Has a new command-line tool for coders who enjoy existential dread

  • Costs as much as a nice dinner or an awkward therapy session

  • Is being sued by the newspaper promoting it

Progress!

Honestly, I get it. This is a major technical achievement. The engineers and researchers involved deserve their flowers. But the hype machine surrounding every new AI release is starting to feel like watching someone polish a slightly smarter Roomba and declare it the future of consciousness.

Let’s stop pretending this is reasoning. It’s math and statistics wrapped in a Silicon Valley fever dream. These systems don’t “understand” images — they juggle pixels and probability like caffeinated monkeys in a server farm. And that’s fine! That’s still incredibly powerful. But let’s keep the “digital philosopher-king” talk in check, okay?

Until my AI can look at a picture of a burnt lasagna and feel something about it, I’m keeping my reasoning crown, thank you very much.


TL;DR: OpenAI's o3 and o4-mini are impressive — if you ignore the fact that they "reason" like a goldfish with a college degree in guesswork. They’ll probably transform tech as we know it... or at least make hallucinations prettier.

And if you see ChatGPT trying to explain your own vacation photos back to you like it's writing a graduate thesis, just smile, nod, and unplug it.
