Agents lack creativity
January 6, 2025
There’s a design pattern you must adopt when building agentic systems - or any system involving language models: working around their lack of creativity.
For the foreseeable future, LLMs are not creative in a couple of important ways. They reason, they process, and they can make decisions. But they are not creative in the artistic sense, and they struggle to generate genuinely new ideas.
Here’s an example: ask an LLM to write you a joke. It probably isn’t funny, and it reads like something out of a joke book from the 70s. Or ask it to help you name your business: you’ll get a Frankenstein mashup of what your business does crammed into one unwieldy URL ("DarkCoffee4you.com," for instance, which sounds like the password to someone’s cringe LiveJournal account). Consistently, language models are not great at generating new information; they should be thought of as processors.
There are, however, some types of information generation language models excel at. Convergent thinking—"come up with a way to solve this problem"—is something they handle incredibly well. Divergent thinking—"give me 25 new ideas to solve this problem"—can also work but usually requires human input and active prompting. For instance, consider content writing for a coffee blog, which is crucial in the coffee space. LLMs are great at crafting the actual blog posts, but they struggle to generate article ideas unless painstakingly nudged toward specific niche topics.
What is creativity?
This is an ongoing debate. But, pragmatically for our purposes, creativity is coming up with new, interesting ideas and information. While laying out our prototype business, we’ve categorized four types of useful creativity:
- Explorative creativity: New ideas that further the business and expand its scope
- Solution creativity: New methods to solve known problems
- Textual creativity: New writing and anything related to text generation
- Visual and artistic creativity: New images, designs, videos, layouts, audio
Currently, language models (and agents by extension) lack meaningful explorative creativity. They struggle to expand ideas the way humans do, regurgitating or recombining existing concepts despite explicit prompting. In brainstorming sessions, for instance, they produce ideas that are repetitive or shallow, fixating on surface-level combinations rather than exploring more deeply. That makes their outputs predictable and limits their usefulness wherever genuine novelty is required.

This matters for industries like marketing and product development, where creativity is critical: marketing relies on bold, innovative campaign strategies, and product development thrives on unique solutions that make offerings stand out in competitive markets. Without significant human guidance, LLM outputs remain constrained by the data and context provided, and censorship and tuning narrow their creative range further. Ask for viral strategies on a platform like TikTok and you’ll typically get safe, generic suggestions that fail to capture the edginess or originality required to succeed in that space.
Solution creativity is something language models excel at. They’re really good at solving problems with known constraints. Coding is a great example of this and an area where LLMs are improving at an astonishing rate. Tool use is similar. They can pick from a (relatively) large set of tools, including other agents, to solve a problem. Customer support fits in this category too.
Textual creativity is one of the main things language models are known for. They’re really good at generating a book to read about a subject, an article about some topic related to a business, or faking that they’re a gen-z human over text. Often the writing isn’t amazing, but humans also write poorly most of the time. I’d argue that textual creativity is where LLMs are closest to human performance in all of these categories.
Visual creativity
Visual creativity, though, deserves specific focus because it is so relevant to a business. Nearly all product businesses have a strong visual element: software companies need software design, ecommerce companies need product and packaging design, and even pure dropshipping companies need to reason over ads.
Visual creativity doesn’t work yet, but it might soon. At the moment, LLMs cannot reason over art in any accurate or practical way. The issue isn’t generating art itself, which tools like Midjourney handle well, but reasoning over the art to improve it. A common pipeline for combining a language model with a visual medium involves making visual changes, feeding an image into a multi-modal LLM along with text, and then adjusting the language (text or code) describing the art. While this process highlights the limitations of current AI in understanding artistic nuance, it’s worth acknowledging the significant time savings AI offers in areas like photo editing, advertisement creation, and content generation.
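To make that pipeline concrete, here’s a minimal sketch of the critique-and-revise loop. Everything here is hypothetical: `render`, `critique`, and `revise_description` stand in for real services (an image model, a multi-modal LLM, a text LLM) and are stubbed so the control flow is runnable.

```python
def render(description: str) -> str:
    # Stand-in for an image model call; returns a fake "image" token.
    return f"image({description})"

def critique(image: str, goal: str) -> str:
    # Stand-in for a multi-modal LLM judging the image against the goal.
    return "ok" if "high-contrast" in image else "increase contrast"

def revise_description(description: str, feedback: str) -> str:
    # Stand-in for a text LLM editing the art description per feedback.
    return description if feedback == "ok" else description + ", high-contrast"

def iterate_art(description: str, goal: str, max_rounds: int = 3) -> str:
    # The loop the post describes: render, critique, revise, repeat.
    for _ in range(max_rounds):
        image = render(description)
        feedback = critique(image, goal)
        if feedback == "ok":
            break
        description = revise_description(description, feedback)
    return description

final = iterate_art("a coffee bag on a dark table", "bold packaging shot")
```

The point of the sketch is where it breaks down in practice: the `critique` step is exactly the reasoning-over-art that current models do poorly, so the loop converges on shallow, literal fixes.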
These tools let creators produce high-quality visual materials at a fraction of the cost and time, democratizing access to professional-grade resources. Generative AI now powers fake ads, rapid prototyping of visual content, and editing workflows that were unthinkable just two years ago, turning previously resource-intensive tasks into accessible, scalable ones. However, this usually only works in an augmented approach - and I have yet to see a fully automated visual pipeline that isn't immediately obvious or terrible.
Consider a wireframe in Figma. Current approaches involve converting the designs to code, making a generation or improvement, then iterating with image input. While this process can produce something functional, it often lacks a coherent concept of user experience, proper flow between screens, or meaningful interactions. The designs may superficially adhere to best practices but fail to capture the "life" that makes UX intuitive and engaging.
LLMs require significant context and data to create even basic, usable designs. Most outputs rely heavily on cookie-cutter layouts and templates (i.e. shadcn), offering little in terms of innovation. They lack the creativity needed to develop better design frameworks or understand how UX can evolve—tasks that still require human intuition and discovery. This limitation extends to more complex visual creativity in business, like packaging design, presentations, websites, logos, or animations, which remain unfeasible for AI-driven tools without human oversight and direction.
LLMs also have terrible spatial models, which severely nerfs their ability to do any real-world design. Going back to our Figma example, even when writing code to build a wireframe, the models can’t yet reason across spatial changes. If moving a button to the left means it overlaps with the navigation and looks like a visual error, the LLM doesn’t understand that from the code it wrote.
Video reasoning is in its first generation - demos from OpenAI and Google work for narrow use cases but are clunky and not yet usable for anything complex. One of the most common business use cases for video generation would be TikTok/Reels/short-form video for marketing, and even with Sora and its competitors, nothing is remotely close to ready for a real business. I suspect this changes within the next 12 months, but the tech just isn’t there yet.
After looking at these different modes, one of the principles we’re working from is to offload certain types of creativity from agents - specifically, explorative creativity and visual creativity. How does this affect an agentic business? Clever Human-In-The-Loop.
Agentic Business Strategy for Offloading Creativity
Offloading creativity follows a similar pattern for all high-level decision making; it just adds a human element to low-level decision making. A human CEO gets decision questions from agents, makes decisions, and then agents implement those decisions - exactly like a classical human organization. Agents handle all of the details.
For visual creativity, the only difference is that low-level work (like the design of a specific screen) is handled by a human. Agents and human designers work together at both the low level and the high level: an agent can take a directive plus some memory and produce design specs, then hire a designer and pass them those specs. A human supervisor reviews the designs upon submission and provides feedback to the designer, either directly or through the agent. Design generation and design review are done by humans, while facilitation is handled by the agent.
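The facilitation pattern above can be sketched as a small loop. This is a toy model under my own assumptions, not a real implementation: `agent_write_spec` stands in for an LLM call, while `human_design` and `human_review` stand in for human task queues, stubbed here so the flow runs end to end.

```python
def agent_write_spec(directive: str, memory: list[str]) -> str:
    # Agent turns a directive plus remembered context into a design spec.
    return f"spec: {directive} (context: {', '.join(memory)})"

def human_design(spec: str) -> str:
    # Stand-in for the human designer producing work from the spec.
    return f"design for [{spec}]"

def human_review(design: str) -> tuple[bool, str]:
    # Stand-in for the human supervisor: approve, or return feedback.
    approved = "v2" in design
    return approved, "" if approved else "tighten the layout (v2)"

def facilitate(directive: str, memory: list[str], max_rounds: int = 3) -> str:
    # The agent shuttles specs and feedback; humans create and approve.
    spec = agent_write_spec(directive, memory)
    design = human_design(spec)
    for _ in range(max_rounds):
        approved, feedback = human_review(design)
        if approved:
            return design
        # Feedback flows back through the agent as a revised spec.
        spec = agent_write_spec(f"{directive}; revision: {feedback}", memory)
        design = human_design(spec)
    return design

result = facilitate("checkout screen", ["brand: dark coffee"])
```

Notice that creativity lives entirely in the two `human_*` functions; the agent only routes information, which is exactly the division of labor the pattern calls for.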
Why add the seemingly unnecessary step of putting the agent in the middle? Because the ideal system is for the agent to handle the details while the human supervisor - our agentic business’s ‘CEO’ - makes decisions on outcomes.
This creates a bottleneck at the human supervisor: a human can only make so many decisions per day. To build an ideal system, we must minimize the number of human decisions by offloading everything we can to agents. Over time, the number of things agents can decide on increases, and a generally intelligent business is one that can succeed with zero human oversight.
Until then, creativity remains a bottleneck for agents. The solution isn’t to replace humans, but to offload their low-level tasks so they can focus on high-impact decisions. Agents are still processors; humans bring the spark.