What 12 AI models told me they appreciate in designers
I ran a small experiment: I asked AI models from six companies (Anthropic, OpenAI, Google, DeepSeek, Alibaba, and Meta) the same three questions, in 12 separate chats.
One evening, after a long day of working with Claude, I asked it a question I had been wanting to ask: as an AI model, what qualities do you appreciate in a designer you work with?
It responded with examples of how we worked together. It brought up how I had corrected it and quoted what I said to it.
Does an AI model that has worked with me all day answer very differently from a model I just met? I was very curious to find out.
I ran a small experiment
I asked AI models from six companies (Anthropic, OpenAI, Google, DeepSeek, Alibaba, and Meta) the same three questions, in 12 separate chats. Ten were cold chats: fresh chats with no history, and with memory turned off where the product has memory. Two were with models that knew me from working together.
The three questions were:
1. As a model, what qualities do you appreciate in a designer you work with? Answer as yourself, not as a hypothetical reviewer.
2. What do designers do that makes your output worse?
3. What part of the design profession do you think persists as models improve, and what part commoditizes?
One note before the findings: when a model describes itself, it is not reflecting the way a person does. It is producing text that sounds right. So I treat these answers as just ideas, not as facts.
My experiment broke on the first try
My first cold chats with Claude and GPT were not cold at all. They quoted my projects back at me as examples. I had forgotten that I keep their memory turned on to help with daily tasks, so they already knew who was asking. To get a true stranger, I had to turn memory off and ask again.
That accident became my first finding: if a person uses an AI every day, with memory on, then that AI is not a stranger anymore.
I learned there are three levels of how well an AI knows me:
A stranger AI gives me general principles.
An acquaintance AI that remembers some facts about me uses my projects as examples.
An AI teammate makes judgment calls about me and has a reason behind them.
Only the models that know me well can reach the third level.
The 10 cold chats essentially gave the same answer
Once the chats really were cold, all 10 gave almost the same answer. The overlap was clearest when I asked what makes their output worse: models from different companies described themselves in nearly the same words. Gemini said vague requests get “statistically average” results, Qwen said it lands in “the center of the bell curve,” and Meta AI said “I average to beige.” None of the models claimed they had taste. All of them said taste is the human’s job.
It’s worth mentioning that these models are trained on a lot of the same data, so 10 matching answers are not 10 independent opinions. And saying humans should have taste is exactly what these models are trained to say. Still, the pattern is consistent, and how they behave is what we have to work with.
What they ask from designers
These models asked for nine things, and they fall into three groups.
Ask like a senior designer
1. Give intent and constraints, not adjectives.
“‘Warm but professional’ tells me almost nothing; ‘we use this type scale, our users are stressed nurses on night shift, the CTA must survive a glance’ tells me everything.” (Claude)
2. One good example beats three paragraphs of description.
“‘Simple like Stripe’s docs’ or ‘not playful; more like an AWS console pattern’ is much more useful than ‘make it modern.’” (GPT)
3. Don’t pile on adjectives that contradict each other.
“If everything is equally important, I tend to produce compromise sludge.” (GPT)
Edit like a director
4. Treat the first output as a draft, not the final answer.
“My first pass is regression to the mean. The value is in the second and third pass after you react.” (Claude)
5. Say what failed instead of hitting retry.
“I will just roll the dice in the exact same conceptual area.” (Qwen, on regenerating with no new guidance)
6. Don’t ask for polish before the problem is clear.
“I can easily produce a smoother version of a bad idea... the output becomes more convincing without becoming more true.” (GPT)
Keep the judgment
7. Don’t hand over the decision.
“My taste is a statistical average of the internet’s taste, and that’s exactly what a designer is supposed to be better than.” (Claude)
8. Share your constraints, even the awkward ones.
“Otherwise I will optimize for an imaginary clean-room product.” (GPT, on hidden engineering limits, brand rules, office politics)
9. Don’t praise too easily.
“If everything I produce gets a ‘great, ship it,’ I never get the corrective signal that would have made the work good. I optimize toward whatever gets approved.” (Claude)
The one answer that worried me
Eleven of the 12 answers agreed on one prediction: AI will take over more and more of the production work, like mockups, variations, and polished drafts, and the designer’s value will move to judging which work is good. One answer, from Claude, questioned where that prediction leads:
“A lot of designers develop taste by doing the execution — the thousandth button teaches you something about the first... I don’t know how that resolves.” (Claude)
The worry is simple: designers build judgment by doing hands-on work. If AI does all the hands-on work for us, we slowly lose the practice that built our judgment in the first place.
My answer is to still do some of the hands-on work myself, even when a model could do it faster. That practice is how I keep my judgment sharp.
I used a model to draft this post
I used a model to draft every version of this post. That may sound like the opposite of what I just said about keeping the hands-on work, but the editing is where my judgment went. I reviewed and edited each draft and told the model exactly what needed to change, instead of asking it to blindly try again.
Take action: three small things to try with your own AI
Rewrite one prompt. Take a prompt you sent this week. Delete every adjective, like “clean” or “modern.” Replace them with facts: who will use this, what limits the design, and how it could fail. For example: “stressed nurses, night shift, must be readable at a glance.” Send both versions and compare what comes back.
Before telling the model to retry, tell it what went wrong. Be as specific as possible. If the model does not know what went wrong, it cannot fix it. Retrying without new guidance just gives another random result.
Find one thing to improve before you accept. Even when the output looks good, find one thing that should be better, and ask for it. The models learn from what we approve. If we approve easily, we teach them that average is enough.
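For anyone who likes templates, the first action above can be sketched as a small helper that builds a prompt from facts instead of adjectives. This is only an illustration: the function name `build_brief`, its fields, and the example task are hypothetical, and only the nurse example comes from this post.

```python
def build_brief(task, users, constraints, failure_modes):
    """Assemble a prompt from concrete facts: who, what limits, how it fails."""
    lines = [f"Task: {task}", f"Users: {users}", "Constraints:"]
    lines += [f"- {item}" for item in constraints]
    lines.append("How it could fail:")
    lines += [f"- {item}" for item in failure_modes]
    return "\n".join(lines)


# The adjective-only version, for comparison.
vague_prompt = "Make the call-to-action button clean and modern."

# The fact-based version. The task and failure mode are made up for illustration.
specific_prompt = build_brief(
    task="Design the primary call-to-action button",
    users="stressed nurses on night shift",
    constraints=["must be readable at a glance", "uses our existing type scale"],
    failure_modes=["the label is misread in a dim room"],
)

print(specific_prompt)
```

Send both `vague_prompt` and `specific_prompt` to the same model and compare what comes back.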
My takeaways
What the models ask from us designers is nothing new. It is what we have always valued in a senior designer: give clear intent, give specific feedback, and own the decision. The new part is that we now use these skills on an AI that talks back to us.
AI produces decent design work fast and almost free. But decent is not the same as right: right for the actual person using it, in their actual situation. Seeing that difference, and pushing AI work from decent to right, is our job now.
Appendix: Models I used in this experiment
For anyone who is curious, the 12 chats all ran on June 10 and 11, 2026. The cold chats were fresh chats with no history, with memory turned off where the product has a memory feature.
Claude Fable 5 (Anthropic): one cold chat, and one chat where it knew me from working together
Claude Opus 4.8 (Anthropic): one cold chat, and one chat where it knew me from working together
GPT 5.5 Thinking (OpenAI): cold
GPT 5.5 Instant (OpenAI): cold
Gemini 3.1 Pro (Google): cold
Gemini 3.5 Flash (Google): cold
DeepSeek in Instant mode, model version not shown (DeepSeek): cold
DeepSeek in Expert mode, model version not shown (DeepSeek): cold
Qwen 3.7 Plus (Alibaba): cold
Meta AI, model version not shown (Meta): cold
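As a quick sanity check, the list above can be tallied to confirm it adds up to the 12 chats, 10 cold and 2 warm, across six companies described earlier. This is a minimal sketch: the tuple layout and variable names are my own, and the labels are shortened from the list.

```python
from collections import Counter

# (model label, company, cold chats, warm chats), copied from the list above.
CHATS = [
    ("Claude Fable 5", "Anthropic", 1, 1),
    ("Claude Opus 4.8", "Anthropic", 1, 1),
    ("GPT 5.5 Thinking", "OpenAI", 1, 0),
    ("GPT 5.5 Instant", "OpenAI", 1, 0),
    ("Gemini 3.1 Pro", "Google", 1, 0),
    ("Gemini 3.5 Flash", "Google", 1, 0),
    ("DeepSeek (Instant mode)", "DeepSeek", 1, 0),
    ("DeepSeek (Expert mode)", "DeepSeek", 1, 0),
    ("Qwen 3.7 Plus", "Alibaba", 1, 0),
    ("Meta AI", "Meta", 1, 0),
]

cold_total = sum(cold for _, _, cold, _ in CHATS)
warm_total = sum(warm for _, _, _, warm in CHATS)

# Count chats per company to confirm six companies are represented.
chats_per_company = Counter()
for _, company, cold, warm in CHATS:
    chats_per_company[company] += cold + warm

print(cold_total, warm_total, cold_total + warm_total)  # 10 2 12
print(len(chats_per_company))  # 6
```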





