ai does not imagine, it remembers conventional truth

i once spent far too much time trying to make one simple thing: a photo of me playing guitar left-handed.

right hand on the fretboard. left hand near the bridge. reversed guitar. adjusted body position. the prompt was already detailed enough to make any sane person close the laptop.

the result?

of course, Gemini still made me play like a right-handed guitarist.

and no, this was not one of those cases where the user writes a lazy prompt and then blames the machine.

i tried everything.

image-to-image. text-to-image. simple prompts. aggressive prompts. absolute prompts. structured prompts. even JSON prompts, because apparently at some point i decided that if natural language failed, maybe bureaucracy would save me.

very touching, really. me, formatting the location of my own left hand like a government document for a machine that still wanted to make me normal.

i wrote the pose as instruction. i wrote it as prohibition. i wrote it as spatial logic. i wrote it as a hierarchy of constraints.

right hand on the fretboard. left hand near the bridge. reversed guitar. left-handed playing position. do not convert it into a right-handed pose. do not correct the guitar. do not follow the conventional guitarist posture.

Nano Banana, Gemini's image model, listened with the calm face of a very expensive idiot.

then it made me right-handed again.

after enough attempts, the conclusion became difficult to avoid. this was not a lack of prompt engineering. this was not a missing adjective. this was not because i forgot to say “please” to the machine god. it was not even because the prompt lacked structure.

it simply could not hold that structure reliably.

my instruction was specific.

but the conventional truth was stronger.

apparently, the machine had already decided what a guitarist should look like. i was only providing exceptions it did not respect.

✦ ✦ ✦

i did not get angry immediately.

i only stared at the image for a few seconds longer than necessary and thought: interesting. for a machine that supposedly understands natural language, it is very religious about convention.

not because it is completely stupid. the opposite, actually. it is too good at remembering the habits of the world. most guitar photos on the internet follow the same orientation. so when my body arrives with a different logic, the machine does not treat it as another valid possibility. it treats it as something that needs to be corrected.

the technical term is dataset bias.

a very polite name for the habit of the world forcing everything back into its most familiar shape.

and maybe that is why ai became interesting to me. not because it is magical, but because it is never clean from the world that created it.

✦ ✦ ✦

i still remember when ai art felt like a wild territory.

for me, it began around the Stable Diffusion 1.5 era. or more precisely, from the chaos that grew around it. NovelAI became one of the important names in that phase, not only because its anime results were impressive, but because it showed something that felt almost impossible at the time: an open-weight model like Stable Diffusion could be fine-tuned into a very specific visual machine.

when the NovelAI leak happened, what spread was not just curiosity. it became a small fire that moved everywhere. people started talking about checkpoints, model weights, hypernetworks, embeddings, styles, datasets, Danbooru, and all the technical language that previously sounded like laboratory notes.

then A1111, the AUTOMATIC1111 web UI, became the workbench.

messy, unfriendly to beginners, sometimes annoying, but alive. you could change checkpoints, install extensions, try hypernetworks, use LoRA, adjust denoising strength, run img2img, inpaint, use ControlNet, and touch all the small tools that made image generation feel like opening a machine, not just pressing a button.

SD 1.5 had many weaknesses. the anatomy often collapsed. the hands looked like failed biological experiments trying to behave. the resolution was limited. if pushed too far, it broke easily.

but maybe that was why it felt close.

it was weak, but it could be tricked. it was wrong, but it could be argued with. it was not always beautiful, but it was open. there is something honest about a machine whose failures can be seen directly.

✦ ✦ ✦

then came SDXL.

SDXL felt like Stable Diffusion trying to grow up. cleaner images, more stable compositions, richer details, more polished faces. it no longer felt as primitive as SD 1.5. the improvement was obvious, especially for people who wanted more mature results without constantly repairing basic damage.

but SDXL did not feel like the final answer.

it was better, yes. heavier, yes. more polished, yes. but not always more obedient, and not always more useful for the kind of personal workflow i cared about. for certain styles, SD 1.5, with all its community models, LoRAs, ControlNet tricks, and ugly little workarounds, still felt easier to bend.

that is the interesting part of local ai development: newer does not always kill older. sometimes the older thing survives because it has become a workshop. dirty, narrow, but every tool has a place.

then FLUX arrived, and that was different.

before FLUX, most image models had a very obvious weakness: text. they could paint faces, bodies, lighting, dramatic backgrounds, cinematic posters, fake magazine covers, anything, really. but ask them to write a short literal phrase inside the image, and suddenly the machine became a drunk calligrapher possessed by alphabet soup.

FLUX was one of the first moments where that changed in a way that felt genuinely insane.

for the first time in my experience, an open model could write short sentences almost literally. not perfectly, not always, not without failure, but enough to make the old excuses collapse. suddenly, text inside images was no longer just decorative nonsense pretending to be language. it could actually become language.

that mattered.

because text is not just a detail. text is structure. text is instruction becoming visible. text is the machine proving that it can hold symbolic form inside visual form without immediately melting it into cursed typography.

FLUX made open-weight models feel serious again. prompt following became stronger. anatomy became more reasonable. visual quality became cleaner. and the fact that it could handle short text made it feel less like another incremental model and more like a real jump.

not a small step.

a jump.

✦ ✦ ✦

but even then, there was still another problem: consumer image ai.

for a long time, most consumer-facing image generators did not feel truly worth it to me. they were fun, sure. impressive for people who only wanted a pretty picture, maybe. but for actual workflow, for control, for character consistency, for editing something specific, for preserving identity without destroying the image, they often felt too shallow.

too locked.

too random.

too much like ordering food from a restaurant that refuses to tell you what ingredients it used.

before models like Nano Banana and GPT Image 2.0, i rarely felt that consumer image ai was genuinely useful enough to replace local workflow. it could generate something beautiful, but beauty alone is not control. a pretty mistake is still a mistake. sometimes it is worse, because people are more willing to forgive it.

Nano Banana changed that impression for me.

or at least, it cracked it.

its image-to-image ability felt different. it could preserve identity, understand context, change clothes, adjust atmosphere, and modify a scene without immediately destroying the subject. it did not behave like a normal local img2img workflow where the image is dragged into noise and rebuilt through denoising strength. it felt more semantic. more aware of what the object was supposed to be.

felt.

that word matters.

because once it failed to understand my left hand, that whole impression of “understanding” cracked very neatly.
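
for contrast, this is roughly what the local side looks like. a minimal sketch, assuming the common convention in local img2img tools where denoising strength decides what fraction of the sampling steps actually run on the noised source image. the function name and the step count of 30 are mine, for illustration only, and real samplers differ in the details.

```python
def img2img_schedule(num_steps: int, strength: float) -> tuple[int, int]:
    """Return (skipped_steps, denoise_steps) for a given denoising strength.

    Assumption: strength is the fraction of the schedule that is re-run
    after noise has been added to the source image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # the higher the strength, the more steps the model gets to repaint
    denoise_steps = min(int(num_steps * strength), num_steps)
    skipped_steps = num_steps - denoise_steps
    return skipped_steps, denoise_steps


if __name__ == "__main__":
    for strength in (0.2, 0.5, 0.8, 1.0):
        skipped, run = img2img_schedule(30, strength)
        print(f"strength {strength}: skip {skipped} steps, denoise for {run}")
```

at low strength most of the source survives and the model can only nudge it. at 1.0 the source is effectively ignored. nothing in this arithmetic knows what a hand is.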

then GPT Image 2.0 appeared as another sign that consumer image models were no longer just toys for casual prompts. i do not use it often enough to make a confident map of its limits, but it seems clear that this generation of image models is starting to handle things that older models struggled with: spatial instruction, object editing, identity preservation, maybe even details like left-handed poses with better consistency.

so the point is not that ai will always fail at the left hand.

the point is that every generation fixes one kind of stupidity and reveals another.

✦ ✦ ✦

i like contradictions like that.

a model can understand mood, lighting, texture, face, and even a long instruction that sounds like a legal contract. but when it comes to left and right, it can suddenly become a stubborn student who quietly believes it knows better than the teacher.

of course, things may continue to change. this problem is not an eternal law. it does not mean ai will never understand the left hand. technology moves, and sometimes it fixes small things that once felt impossible.

what interests me more is that every model has its own form of stupidity.

some models are beautiful but disobedient. some are obedient but ugly. some understand style but fail at structure. some preserve the face but quietly correct the body. and maybe some newer models are cleaner, sharper, more capable, but still leave me asking: how much do they truly understand, and how much are they simply becoming more skilled at imitation?

i do not hate ai.

if i hated it, i would not keep returning to it, trying new prompts, new workflows, new models, new ways to force the machine to understand the shape i mean.

i simply do not trust it completely.

and i think that is a healthy position.

because ai is not pure imagination. it is collective memory with an interface. it carries taste, bias, popular poses, visual standards, internet habits, and all the boring things repeated so often they start looking like truth.

✦ ✦ ✦

that is why local workflow still matters to me.

not because local is always smarter. often, it is not. local is annoying, slow, full of errors, and sometimes makes me feel like i am assembling a war machine out of old cables. but local gives me one thing closed platforms do not always give: the room to disagree.

if the result is wrong, i can break it open. if my left hand becomes my right, i can force it back. if the guitar is wrong, i can inpaint. if the pose drifts, i can lock it with ControlNet. it is not elegant, obviously. but at least i do not have to accept a beautifully packaged mistake as the final answer.
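
the inpaint part is less mysterious than it sounds. a minimal sketch, assuming the usual convention where a mask marks which pixels may be repainted (1) and which must be kept (0). the function name, the image size, and the box coordinates are hypothetical. a real tool would take this mask as an image next to the source photo.

```python
def rect_mask(width: int, height: int, box: tuple[int, int, int, int]) -> list[list[int]]:
    """Build a height x width grid: 1 = repaint this pixel, 0 = keep it.

    box is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    return [
        [1 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # hypothetical region: only the area around the guitar neck gets repainted
    mask = rect_mask(width=8, height=4, box=(2, 1, 5, 3))
    for row in mask:
        print("".join(str(cell) for cell in row))
```

everything outside the box stays mine. only the part i disagree with goes back to the machine.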

closed platforms give me magic.

local tools give me the right to distrust that magic.

and the longer this goes on, the more important that right feels.

because when a tool becomes too convenient, we often forget that convenience can also become dependency. we begin to accept the limits made for us. we begin to wait for features. wait for updates. wait for permission. wait for a company to decide whether something is still allowed or not.

everything sounds practical until one day the tool changes, and we realize we never truly owned it.

✦ ✦ ✦

maybe that is why, if i have to be serious, the answer is not only to find a smarter model.

the answer is to stop pretending that ownership is impossible.

i already know, more or less, how i would build my own ai.

not in the grand corporate sense, obviously. i am not hallucinating a data center in my bedroom. but the path is not mystical anymore. dataset curation, captioning, fine-tuning, LoRA, model merging, evaluation, iteration. the steps are ugly, repetitive, technical, and expensive, but they are not magic.

that is the annoying part.

i am not blocked by wonder.

i am blocked by money.

compute costs money. storage costs money. electricity costs money. hardware costs money. time costs money, even when people pretend it does not. a personal model is not impossible because the gods of technology forbid it. it is impossible in the ordinary way most things become impossible: the gate has a price tag.

very elegant, really. the future arrives, then asks for a GPU.
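
one reason LoRA is on that list is that it lowers the price a little. a rough sketch of the arithmetic, assuming the standard LoRA setup where a frozen weight matrix of shape d_out x d_in gets a trainable low-rank update built from two small matrices of rank r. the 768 x 768 layer and rank 8 are illustrative numbers, not measurements from any specific model.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values for a LoRA update: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d_out, d_in, rank = 768, 768, 8  # illustrative sizes only
    full = full_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(f"full fine-tune: {full} values")
    print(f"LoRA rank {rank}: {lora} values ({lora / full:.2%} of full)")
```

for this one layer, that is 12,288 trainable values instead of 589,824, about 2 percent. the gate still has a price tag. it is just a smaller one.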

so no, i am not chasing the most perfect machine.

i am chasing a machine i can still own.

i do not want to become a product.

not now. not later. not ever.

because unfortunately, the only ones allowed to treat me like a product are my chosen partner, a cigarette factory, and a pizza maker.

everyone else can learn some manners.

i do not want my body, my style, my face, my way of drawing, my way of thinking, and my way of building a small world to become fuel for a system i cannot argue with. i do not want to keep renting imagination from a company that can change the rules at any time, close access at any time, remove features at any time, then call it an update.

if i have to use a weaker machine to keep control, maybe that is a fair price.

because a weak tool i can own is still more honest than a magical tool that slowly teaches me to depend on it.

and if one day i finally have enough money to build it properly, maybe the result will not be beautiful at first.

that is fine.

at least it will be born from a world i chose myself, not from a conventional truth that keeps trying to correct my shape into something easier to sell.

— rhea
