If you want to see something rather amusing - instead of using the LLM aspect of Gemini 3.0 Pro, feed an image of a five-legged dog directly into Nano Banana Pro and give it an editing task that requires an intrinsic understanding of the unusual anatomy:
"Place sneakers on all of its legs."
It'll get this correct a surprising number of times (tested with BFL Flux2 Pro and NB Pro).
Does this still work if you give it a pre-existing many-legged animal image, instead of first prompting it to add an extra leg and then prompting it to put the sneakers on all the legs?
I'm wondering if it may only expect the additional leg because you literally just told it to add said additional leg. It would just need to remember your previous instruction and its previous action, rather than to correctly identify the number of legs directly from the image.
I'll also note that photos of dogs wearing shoes are definitely something it has been trained on, albeit presumably more often dog booties than human sneakers.
Can you make it place the sneakers incorrectly on purpose? For example: "Place the sneakers on all the dog's knees."
My example was unclear. Each of those images on Imgur was generated using independent API calls, which means there was no "rolling context/memory".
In other words:
1. Took a personal image of my dog Lily
2. Had NB Pro add a fifth leg using the Gemini API
3. Downloaded image
4. Sent image to BFL Flux2 Pro via the BFL API with the prompt "Place sneakers on all the legs of this animal".
5. Sent image to NB Pro via Gemini API with the prompt "Place sneakers on all the legs of this animal".
So not only was there zero "continual context", I also used two entirely different models to cover my bases.
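For anyone who wants to reproduce the setup, the steps above can be sketched roughly as follows. This is only an illustration of the "no shared context" structure, not the actual Gemini or BFL client code: `EditFn`, `run_independent_edits`, and the wording of the add-a-leg prompt are all assumptions, and each callable stands in for one stateless API request.

```python
from typing import Callable, Dict

# Hypothetical stand-in for one stateless API request:
# (image bytes, prompt) -> edited image bytes. Not a real SDK signature.
EditFn = Callable[[bytes, str], bytes]


def run_independent_edits(
    original: bytes,
    add_leg: EditFn,
    editors: Dict[str, EditFn],
    edit_prompt: str,
    add_leg_prompt: str = "Add a fifth leg to this dog",  # assumed wording
) -> Dict[str, bytes]:
    # Steps 1-3: one request adds the leg; only the resulting bytes are kept.
    five_legged = add_leg(original, add_leg_prompt)

    # Steps 4-5: each editor gets the same bytes and prompt in a fresh call,
    # so no model ever sees the earlier "add a leg" instruction.
    return {name: edit(five_legged, edit_prompt) for name, edit in editors.items()}


if __name__ == "__main__":
    # Fake editors that just tag the bytes, to show the data flow.
    def fake(tag: bytes) -> EditFn:
        return lambda image, prompt: image + tag

    results = run_independent_edits(
        b"dog",
        fake(b"+leg"),
        {"flux2_pro": fake(b"+flux"), "nb_pro": fake(b"+nb")},
        "Place sneakers on all the legs of this animal",
    )
    print(results)
```

The point of passing only `five_legged` forward is that the second-stage models have nothing but the pixels to count legs from.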
EDIT: Added images to the Imgur for the following prompts:
- Place red Dixie solo cups on the ends of every foot on the animal
- Draw a red circle around all the feet on the animal
I imagine the real answer is that the edits are local, because that's how diffusion works. It's not like it's turning the input into "five-legged dog" and then generating a five-legged dog in shoes from scratch.