Hizonner 3 days ago

This sort of thing would be really nifty, except that it relies on the model using encrypted message text whenever it needs "random numbers".

The thing is that I'm under the impression that people who actually use diffusion models to produce images value reproducibility, which they achieve by using pseudorandom generators with relatively short seeds, and remembering the seeds.

Not only that, but the software tends to try to embed prompts and the like in the created image files, and I suspect it puts the seeds, model identities, and anything else that matters in there, too, with the idea that the generated image contains everything you'd need to reproduce it. If it does leave anything out, there's a good chance somebody is going to change it to include that thing.

It's not that they actually want to reproduce it exactly, but that that stuff gives a baseline they can tweak to create refined versions.

That's incompatible with the steganography. If you have all the alleged inputs, you can detect an image bearing steganographic data by trying to reproduce it. And if it's normal to include all of the inputs, then images without them are going to look suspicious.
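
To make that concrete, here's a toy sketch of the check I have in mind. None of this comes from the paper: `sample`, `seeded_noise`, and `looks_suspicious` are made-up names, the "sampler" is just a hash standing in for a diffusion model, and it assumes bit-exact reproducibility, which real pipelines may not give you.

```python
import hashlib
import os
import random

NOISE_LEN = 16  # arbitrary size for the toy noise vector


def sample(prompt: str, noise: bytes) -> bytes:
    """Toy stand-in for a sampler: a deterministic function of prompt and noise."""
    return hashlib.sha256(prompt.encode() + noise).digest()


def seeded_noise(seed: int) -> bytes:
    """Noise as ordinary tools would draw it: from a PRNG with a short seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(NOISE_LEN))


def looks_suspicious(image: bytes, claimed_prompt: str, claimed_seed: int) -> bool:
    """Regenerate from the embedded metadata and flag any mismatch."""
    return sample(claimed_prompt, seeded_noise(claimed_seed)) != image


# Honest image: noise really did come from the recorded seed.
honest = sample("a cat", seeded_noise(1234))

# Stego image: noise is ciphertext (random bytes stand in for it here),
# but the metadata still claims seed 1234.
ciphertext = os.urandom(NOISE_LEN)
stego = sample("a cat", ciphertext)

print(looks_suspicious(honest, "a cat", 1234))
print(looks_suspicious(stego, "a cat", 1234))
```

The honest image regenerates exactly, so it passes. The stego image fails unless the ciphertext happens to equal the PRNG output for the claimed seed, which is vanishingly unlikely for 16 random bytes.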

It doesn't seem useless. Even so, it seems as though it'd be a lot more useful if the people writing image generation software could be convinced to play along by using truly unpredictable cryptographic RNGs. But that would mean giving up reproducibility that I think they value.

  • orbital-decay 3 days ago

    Reproducibility was only possible very early on, for a very short time a couple of years ago. Nowadays the number of settings, software, tooling, hardware, and runtimes is vast, and it's not possible to match everything. ControlNet/adapter inputs aren't stored at all, and composite images can be whatever you like even if they contain something resembling metadata. And it's not like many people care about this.

  • tusharjois 2 days ago

    Hey, author here. Thanks for your comment. You're right that communities that prioritize full and total reproducibility wouldn't be an ideal "target drop location" for such a system. But we'd argue that there are a lot of places that don't (in the name of getting "the best image possible"), so we'd want to focus any deployment efforts there.

    And we totally agree with your last point -- in our paper, we have a section devoted to what we need for better steganography to be possible through diffusion models. I don't necessarily think there's a tradeoff between reproducibility and steganography, though, depending on how future models can be designed. I think what you're proposing -- an image that's reproducible via traditional means without compromising the security of the underlying scheme -- would be an interesting direction for future work.

    Also, if anyone else has questions I'd be happy to answer them!