Unlocking Post-hoc Dataset Inference with Synthetic Data
Bihe Zhao, Pratyush Maini, Franziska Boenisch, Adam Dziedzic
ICML 2025
[Paper]
[Code]
- The remarkable capabilities of Large Language Models (LLMs) can be mainly attributed to their massive training datasets, which are often scraped from the internet without respecting data owners' intellectual property rights. Dataset Inference (DI) offers a potential remedy by identifying whether a suspect dataset was used in training, thereby enabling data owners to verify unauthorized use. However, existing DI methods require a private set, known to be absent from training, that closely matches the compromised dataset's distribution. Such in-distribution, held-out data is rarely available in practice, severely limiting the applicability of DI. In this work, we address this challenge by synthetically generating the required held-out set. Our approach tackles two key obstacles: (1) creating high-quality, diverse synthetic data that accurately reflects the original distribution, which we achieve via a data generator trained on a carefully designed suffix-based completion task, and (2) bridging likelihood gaps between real and synthetic data, which we realize through post-hoc calibration. Extensive experiments on diverse text datasets show that using our generated data as a held-out set enables DI to detect the original training sets with high confidence while maintaining a low false positive rate. This result empowers copyright owners to make legitimate claims on data usage and demonstrates our method's reliability for real-world litigation.
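The statistical core of DI can be illustrated with a minimal sketch: score the suspect set and the synthetic held-out set under the target model, calibrate the scores, and run a one-sided two-sample test. Everything below is an illustrative assumption rather than the paper's exact procedure: `model.nll` is a hypothetical per-example negative log-likelihood API, and reference-model subtraction stands in for the paper's post-hoc calibration.

```python
# Minimal sketch of dataset inference with a synthetic held-out set.
# Hypothetical API: `model.nll(text)` returns the negative log-likelihood
# of `text` under `model`.
import numpy as np
from scipy import stats

def scores(model, texts):
    return np.array([model.nll(t) for t in texts])

def calibrated_scores(target, reference, texts):
    # Calibration stand-in: subtract a reference model's NLLs to reduce
    # intrinsic-difficulty effects and real-vs-synthetic likelihood gaps.
    return scores(target, texts) - scores(reference, texts)

def dataset_inference(target, reference, suspect_texts, synthetic_heldout, alpha=0.01):
    suspect = calibrated_scores(target, reference, suspect_texts)
    heldout = calibrated_scores(target, reference, synthetic_heldout)
    # If the suspect set was trained on, its calibrated NLLs should be
    # systematically lower than those of the unseen synthetic held-out set.
    result = stats.ttest_ind(suspect, heldout, alternative="less", equal_var=False)
    return result.pvalue < alpha, result.pvalue
```

Aggregating over the whole dataset with a hypothesis test, rather than judging single examples, is what lets DI reach high confidence at a low false positive rate.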
BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models
Louis Kerner, Michel Meintz, Bihe Zhao, Franziska Boenisch, Adam Dziedzic
NeurIPS 2025
[Paper]
- State-of-the-art text-to-image models like Infinity generate photorealistic images at unprecedented speed. These models operate in a bitwise autoregressive manner over a discrete set of tokens that is practically infinite in size. However, their impressive generative power comes with a growing risk: as their outputs increasingly populate the Internet, they are likely to be scraped and reused as training data, potentially by the very same models. This phenomenon has been shown to lead to model collapse, where repeated training on generated content, especially from the models' own previous versions, causes a gradual degradation in performance. A promising mitigation strategy is watermarking, which embeds human-imperceptible yet detectable signals into generated images, enabling the identification of generated content. In this work, we introduce BitMark, a robust bitwise watermarking framework for Infinity. Our method embeds a watermark directly at the bit level of the token stream across multiple scales (also referred to as resolutions) during Infinity's image generation process. Our bitwise watermark subtly influences the bits to preserve visual fidelity and generation speed while remaining robust against a spectrum of removal techniques. Furthermore, it exhibits high radioactivity: when watermarked images are used to train another image generative model, that second model's outputs will also carry the watermark. The radioactive traces remain detectable even when diffusion or image autoregressive models are merely fine-tuned on images watermarked with BitMark. Overall, our approach provides a principled step toward preventing model collapse in image generative models by enabling reliable detection of generated outputs.
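As a generic illustration of the idea (not BitMark's exact embedding rule), a keyed bitwise watermark can bias each generated bit toward a pseudorandom pattern derived from a secret key, and detection can then test whether extracted bits match that pattern more often than chance. The `delta` strength, `keyed_bits` derivation, and all function names are assumptions of this sketch.

```python
# Hedged sketch of a keyed bitwise watermark and its detector.
import hashlib
import numpy as np
from scipy.stats import binomtest

def keyed_bits(key: bytes, n: int) -> np.ndarray:
    # Derive a pseudorandom target bit pattern from a secret key.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=n)

def embed(bit_logits: np.ndarray, key: bytes, delta: float = 1.0) -> np.ndarray:
    # Nudge each bit's logit toward the keyed pattern; a small delta keeps
    # visual fidelity while leaving a detectable statistical trace.
    target = keyed_bits(key, len(bit_logits))
    return bit_logits + delta * (2 * target - 1)

def detect(extracted_bits: np.ndarray, key: bytes, alpha: float = 1e-6) -> bool:
    # One-sided binomial test: do the bits match the keyed pattern more
    # often than the 50% chance rate?
    target = keyed_bits(key, len(extracted_bits))
    matches = int((extracted_bits == target).sum())
    return binomtest(matches, len(extracted_bits), 0.5, alternative="greater").pvalue < alpha
```

Because the bias is small but spread over many bits across scales, detection confidence grows with the number of observed bits while each individual bit stays close to the model's original prediction.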
New Finding and Unified Framework for Fake Image Detection
Xin Deng*, Bihe Zhao*, Zhenyu Guan, Mai Xu
IEEE Signal Processing Letters
[Paper]
[Code]
- Recently, fake face images generated by generative adversarial networks (GANs) have spread widely across social networks, raising serious social concerns and security risks. To identify fake images, the top priority is to determine which properties distinguish fake images from real ones. In this letter, we reveal an important observation: GAN-generated fake images contain stronger non-local self-similarity than real images. Motivated by this observation, we propose a simple yet effective non-local attention-based fake image detection network, namely NAFID, to distinguish GAN-generated fake images from real images. Specifically, we develop a non-local feature extraction (NFE) module to extract the non-local features of real and fake images, followed by a multi-stage classification module that classifies the images based on the extracted non-local features. Experimental results on various datasets demonstrate the superiority of NAFID over state-of-the-art (SOTA) face forgery detection methods. More importantly, since the NFE module is independent of the classification module, it can be plugged into other forgery detection models. The results show that the NFE module consistently improves the detection accuracy of these models, verifying the generality of the proposed method.
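Since the letter builds on non-local attention, a compact sketch of a standard non-local block (Wang et al., 2018), in the spirit of the NFE module, shows how such a shape-preserving component can be prepended to an existing detector; the layer sizes and placement here are assumptions, not the paper's exact architecture.

```python
# Hedged sketch of a non-local (self-attention) block for capturing the
# non-local self-similarity cue; channel sizes are illustrative.
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        inter = max(channels // 2, 1)
        self.theta = nn.Conv2d(channels, inter, 1)  # query projection
        self.phi = nn.Conv2d(channels, inter, 1)    # key projection
        self.g = nn.Conv2d(channels, inter, 1)      # value projection
        self.out = nn.Conv2d(inter, channels, 1)    # restore channel count

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, inter)
        k = self.phi(x).flatten(2)                    # (b, inter, hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, inter)
        attn = torch.softmax(q @ k, dim=-1)           # pairwise similarity over all positions
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                        # residual, shape-preserving

# Shape-preserving, so it can sit in front of an arbitrary backbone, e.g.:
# detector = nn.Sequential(NonLocalBlock(3), existing_backbone)
```

Because every spatial position attends to all others, the block exposes exactly the kind of image-wide self-similarity statistics the observation above relies on, and its shape-preserving residual design mirrors the plug-in property claimed for the NFE module.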