SEEKER: Query-Efficient Model Extraction via Semi-Supervised Public Knowledge Transfer
Bihe Zhao, Zhenyu Guan, Junpeng Jing, Yanting Zhang, Xianglun Leng, Song Bian
- Model extraction attacks against deep neural networks (DNNs) aim to extract DNN models without white-box access to the model internals or the training datasets. Most existing model extraction methods require an excessive number of queries (up to millions) to reproduce a useful substitute model, which makes them impractical in real-world scenarios. In this work, we propose SEEKER, a two-stage query-efficient model extraction framework that consists of an offline stage and an online stage. First, using our augmentation-invariant unsupervised training scheme, the substitute model learns generalizable feature representations from an unannotated public dataset. Then, during the online stage, we design an aggregated query generator that crafts information-rich queries by merging multiple inputs from the unannotated public dataset. Through thorough experiments, we show that our method reduces the query budget by more than 50× while attaining the same attack success rate as state-of-the-art model extraction attacks. Additionally, SEEKER achieves up to 93.97% prediction accuracy while retaining high query efficiency. In terms of stealthiness, our attack demonstrates the capability of bypassing distribution-based attack detection mechanisms.
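A minimal sketch of the aggregated-query idea, assuming a mixup-style convex merge of unlabeled public images and a standard KL distillation loss against the victim's soft outputs; the learned query generator in SEEKER is not reproduced here, and `aggregate_queries`, `substitute`, and `victim_logits` are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def aggregate_queries(public_batch, k=4):
    """Merge groups of k unlabeled public images into single information-rich
    queries via a convex (mixup-style) combination.
    NOTE: illustrative stand-in for SEEKER's aggregated query generator."""
    b = public_batch.size(0) // k * k
    groups = public_batch[:b].view(-1, k, *public_batch.shape[1:])          # (G, k, C, H, W)
    weights = torch.softmax(torch.rand(groups.size(0), k,
                                       device=public_batch.device), dim=1)  # random merge weights
    return (groups * weights.view(-1, k, 1, 1, 1)).sum(dim=1)               # (G, C, H, W)

def online_extraction_step(substitute, queries, victim_logits, optimizer):
    """Train the substitute to match the victim's soft outputs on the
    aggregated queries with a standard KL distillation loss."""
    optimizer.zero_grad()
    loss = F.kl_div(F.log_softmax(substitute(queries), dim=1),
                    F.softmax(victim_logits, dim=1), reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```

Merging k inputs per query lets each query carry information about several public samples at once, which is where the query-budget savings come from.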
New Finding and Unified Framework for Fake Image Detection
Xin Deng*, Bihe Zhao*, Zhenyu Guan, Mai Xu
IEEE Signal Processing Letters
[Paper]
[Code]
- Recently, fake face images generated by generative adversarial networks (GANs) have spread widely across social networks, raising serious social concerns and security risks. To identify fake images, the top priority is to find what properties set them apart from real images. In this letter, we report an important observation about real/fake images: GAN-generated fake images contain stronger non-local self-similarity than real images. Motivated by this observation, we propose a simple yet effective non-local attention-based fake image detection network, namely NAFID, to distinguish GAN-generated fake images from real images. Specifically, we develop a non-local feature extraction (NFE) module to extract the non-local features of real/fake images, followed by a multi-stage classification module that distinguishes the images based on the extracted non-local features. Experimental results on various datasets demonstrate the superiority of NAFID over state-of-the-art (SOTA) face forgery detection methods. More importantly, since the NFE module is independent of the classification module, it can be plugged into any other forgery detection model. The results show that the NFE module consistently improves the detection accuracy of other models, which verifies the universality of the proposed method.
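A minimal sketch of a non-local (self-attention) block of the kind the NFE module builds on, following the embedded-Gaussian formulation of non-local neural networks; the channel sizes, reduction factor, and placement are illustrative assumptions rather than NAFID's exact configuration:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block: every spatial position attends to all
    others, exposing the non-local self-similarity used as a forgery cue."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)
        self.g = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.phi(x).flatten(2)                     # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)            # pairwise self-similarity
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection

# e.g. NonLocalBlock(64)(torch.randn(2, 64, 32, 32)) keeps the (2, 64, 32, 32) shape
```

Because the block preserves the input feature shape, it can be placed in front of an existing detector's classification layers, which is what makes the plug-in evaluation described above possible.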
PointSteal: Extracting Point Cloud Models
Zhenyu Guan*, Bihe Zhao*, Song Bian
- We study how to efficiently extract black-box neural network models for 3D point cloud recognition. Different from 2D images, we identify two main challenges in extracting point cloud models. First, since point clouds are sparsely distributed in a higher-dimensional space, the relatively low information density limits how efficiently knowledge can be extracted from the target model. Second, public point cloud datasets are much scarcer than image datasets, which makes the query generation process even more difficult. To address these challenges, we present PointSteal, a model extraction attack against 3D point cloud models. To increase the amount of publicly available 3D point cloud data, we leverage rich 2D public image datasets to reconstruct a diversified 3D point cloud surrogate dataset. Based on the surrogate dataset, we develop a multi-input query generation network that processes multiple surrogate data points simultaneously to efficiently explore the high-dimensional point cloud search space. We show that PointSteal can efficiently extract a point cloud model with high extraction accuracy and transfer attack success rate across different adversarial settings.
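A minimal sketch of the online extraction loop, assuming a simple point-sampling merge of two surrogate clouds as a stand-in for PointSteal's learned multi-input query generation network, and a KL distillation loss on the victim's soft labels; `merge_point_clouds`, `substitute`, and `victim` are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def merge_point_clouds(pc_a, pc_b, alpha=0.5):
    """Build one query from two (B, N, 3) surrogate clouds by sampling a
    fraction of points from each.
    NOTE: illustrative stand-in for PointSteal's multi-input query network."""
    n = pc_a.size(1)
    n_a = int(alpha * n)
    idx_a = torch.randperm(n, device=pc_a.device)[:n_a]
    idx_b = torch.randperm(n, device=pc_b.device)[:n - n_a]
    return torch.cat([pc_a[:, idx_a], pc_b[:, idx_b]], dim=1)   # (B, N, 3)

def extraction_step(substitute, victim, pc_a, pc_b, optimizer):
    """Query the black-box victim with a merged cloud and distill its
    soft predictions into the substitute model."""
    query = merge_point_clouds(pc_a, pc_b)
    with torch.no_grad():
        target = F.softmax(victim(query), dim=1)                # black-box soft labels
    optimizer.zero_grad()
    loss = F.kl_div(F.log_softmax(substitute(query), dim=1),
                    target, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```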
Adaptive Hyperparameter Optimization for Black-box Adversarial Attack
Zhenyu Guan, Lixin Zhang, Bohan Huang, Bihe Zhao, Song Bian
International Journal of Information Security
[Paper]
- The study of adversarial attacks is crucial in the design of robust neural network models. In this work, we propose a hyperparameter optimization framework for black-box adversarial attacks. We observe that hyperparameters are extremely important for the query efficiency of many black-box adversarial attack methods. Hence, we propose an adaptive hyperparameter tuning framework such that, in each query iteration, the attacker adaptively selects the hyperparameter configuration based on feedback from the victim, improving both the attack success rate and the query efficiency of the attack algorithm. The experimental results show that, by adaptively tuning the attack hyperparameters, our technique outperforms the original algorithms, improving query efficiency by 33.63% on the NES algorithm for untargeted attacks, 44.47% on the Bandits algorithm for untargeted attacks, and 32.24% on the Bandits algorithm for targeted attacks.
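A minimal sketch of the adaptive selection loop, assuming an epsilon-greedy bandit over a few candidate configurations whose reward is the per-iteration improvement reported by the underlying attack; the candidate values and the `attack_step` feedback signal are illustrative assumptions, not the paper's exact tuning rule:

```python
import random

class EpsilonGreedyTuner:
    """Pick a hyperparameter configuration each query iteration based on the
    observed reward (e.g., the drop in the victim's loss after that iteration)."""
    def __init__(self, configs, epsilon=0.1):
        self.configs = configs                 # e.g., candidate step sizes
        self.epsilon = epsilon
        self.counts = [0] * len(configs)
        self.values = [0.0] * len(configs)     # running mean reward per config

    def select(self):
        if random.random() < self.epsilon:     # explore
            return random.randrange(len(self.configs))
        return max(range(len(self.configs)), key=lambda i: self.values[i])  # exploit

    def update(self, idx, reward):
        self.counts[idx] += 1
        self.values[idx] += (reward - self.values[idx]) / self.counts[idx]

# Usage inside a black-box attack loop, where attack_step is assumed to run one
# NES/Bandits iteration with the chosen step size and return the loss decrease:
#   tuner = EpsilonGreedyTuner(configs=[0.001, 0.005, 0.01])
#   idx = tuner.select()
#   reward = attack_step(step_size=tuner.configs[idx])
#   tuner.update(idx, reward)
```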
ARMOR: Differential Model Distribution for Adversarially Robust Federated Learning
Yanting Zhang, Jianwei Liu, Zhenyu Guan, Bihe Zhao, Xianglun Leng, Song Bian
Electronics
[Paper]
- In this work, we formalize the concept of differential model robustness (DMR), a new property for ensuring model security in federated learning (FL) systems. In most conventional FL frameworks, all clients receive the same global model. If a Byzantine client maliciously generates adversarial samples against the global model, the attack immediately transfers to all other benign clients. To address this attack transferability concern and improve the DMR of FL systems, we propose the notion of differential model distribution (DMD), where the server distributes different models to different clients. As a concrete instantiation of DMD, we propose the ARMOR framework, which uses differential adversarial training to prevent a corrupted client from launching white-box adversarial attacks against other clients, since the local model received by the corrupted client differs from those of the benign clients. Through extensive experiments, we demonstrate that ARMOR can significantly reduce both the attack success rate (ASR) and the average adversarial transfer rate (AATR) across different FL settings. For instance, for a 35-client FL system, the ASR and AATR can be reduced by as much as 85% and 80% on the MNIST dataset.
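A minimal sketch of differential model distribution, assuming the server derives each client's copy of the global model by a short round of client-specific FGSM adversarial fine-tuning on a small auxiliary batch; ARMOR's differential adversarial training procedure is more involved, and the per-client `client_batches` input is an illustrative simplification:

```python
import copy
import torch
import torch.nn.functional as F

def fgsm_examples(model, x, y, eps):
    """Craft FGSM adversarial examples against a given model copy."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

def distribute_differential_models(global_model, client_batches, eps=0.03, lr=1e-3):
    """Return one distinct model per client: each copy is fine-tuned on its own
    adversarial examples, so adversarial samples crafted against one client's
    model transfer poorly to the models held by other clients."""
    client_models = []
    for x, y in client_batches:                          # one auxiliary (data, label) batch per client
        m = copy.deepcopy(global_model)
        opt = torch.optim.SGD(m.parameters(), lr=lr)
        x_adv = fgsm_examples(m, x, y, eps)              # client-specific perturbation
        opt.zero_grad()
        F.cross_entropy(m(x_adv), y).backward()          # differential adversarial update
        opt.step()
        client_models.append(m)
    return client_models
```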