
DPBayes/impact-dataset-properties-MI-vulnerability-deep-TL


Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning (NeurIPS 2025)

Marlon Tobaben1*, Hibiki Ito2*, Joonas Jälkö1*, Yuan He1, and Antti Honkela1

1 University of Helsinki, Finland, 2 Kyoto University, Japan

*These authors contributed equally.

Illustration of the observed power-law relation between MIA vulnerability and the number of examples per class when attacking a fine-tuned ViT-B Head using LiRA. Each colored line denotes a different fine-tuning dataset, where C specifies the number of classes.
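A power law of this kind can be recovered from (examples-per-class, TPR) pairs by ordinary least squares in log-log space. The sketch below illustrates the idea on synthetic numbers; the function name and data are ours for illustration and are not taken from this repository:

```python
import numpy as np

def fit_power_law(shots, tpr):
    """Fit TPR ~ a * shots^b by ordinary least squares in log-log space.

    Returns (a, b): the prefactor and the power-law exponent.
    """
    log_s, log_t = np.log(shots), np.log(tpr)
    b, log_a = np.polyfit(log_s, log_t, deg=1)  # slope, intercept
    return np.exp(log_a), b

# Synthetic example: vulnerability decaying as shots^-0.5
shots = np.array([1, 4, 16, 64, 256])
tpr = 0.8 * shots ** -0.5
a, b = fit_power_law(shots, tpr)  # recovers a ~ 0.8, b ~ -0.5
```

On a log-log plot such a fit is a straight line, which is why the per-dataset curves in the figure appear linear.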

Repository

This repository contains the code to reproduce the experiments carried out in Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning, published at the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025).

Dependencies

This code requires the following:

  • Python 3.8 or greater
  • PyTorch 1.11 or greater (most of the code is written in PyTorch)
  • opacus 1.3 or greater
  • optuna 3.0 or greater
  • TensorFlow 2.8 or greater (for reading VTAB datasets)
  • TensorFlow Datasets 4.5.2 or greater (for reading VTAB datasets)
  • statsmodels (for fitting the linear regression model)
  • pandas and numpy

Source Code Libraries

This code repository builds on top of the code of On the Efficacy of Differentially Private Few-shot Image Classification.

Our codebase relies on several open-source libraries, some of which we have modified.

GPU Requirements

The experiments in the paper are executed on CPUs (Head experiments) or an NVIDIA V100 GPU with 40 GB of memory (FiLM experiments).

Installation for LiRA on ViT-B/R-50 (Head) (Section 4)

Installation

The following steps require a considerable amount of time and disk space.

  1. Clone or download this repository.

  2. Install the dependencies listed above.

  3. The experiments use datasets obtained from TensorFlow Datasets. Most of these are downloaded and pre-processed automatically on first use; however, the Resisc45 dataset needs to be downloaded manually (see the TensorFlow Datasets catalog for details).

  4. Switch to the src directory in this repo and download the BiT pretrained model:

    wget https://storage.googleapis.com/bit_models/BiT-M-R50x1.npz

  5. Copy the timm folder to section4/timm.

Cached head LiRA (Section 4)

It is more computationally efficient to cache the feature representations and load them from disk, so that only the final classification layer has to be trained.

Cache feature representations

Use section4_training/feature_space_cache/map_to_feature_space.py to map each dataset into the feature space of the pretrained backbone and save the resulting representations. This only has to be done once per dataset. E.g.,

python3 -m feature_space_cache.map_to_feature_space \
    --feature_extractor vit-b-16 \
    --dataset cifar10 \
    --examples_per_class -1 \
    --download_path_for_tensorflow_datasets [PATH] \
    --feature_dim_path [feature_dim_path]
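Once features are cached, training a head reduces to fitting a linear (softmax) classifier on fixed vectors, which is why no GPU is needed. As an illustration of why this is cheap, here is a minimal head trainer in plain numpy on toy data; it is our sketch, not the repository's training code (which uses PyTorch and Opacus):

```python
import numpy as np

def train_head(feats, labels, n_classes, lr=0.1, epochs=200):
    """Train a linear softmax head on cached feature vectors with
    plain batch gradient descent; the feature extractor is never run."""
    W = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]  # one-hot targets
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)            # softmax probabilities
        grad = (p - Y) / len(feats)                  # cross-entropy gradient
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy "cached features": two well-separated classes in 8 dimensions
feats = np.vstack([np.random.default_rng(1).normal(-2, 0.5, (20, 8)),
                   np.random.default_rng(2).normal(+2, 0.5, (20, 8))])
labels = np.array([0] * 20 + [1] * 20)
W, b = train_head(feats, labels, n_classes=2)
acc = ((feats @ W + b).argmax(1) == labels).mean()
```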

Run LiRA on Head models

Use the functions in section4_training/lira/run_lira.py to load the data, train head models, and generate intermediate LiRA data. E.g.,

python3 -m lira.run_lira \
    --record_l2_norms \
    --n_classes -1 \
    --data_seed 0 \
    --shots 16 \
    --target_epsilon -1 \
    --seed 0 \
    --feature_extractor vit-b-16 \
    --dataset cifar10 \
    --num_shadow_models 256 \
    --number_of_trials 20 \
    --data_path [data path] \
    -c [checkpoint_dir]

Use the functions in section4_training/lira/process_lira.py to process the intermediate LiRA data files.
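At its core, the LiRA processing step compares each target example's observed confidence against Gaussians fitted to the confidences that IN and OUT shadow models assign to that example. A minimal sketch of this likelihood-ratio computation follows; the function name and the synthetic numbers are ours, not the repository's API:

```python
import numpy as np

def lira_score(target_logit, in_logits, out_logits):
    """LiRA membership score: fit Gaussians to the target example's
    confidences under IN and OUT shadow models, then return the
    log-likelihood ratio log p_in(x) / p_out(x)."""
    def log_normal_pdf(x, mu, sigma):
        return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
    mu_in, sd_in = in_logits.mean(), in_logits.std(ddof=1)
    mu_out, sd_out = out_logits.mean(), out_logits.std(ddof=1)
    return (log_normal_pdf(target_logit, mu_in, sd_in)
            - log_normal_pdf(target_logit, mu_out, sd_out))

# Synthetic shadow-model confidences for one target example
rng = np.random.default_rng(0)
in_logits = rng.normal(5.0, 1.0, 128)   # shadow models trained WITH the example
out_logits = rng.normal(1.0, 1.0, 128)  # shadow models trained WITHOUT it
member_score = lira_score(4.8, in_logits, out_logits)      # positive: looks IN
non_member_score = lira_score(1.2, in_logits, out_logits)  # negative: looks OUT
```

Thresholding such scores across many targets yields the TPR-at-fixed-FPR numbers analyzed throughout Section 4.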

Run RMIA on Head models (Section 4.2)

We utilize the output of the LiRA training (including the logits) and use section4_training/rmia/split_into_files.py to build up the folder structure that RMIA requires (a separate folder per model).

We run the RMIA attack with the code by the authors at git hash 173d4ad. We added an example config for RMIA at section4_training/rmia/example_config_rmia.yaml. Follow the instructions in that repository to set up and run the attack.

LiRA on R-50 FiLM (Section 4.3)

Follow the instructions in the On the Efficacy of Differentially Private Few-shot Image Classification repository to train FiLM models.

Prediction Model (Section 4.3)

Use the functions in section4_prediction_model/predict_mia_dataset.py to train a prediction model.

The dataframe with the data needs to include at least the following columns:

  • n_classes, shots and feature_extractor
  • 0.1, 0.01, 0.001, 0.0001, 0.00001, where each column name is an FPR and the values are the corresponding TPRs.
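A dataframe with the required columns could be assembled as below. All numbers are made up for illustration; the paper fits its prediction model with statsmodels, for which numpy's least squares stands in here:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one fine-tuning configuration per row, with the
# TPR-at-fixed-FPR columns named by the FPR value, as described above.
df = pd.DataFrame({
    "n_classes": [10, 100, 10],
    "shots": [16, 16, 64],
    "feature_extractor": ["vit-b-16", "vit-b-16", "r-50"],
    "0.1":     [0.35, 0.52, 0.21],
    "0.01":    [0.12, 0.25, 0.05],
    "0.001":   [0.04, 0.11, 0.01],
    "0.0001":  [0.02, 0.06, 0.004],
    "0.00001": [0.01, 0.03, 0.001],
})

# A linear model in log-log space, e.g. log TPR ~ log shots + log n_classes,
# fitted here at FPR = 0.1 (numpy least squares instead of statsmodels OLS):
X = np.column_stack([np.ones(len(df)),
                     np.log(df["shots"]),
                     np.log(df["n_classes"])])
coef, *_ = np.linalg.lstsq(X, np.log(df["0.1"]), rcond=None)
```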

Individual MIA vulnerability (Section 4.4)

Run produce_fig6.py after setting the paths in the code to the outputs of LiRA to produce Figure 6.

Comparison between empirical models and universal DP bounds (Section 4.5)

Run produce_tab1.py to produce Table 1.

Contact

To ask questions or report issues, please open an issue on the issue tracker.

Citation

If you use this code, please cite our paper:

@inproceedings{tobaben2025MIAPowerlaw,
    title = {Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning},
    author = {Marlon Tobaben and Hibiki Ito and Joonas J{\"{a}}lk{\"{o}} and Yuan He and Antti Honkela},
    booktitle = {Advances in Neural Information Processing Systems 39: Annual Conference on Neural Information Processing Systems 2025, NeurIPS 2025, San Diego, CA, United States of America, December 02 - 07, 2025},
    year = {2025},
}
