Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning (NeurIPS 2025)
Marlon Tobaben1*, Hibiki Ito2*, Joonas Jälkö1*, Yuan He1, and Antti Honkela1
1 University of Helsinki, Finland, 2 Kyoto University, Japan
*These authors contributed equally.
Illustration of the observed power-law relation between MIA vulnerability and the number of examples per class when attacking a fine-tuned ViT-B head using LiRA. Each colored line denotes a different fine-tuning dataset; C specifies the number of classes.
This repository contains the code to reproduce the experiments carried out in Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning published at Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025).
This code requires the following:
- Python 3.8 or greater
- PyTorch 1.11 or greater (most of the code is written in PyTorch)
- opacus 1.3 or greater
- optuna 3.0 or greater
- TensorFlow 2.8 or greater (for reading VTAB datasets)
- TensorFlow Datasets 4.5.2 or greater (for reading VTAB datasets)
- statsmodels (for fitting the linear regression model)
- pandas and numpy
This code repository builds on top of the code of On the Efficacy of Differentially Private Few-shot Image Classification.
In this codebase, we rely on the following open-source libraries, some of which we have modified:
- timm (for the PyTorch ViT-B implementation): Copyright 2020 Ross Wightman https://github.com/rwightman/pytorch-image-models
- Big Transfer (for the R-50 implementation): Copyright 2020 Google LLC https://github.com/google-research/big_transfer
- Tensorflow Privacy (for the LiRA implementation): Copyright 2022, The TensorFlow Authors https://github.com/tensorflow/privacy
- cambridge-mlg/dp-few-shot (for the caching of features) https://github.com/cambridge-mlg/dp-few-shot
- privacytrustlab/ml_privacy_meter (for RMIA): https://github.com/privacytrustlab/ml_privacy_meter
The experiments in the paper are executed on CPU (Head experiments) or an NVIDIA V100 GPU with 40 GB of memory (FiLM experiments).
The following steps will take a considerable length of time and disk space.
- Clone or download this repository.
- Install the dependencies listed above.
- The experiments use datasets obtained from TensorFlow Datasets. The majority of these are downloaded and pre-processed upon first use. However, the Resisc45 dataset needs to be downloaded manually. Click on the links for details.
- Switch to the `src` directory in this repo and download the BiT pretrained model: `wget https://storage.googleapis.com/bit_models/BiT-M-R50x1.npz`
- Copy the `timm` folder to `section4/timm`.
It is more computationally efficient to cache feature representations once and load them afterwards, so that only the final linear layer (the head) has to be trained.
Use `section4_training/feature_space_cache/map_to_feature_space.py` to save feature-space representations of the datasets. This only has to be done once per dataset. E.g.,
```shell
python3 -m feature_space_cache.map_to_feature_space \
    --feature_extractor vit-b-16 \
    --dataset cifar10 \
    --examples_per_class -1 \
    --download_path_for_tensorflow_datasets [PATH] \
    --feature_dim_path [feature_dim_path]
```
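Once features are cached, head training reduces to multinomial logistic regression on fixed vectors. A minimal numpy sketch with synthetic features standing in for the cached representations (all names here are illustrative, not the repository's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for cached feature representations; in the real
# pipeline these would be loaded from the files written by
# map_to_feature_space.py.
n, d, n_classes = 256, 32, 4
features = rng.normal(size=(n, d))
labels = (features @ rng.normal(size=(d, n_classes))).argmax(axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Train only the head (one linear layer) by gradient descent; the
# feature extractor stays frozen, which is why caching pays off.
W = np.zeros((d, n_classes))
one_hot = np.eye(n_classes)[labels]
for _ in range(300):
    grad = features.T @ (softmax(features @ W) - one_hot) / n
    W -= 0.5 * grad

train_acc = (softmax(features @ W).argmax(axis=1) == labels).mean()
```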
Use the functions in `section4_training/lira/run_lira.py` to load the data, train head models, and generate intermediate LiRA data. E.g.,
```shell
python3 -m lira.run_lira \
    --record_l2_norms \
    --n_classes -1 \
    --data_seed 0 \
    --shots 16 \
    --target_epsilon -1 \
    --seed 0 \
    --feature_extractor vit-b-16 \
    --dataset cifar10 \
    --num_shadow_models 256 \
    --number_of_trials 20 \
    --data_path [data path] \
    -c [checkpoint_dir]
```
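The `--num_shadow_models 256` flag controls how many shadow models are trained; in LiRA each candidate example is included in roughly half of them. A sketch of such a random in/out split (illustrative only, not the repository's exact sampling code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_shadow = 1000, 256

# Each shadow model includes each candidate example independently with
# probability 1/2, so every example ends up IN roughly 128 of the 256
# shadow models and OUT of the rest.
membership = rng.random((n_shadow, n_examples)) < 0.5
in_counts = membership.sum(axis=0)
```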
Use the functions in `section4_training/lira/process_lira.py` to process the intermediate LiRA data files.
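For intuition, the core LiRA score compares the target model's (logit-scaled) confidence on an example against Gaussians fitted to the IN and OUT shadow-model confidences. A simplified numpy sketch with synthetic confidences (the actual attack uses the TensorFlow Privacy implementation referenced above):

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def lira_score(target_conf, in_confs, out_confs):
    """Log-likelihood ratio of the target model's confidence under
    Gaussians fitted to the IN vs OUT shadow-model confidences."""
    in_ll = normal_logpdf(target_conf, in_confs.mean(), in_confs.std())
    out_ll = normal_logpdf(target_conf, out_confs.mean(), out_confs.std())
    return in_ll - out_ll

rng = np.random.default_rng(1)
# Synthetic logit-scaled confidences for one example across shadow models.
in_confs = rng.normal(3.0, 1.0, size=128)   # models trained WITH the example
out_confs = rng.normal(0.0, 1.0, size=128)  # models trained WITHOUT it

member_score = lira_score(3.2, in_confs, out_confs)       # positive: likely a member
non_member_score = lira_score(-0.1, in_confs, out_confs)  # negative: likely a non-member
```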
We take the output of the LiRA training (including the logits) and use `section4_training/rmia/split_into_files.py` to build the folder structure (a separate folder per model) that RMIA requires.
We run the RMIA attack with the code by the authors at git hash `173d4ad`. We added an example config for RMIA at `section4_training/rmia/example_config_rmia.yaml`. Follow the instructions in that repository to set up and run the attack.
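As rough intuition only (this is not the `ml_privacy_meter` implementation), the RMIA score compares a candidate's likelihood ratio, taken relative to an average reference model, against the same ratio for population samples. A heavily simplified numpy sketch with synthetic true-class probabilities:

```python
import numpy as np

def rmia_scores(p_target_x, p_ref_x, p_target_z, p_ref_z, gamma=1.0):
    # Likelihood ratios relative to the average reference model.
    lr_x = p_target_x / p_ref_x.mean(axis=0)  # candidates, shape (n_x,)
    lr_z = p_target_z / p_ref_z.mean(axis=0)  # population, shape (n_z,)
    # Score = fraction of population points the candidate dominates.
    return (lr_x[:, None] / lr_z[None, :] >= gamma).mean(axis=1)

rng = np.random.default_rng(2)
n_x, n_z, n_refs = 100, 500, 8
# Synthetic true-class probabilities under target and reference models.
p_ref_x = rng.uniform(0.1, 0.9, size=(n_refs, n_x))
p_target_x = rng.uniform(0.1, 0.9, size=n_x)
p_ref_z = rng.uniform(0.1, 0.9, size=(n_refs, n_z))
p_target_z = rng.uniform(0.1, 0.9, size=n_z)

scores = rmia_scores(p_target_x, p_ref_x, p_target_z, p_ref_z)
```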
Follow the instructions in the On the Efficacy of Differentially Private Few-shot Image Classification repository to train FiLM models.
Use the functions in `section4_prediction_model/predict_mia_dataset.py` to train a prediction model.
The dataframe with the data needs to include at least the following columns:
`n_classes`, `shots` and `feature_extractor`, as well as `0.1`, `0.01`, `0.001`, `0.0001` and `0.00001`, specifying the TPR with the column name being the FPR.
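The power-law relation illustrated at the top corresponds to a linear fit in log-log space. A minimal numpy sketch with synthetic TPR values (the paper's prediction model uses statsmodels on the actual attack outputs):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in: TPR at a fixed FPR decaying as a power law in the
# number of examples per class (shots), with mild multiplicative noise.
shots = np.array([1, 2, 4, 8, 16, 32, 64, 128], dtype=float)
tpr = 0.5 * shots ** -0.8 * np.exp(rng.normal(0.0, 0.02, shots.size))

# A power law tpr = a * shots^b is linear in log-log space:
#   log(tpr) = log(a) + b * log(shots)
slope, intercept = np.polyfit(np.log(shots), np.log(tpr), 1)
# The fitted slope recovers the exponent b (about -0.8 here).
```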
Run `produce_fig6.py` after setting the path in the code to the outputs of LiRA to produce Figure 6.
Run `produce_tab1.py` to produce Table 1.
To ask questions or report issues, please open an issue on the issue tracker.
If you use this code, please cite our paper:
@inproceedings{tobaben2025MIAPowerlaw,
title = {Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning},
author = {Marlon Tobaben and Hibiki Ito and Joonas J{\"{a}}lk{\"{o}} and Yuan He and Antti Honkela},
booktitle = {Advances in Neural Information Processing Systems 39: Annual Conference on Neural Information Processing Systems 2025, NeurIPS 2025, San Diego, CA, United States of America, December 02 - 07, 2025},
year = {2025},
}