Alaska#2


The follow-up steganalysis challenge "into the wild".
Because it is a long and perilous walk to move steganalysis...
from research labs to real life conditions.

About Alaska#2

ALASKA#2 is the follow-up steganalysis challenge, held a year and a half after ALASKA#1 and almost a decade after BOSS. Its main goal is to offer a large and heterogeneous dataset of photographic images that addresses the difficulties of moving steganalysis "from research labs" "into the wild".

While BOSS opened the door to steganalysis with a high number of features and dedicated decision methods, ALASKA#1 aimed at pushing the research community in the direction of "practical" steganalysis, using highly heterogeneous datasets to mimic the diversity one can find on the Internet. The goal of ALASKA#2 is two-fold.

First, it aims at offering the community a much larger, updated dataset of photographic images.
On the one hand, we believe that the BOSS dataset does not faithfully reflect the high diversity of media that can be found "in the real world". Indeed, its images are rather homogeneous: the dataset is based on a small set of 8 DSLR cameras, all images were processed from raw files in exactly the same way (which did not include any image processing tools), and all images were uncompressed, in grayscale, with the same size, etc.
On the other hand, ALASKA#2 offers a much larger dataset of 80,000 images, from more than 40 cameras (including smartphones, tablets, and low-end cameras up to high-end full-frame DSLRs), processed in a realistic and highly heterogeneous way. We believe that this will prevent the design of methods tailored to a homogeneous dataset, as observed recently for BOSS, and help researchers design far more general and robust steganography and steganalysis methods.

Second, we have recently observed the use of Deep Learning techniques for steganalysis and steganography.
While those are very promising and have started to offer results that exceed everything that was known, it is unclear whether those methods are tailored to a specific dataset. More importantly, Deep Learning methods require a much larger dataset for training, hence the interest of a very large dataset such as ALASKA#2.

Finally, as for ALASKA#1, a challenge will be organized over the spring of 2020. Though not all the details of the ALASKA#2 challenge are set yet, it is aimed at:
(i) organizing a special session at IEEE WIFS 2020 open to the best competitors;
(ii) offering a cash prize, of at least $5,000 in total, to be shared among the best three competitors.

Timeline

Material

In order not to reveal elements of the ALASKA#2 challenge in advance, the present website only offers two main types of materials.
  • First, and perhaps most important, we provide datasets. The main dataset is made of a set of 80,000 raw images; the images are numbered by digital still camera model as described in this simple text file.
    Besides, for easy use by the community, we also provide several processed datasets:
    Uncompressed color and grayscale image datasets (of size 512x512, 256x256 for easy use in Deep learning and various sizes).
    JPEG compressed images datasets (with quality factors: 100 , 95 , 90 , 85 , 80 , 75 and various QF).
  • Second, we also provide the scripts that have been used to convert the raw images into JPEG format. Those Python scripts mainly use the following libraries: numpy (version 1.14.5), pillow, the Python Imaging Library (version 5.2), and the open-source raw image processing program RawTherapee (version 5.7).
    Those scripts are the ones that we have used to generate the various datasets from the raw image files.
  • For any question regarding the image datasets or the conversion scripts, contact us at alaska@utt.fr.
  • While all the materials are available under the Creative Commons BY-NC-ND license for use in any research work, we kindly ask you to credit our (enormous) work by either referring to the alaska website URL or, more relevant, by simply citing one of the associated papers:
    The ALASKA Steganalysis Challenge: A First Step Towards Steganalysis "into the wild", Published in the 7th ACM IH&MMSec conference.

Uncompressed Grayscale images datasets:
Size 512x512, PGM  Size 256x256, PGM  Various size, PGM

Uncompressed COLOR images datasets:
Size 512x512, PPM  Size 256x256, PPM  Various size, PPM

JPEG Compressed, Grayscale, Size 512x512 images datasets:
QF100  QF95  QF90  QF85  QF80  QF75  Various QF

JPEG Compressed, Grayscale, Size 256x256 images datasets:
QF100  QF95  QF90  QF85  QF80  QF75  Various QF

JPEG Compressed, Color, Size 512x512 images datasets:
QF100  QF95  QF90  QF85  QF80  QF75  Various QF

JPEG Compressed, Color, Size 256x256 images datasets:
QF100  QF95  QF90  QF85  QF80  QF75  Various QF

Rules

The contest will take place over spring 2020 and will last about 3 months in total (more details to come). The rules are the following:
  • Submissions can be made only through the Kaggle Competitions website.
  • Cash prizes of at least $5,000 in total will be awarded to the best three users.
    Those winners will be asked to provide the source code of their detectors to receive their prize.
    We strongly encourage the five top-scoring teams to submit a paper to WIFS 2020; a $3,000 travel grant will be offered for accepted papers.
  • Competitors cannot submit more than one trial every four hours.
    Each submission is evaluated over a randomly selected subset of 80% of the testing set. The final results, when the contest closes, will be adjusted with an evaluation over the whole testing set.
    The ranking is made using the empirical probability of missed detection for a fixed empirical probability of false alarm of 5%. That is, we count how many images with hidden data are incorrectly classified as covers when exactly 5% of cover images are incorrectly classified as containing hidden data.
  • Anyone can download the dataset from the present website.
    The present website will also allow users to submit their proposals once the official challenge is over (to avoid providing side information); for this, users must create an account.
    Those accounts will be used only for statistical purposes, and communications will be made only regarding the ALASKA challenge.
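
The ranking criterion described in the rules can be illustrated with a short Python sketch. This is not the organizers' evaluation code: the function name pmd_at_fixed_pfa, the convention that a higher score means "more likely to contain hidden data", and the rounding of the number of allowed false alarms to a whole image are all assumptions made for illustration.

```python
import math


def pmd_at_fixed_pfa(cover_scores, stego_scores, pfa=0.05):
    """Empirical missed-detection rate at a fixed empirical false-alarm rate.

    Assumption: each score is a detector output where a HIGHER value means
    "more likely to contain hidden data". An image is flagged as stego when
    its score is strictly above the threshold.
    """
    covers = sorted(cover_scores, reverse=True)
    # Number of cover images allowed to be flagged (false alarms),
    # rounded to a whole image (an assumption of this sketch).
    n_false_alarms = int(round(pfa * len(covers)))
    if n_false_alarms >= len(covers):
        threshold = -math.inf  # every cover may be flagged
    else:
        # The first cover score that must NOT be flagged.
        threshold = covers[n_false_alarms]
    # Stego images at or below the threshold are missed detections.
    missed = sum(1 for s in stego_scores if s <= threshold)
    return missed / len(stego_scores)


# Hypothetical toy scores, only to show the call.
covers = [i / 20 for i in range(20)]
stegos = [0.99, 0.97, 0.93, 0.91, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
print(pmd_at_fixed_pfa(covers, stegos))
```

With tied cover scores, fewer than 5% of covers may end up flagged; a real evaluation would have to decide how to break such ties.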

Hall of Fame

The ALASKA#1 challenge is now over. We have been working hard to propose this updated and enlarged dataset "into the wild". We are still working on the test dataset for the second challenge; this notably includes an embedding script along with a slightly modified conversion script. You can still submit your proposal to ALASKA#1 on the website; in the meantime, we reproduce below the hall of fame of the first challenge (until the ALASKA#2 challenge is over).

Users / Teams            PMD005           PMD005               minPE                  FP50
                         (from Website)   (over all results)   (Minimal Error Rate)
Binghamton University    24.37            25.20                14.48                  0.71
Shenzhen University      50               51.60                25.20                  5.86
3188960009               54.93            53.8                 26.33                  7.67
375790798 / veyron_gz    53.35            54.2                 25.78                  7.67
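
For readers unfamiliar with the last two columns, the sketch below shows one way to compute them in Python. The definitions used here are assumptions, since this page only gives the column names: minPE is taken as the minimum over all thresholds of (PFA + PMD) / 2, and FP50 as the false-alarm rate when the threshold is set so that half of the stego images are detected. The function names min_pe and fp50 are hypothetical, and the values returned are fractions, whereas the table reports percentages.

```python
from statistics import median


def min_pe(cover_scores, stego_scores):
    """Minimal total error rate under equal priors (assumed definition).

    Scans every observed score as a candidate threshold; an image is
    flagged as stego when its score is strictly above the threshold.
    """
    thresholds = [float("-inf")] + sorted(set(cover_scores) | set(stego_scores))
    best = 1.0
    for t in thresholds:
        pfa = sum(s > t for s in cover_scores) / len(cover_scores)
        pmd = sum(s <= t for s in stego_scores) / len(stego_scores)
        best = min(best, (pfa + pmd) / 2)
    return best


def fp50(cover_scores, stego_scores):
    """False-alarm rate at about 50% missed detection (assumed definition).

    The threshold is the median stego score, so roughly half of the
    stego images score above it.
    """
    threshold = median(stego_scores)
    return sum(s > threshold for s in cover_scores) / len(cover_scores)


# Hypothetical toy scores, only to show the calls.
covers = [0.1, 0.2, 0.3, 0.4, 0.65]
stegos = [0.35, 0.5, 0.7, 0.8]
print(min_pe(covers, stegos), fp50(covers, stegos))
```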

Acknowledgements

As indicated in the Material section, you are free to download and use, for any non-commercial purpose, all datasets that are made available on this website. This notably includes the raw and processed image datasets as well as all conversion scripts, so that you can create any custom dataset that fits your needs. We however kindly ask you to credit our (enormous) work by either referring to the alaska website or, more relevant, by simply citing one of the associated papers (see the Material section).
The ALASKA contest has been, in part, inspired by the BOSS competition and has been jointly and proudly organized by:
  • Rémi Cogranne, from Troyes University of Technology.
  • Quentin Giboulot, from Troyes University of Technology (PhD student of Rémi Cogranne and Patrick Bas, who did not really choose to be part of the organization but enjoyed it).
  • Patrick Bas, from École Centrale de Lille.
We would like to thank all the individuals who helped us organize this contest. Those are mainly (but not exclusively):
  • Antoine Prudhomme, for creating the present website.
  • Julien Flamant, Jean-Baptiste Gobin, Bertrand De La Morlais, Florent Pergoud, Luc Rodrigues, Pascal Royer and Emile Touron, for kindly providing some of their raw images, as well as Dirk Borghys for the joint work on assessing the impact of raw image development on steganalysis performance (which gave birth to the ALASKA dataset and challenge).
  • The computer resources department of Troyes University of Technology, who helped us with all their advice and suggestions.