Alaska#2


The follow-up steganalysis challenge "into the wild".
Because it is a long and perilous walk to move steganalysis from research labs to real-life conditions.

About Alaska#2

ALASKA#2 is the follow-up steganalysis challenge, coming a year and a half after ALASKA#1 and almost a decade after BOSS. Its main goal is to offer a large and heterogeneous dataset of photographic images in order to address the difficulties that arise when moving "from research labs" to "into the wild".

While BOSS opened the door to steganalysis based on high-dimensional features and dedicated decision methods, ALASKA#1 aimed at pushing the research community toward "practical" steganalysis, using highly heterogeneous datasets to mimic the diversity one can find on the Internet. The goal of ALASKA#2 is twofold.

First, it aims to offer the community a much larger, updated dataset of photographic images.
On the one hand, we believe that the BOSS dataset does not faithfully reflect the high diversity of media that can be found "in the real world". Indeed, its images are rather homogeneous: they come from a small set of 8 DSLR cameras, were all processed from raw files in exactly the same way (without any image processing tools), and are all uncompressed, in grayscale, and of the same size.
On the other hand, ALASKA#2 offers a much larger dataset of 80,000 images from more than 40 cameras (including smartphones, tablets, low-end cameras and high-end full-frame DSLRs), processed in a realistic and highly heterogeneous way. We believe that this will prevent the design of methods tailored to a homogeneous dataset, as recently observed with BOSS, and will help researchers design far more general and robust steganographic and steganalysis methods.

Second, we have recently observed the use of deep learning techniques for steganalysis and steganography.
While these techniques are very promising and are starting to yield results that exceed everything previously known, it is questionable whether they are tailored to a specific dataset. More importantly, deep learning methods require much larger datasets for training, hence the interest of a very large dataset such as ALASKA#2.

Finally, as for ALASKA#1, a challenge will be organized between May and July 2020.
This contest will be organized through the Kaggle Competitions website.
Since this challenge will start within a few days, we can now provide more details.
(i) The total cash prize is $25,000, to be shared among the three best performers.
(ii) A special session will be organized at IEEE WIFS 2020 so that the best competitors have the opportunity to present their solutions. An additional $3,000 is offered by the IEEE as travel grants.
(iii) The competition consists of a test set of 5,000 images and four training sets of 75,000 images each. Images are in color and JPEG-compressed with three different quality factors (95, 90 and 75). Three different embedding algorithms are used, namely J-UNIWARD, UERD and J-MiPOD, with an average payload of 0.4 bpnzAC (bits per non-zero AC DCT coefficient); the payload differs from image to image, so that the most complex images carry a higher payload.
(iv) The competition ends on July 20, 2020 (11:59 PM UTC) and the paper submission deadline is July 31, 2020.
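The payload unit in (iii) can be made concrete: a relative payload of α bpnzAC corresponds to a message of α × N_nzAC bits, where N_nzAC is the number of non-zero AC coefficients among the quantized DCT coefficients of the cover. The minimal sketch below assumes those coefficients are already available as a single 2-D plane laid out like the image (one 8x8 DCT block per 8x8 tile); decoding them from a JPEG file and deciding how chroma channels are counted are left out. The helper names are hypothetical, and the sketch does not reproduce how the challenge varied the payload with image complexity.

```python
from typing import List


def count_nonzero_ac(dct: List[List[int]]) -> int:
    """Count non-zero AC coefficients in a plane of quantized DCT coefficients.

    The plane is assumed to be tiled into 8x8 blocks; the top-left entry of
    each block (row % 8 == 0 and col % 8 == 0) is the DC term and is skipped.
    """
    count = 0
    for r, row in enumerate(dct):
        for c, value in enumerate(row):
            is_dc = r % 8 == 0 and c % 8 == 0
            if not is_dc and value != 0:
                count += 1
    return count


def message_length_bits(dct: List[List[int]], bpnzac: float = 0.4) -> int:
    """Message length in bits for a relative payload given in bpnzAC (rounded)."""
    return round(bpnzac * count_nonzero_ac(dct))


# Toy 16x16 plane (four 8x8 blocks): every DC term is non-zero but ignored,
# and exactly three AC coefficients are non-zero.
plane = [[0] * 16 for _ in range(16)]
for r in (0, 8):
    for c in (0, 8):
        plane[r][c] = 100
plane[0][1] = 3
plane[9][9] = -1
plane[15][2] = 2

print(count_nonzero_ac(plane))     # 3
print(message_length_bits(plane))  # round(0.4 * 3) = 1
```

For a 512x512 image, the plane would itself be 512x512, that is 4,096 blocks, and the same two functions apply unchanged.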

Material

The materials provided are mostly of three kinds.
  • First, and perhaps most importantly, we provide datasets. The main dataset consists of 80,000 raw images (NOTE: it will be updated by the end of 2023!); the images are numbered by digital still camera model, as described in this simple text file.
    In addition, for ease of use by the community, we also provide several processed datasets:
    Uncompressed color and grayscale image datasets (of size 512x512 and 256x256, convenient for deep learning, as well as of various sizes).
    JPEG-compressed image datasets (with quality factors 100, 95, 90, 85, 80 and 75, as well as various QFs).
  • Second, we provide the scripts that were used to convert the raw images into JPEG format. These Python scripts mainly rely on numpy (version 1.14.5), Pillow, the Python Imaging Library (version 5.2), and the open-source raw image processing program RawTherapee (version 5.7).
    These are the scripts we used to generate the various datasets from the raw image files.
  • For any questions regarding the image datasets or the conversion scripts, contact us at alaska@utt.fr.
  • All materials are available under the Creative Commons BY-NC-ND license for use in any research work. We kindly ask you to credit our (enormous) work either by referring to the ALASKA website URL or, preferably, by citing one of the associated papers:
    The ALASKA Steganalysis Challenge: A First Step Towards Steganalysis "into the wild", published in the 7th ACM IH&MMSec conference.
  • However, this license explicitly forbids (without prior authorization from the authors) the use of any material for commercial purposes and the distribution of any material built upon the provided material, especially if you remix or transform the dataset.
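The JPEG datasets are indexed by quality factor (QF), which controls the quantization tables. Assuming the standard IJG/libjpeg quality scaling (to our knowledge, what Pillow applies when a `quality=` value is passed to `Image.save` without custom tables), the luminance table for each QF can be derived from the Annex K base table as in the sketch below. It is meant to give intuition about how coarse each QF is, not to reproduce the ALASKA conversion scripts; chroma tables follow the same rule with a different base table.

```python
from typing import List

# Luminance base quantization table from Annex K of the JPEG standard
# (natural row-major order), as used by the IJG libjpeg library.
BASE_LUMA: List[List[int]] = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def ijg_scale_percent(quality: int) -> int:
    """IJG mapping from a quality factor in [1, 100] to a scaling percentage."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in [1, 100]")
    return 5000 // quality if quality < 50 else 200 - 2 * quality


def quant_table(quality: int, base: List[List[int]] = BASE_LUMA) -> List[List[int]]:
    """Scale the base table, round to nearest, clamp to the baseline range [1, 255]."""
    scale = ijg_scale_percent(quality)
    return [[min(max((q * scale + 50) // 100, 1), 255) for q in row] for row in base]


for qf in (100, 95, 90, 75):
    print(qf, quant_table(qf)[0])
# 100 [1, 1, 1, 1, 1, 1, 1, 1]
# 95 [2, 1, 1, 2, 2, 4, 5, 6]
# 90 [3, 2, 2, 3, 5, 8, 10, 12]
# 75 [8, 6, 5, 8, 12, 20, 26, 31]
```

Smaller entries mean finer quantization: at QF100 every coefficient is quantized with step 1, whereas at QF75 the DC step is already 8.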

Uncompressed grayscale image datasets:
Size 512x512 (PGM); Size 256x256 (PGM); Various sizes (PGM)

Uncompressed color image datasets:
Size 512x512 (PPM); Size 256x256 (PPM); Various sizes (PPM)

JPEG-compressed grayscale image datasets, size 512x512:
QF100; QF95; QF90; QF85; QF80; QF75; Various QF

JPEG-compressed grayscale image datasets, size 256x256:
QF100; QF95; QF90; QF85; QF80; QF75; Various QF

JPEG-compressed color image datasets, size 512x512:
QF100; QF95; QF90; QF85; QF80; QF75; Various QF

JPEG-compressed color image datasets, size 256x256:
QF100; QF95; QF90; QF85; QF80; QF75; Various QF
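The uncompressed datasets are distributed as Netpbm PGM (grayscale) and PPM (color) files. Assuming binary `P5`/`P6` files with 8-bit samples (maxval at most 255), which this page does not state explicitly, they can be read without any third-party dependency. `read_pnm` is a hypothetical helper: it skips `#` comment lines in the header but does not handle the ASCII variants (`P2`/`P3`) or 16-bit samples.

```python
from typing import List, Tuple, Union

Pixel = Union[int, Tuple[int, ...]]


def read_pnm(data: bytes) -> Tuple[int, int, int, List[List[Pixel]]]:
    """Parse an 8-bit binary PGM (P5) or PPM (P6) image held in `data`.

    Returns (width, height, channels, rows); each row is a list of ints for
    grayscale images or of (r, g, b) tuples for color images.
    """
    tokens: List[bytes] = []
    i, n = 0, len(data)
    # Header: magic number, width, height, maxval, separated by whitespace.
    while len(tokens) < 4 and i < n:
        ch = data[i:i + 1]
        if ch.isspace():
            i += 1
        elif ch == b"#":  # a comment runs until the end of the line
            while i < n and data[i:i + 1] != b"\n":
                i += 1
        else:
            start = i
            while i < n and not data[i:i + 1].isspace():
                i += 1
            tokens.append(data[start:i])
    if len(tokens) < 4:
        raise ValueError("truncated header")
    magic = tokens[0]
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic not in (b"P5", b"P6") or maxval > 255:
        raise ValueError("only 8-bit binary PGM/PPM files are supported")
    channels = 1 if magic == b"P5" else 3
    i += 1  # a single whitespace byte separates the header from the raster
    row_len = width * channels
    raster = data[i:i + row_len * height]
    if len(raster) != row_len * height:
        raise ValueError("truncated raster")
    rows: List[List[Pixel]] = []
    for y in range(height):
        row = raster[y * row_len:(y + 1) * row_len]
        if channels == 1:
            rows.append(list(row))
        else:
            rows.append([tuple(row[x:x + 3]) for x in range(0, row_len, 3)])
    return width, height, channels, rows


sample = b"P5\n# tiny example\n3 2\n255\n" + bytes([0, 128, 255, 10, 20, 30])
print(read_pnm(sample))  # (3, 2, 1, [[0, 128, 255], [10, 20, 30]])
```

For a downloaded image, pass the raw file contents, for example `read_pnm(open(path, "rb").read())`, where `path` is whatever local file you want to load.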

Rules

The ALASKAv2 steganalysis challenge was held over spring and summer 2020, lasting about three months in total. The rules were the following:
  • Official submissions, counting toward the final ranking, were only possible through the Kaggle Competitions website.
    Now that the official steganalysis challenge is over, you can still use this website to submit your solution and have it evaluated, but only for fun (the official ranking is closed).
  • The total cash prize is $25,000, to be shared among the three best performers.
    Submissions are ranked using a weighted area under the ROC curve, so as to focus on detection accuracy at low false-alarm probability.
    The winners will be asked to provide the source code of their detectors to receive their prize.
    We strongly encourage the top-scoring teams to submit a paper to WIFS 2020; a $3,000 travel grant will be offered by the IEEE for accepted papers.
  • All information about the timeline, the assessment of solutions, and the material is provided on the Kaggle Competitions website.
    The present website is mostly intended for the long-term storage of the ALASKAv2 datasets, which can be used for any research and education purposes. Registered users can also submit their proposals and have them evaluated for benchmarking purposes, but this is an unofficial ranking; to do so, you must create an account with a valid email address, which will be used to display your username.
    These accounts will only be used for statistical purposes, and any communication will only concern the ALASKA challenge.
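The ranking rule above only says that the area under the ROC curve is weighted. As we understand the Kaggle competition description, the band of true-positive rates between 0 and 0.4 counted twice as much as the band between 0.4 and 1, with the total normalized so that the score lies in [0, 1]; treat these thresholds and weights as assumptions. The pure-Python sketch below computes that quantity exactly on the empirical ROC curve (tied scores form a single ROC step). It is an illustration, not the official scoring code, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical ROC curve as (FPR, TPR) vertices; label 1 marks a stego image."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both cover (0) and stego (1) samples")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        current = pairs[i][0]
        # Tied scores are consumed together so that they form a single ROC step.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def _band_area(x0: float, y0: float, x1: float, y1: float, lo: float, hi: float) -> float:
    """Exact integral over [x0, x1] of clip(y - lo, 0, hi - lo), y linear in x."""
    cuts = {0.0, 1.0}
    for level in (lo, hi):
        if (y0 - level) * (y1 - level) < 0:  # the segment crosses this TPR level
            cuts.add((level - y0) / (y1 - y0))
    ordered = sorted(cuts)
    area = 0.0
    for ta, tb in zip(ordered, ordered[1:]):
        # Between cuts the clipped curve is linear, so the trapezoid rule is exact.
        fa = min(max(y0 + ta * (y1 - y0) - lo, 0.0), hi - lo)
        fb = min(max(y0 + tb * (y1 - y0) - lo, 0.0), hi - lo)
        area += 0.5 * (fa + fb) * (tb - ta) * (x1 - x0)
    return area


def weighted_auc(labels, scores, tpr_thresholds=(0.0, 0.4, 1.0), weights=(2.0, 1.0)) -> float:
    """Area under the ROC curve with each TPR band weighted, normalized to [0, 1]."""
    pts = roc_points(labels, scores)
    total = norm = 0.0
    for w, lo, hi in zip(weights, tpr_thresholds, tpr_thresholds[1:]):
        band = sum(_band_area(x0, y0, x1, y1, lo, hi)
                   for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
        total += w * band
        norm += w * (hi - lo)  # area of the band for a perfect detector
    return total / norm


labels = [0, 0, 1, 1]
print(weighted_auc(labels, [0.1, 0.2, 0.8, 0.9]))  # 1.0 (perfect ranking)
print(weighted_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # ~0.586 (= 0.82 / 1.4)
```

Passing `tpr_thresholds=(0.0, 1.0)` and `weights=(1.0,)` reduces this to the ordinary AUC, which gives 0.5 for the uninformative detector; the extra weight on the low-TPR band is what pulls that same detector up to about 0.586.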

Leaderboard

The current leaderboard is updated dynamically. Several scores are reported for each submission; however, only the first one, MD005 (missed detection rate at a false-alarm rate of 0.05), is used for the "official" ranking.

Rankings available: ALASKA#1 Official Ranking; ALASKA#2 Official Ranking; Stage 3.
Columns: Submission Date; Missed Detection at 5% False Alarm Rate; minPE (Minimal Error Rate); FP50 (False Positive rate at 50% Missed Detection).
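The three leaderboard scores can all be read off the empirical detection errors. Below, P_FA is the fraction of cover images flagged as stego and P_MD the fraction of stego images missed, for a detector that answers "stego" when its score is at least a threshold. The sketch assumes that MD005 is the lowest P_MD reachable with P_FA ≤ 0.05, that FP50 is the lowest P_FA reachable with P_MD ≤ 0.5, and that minPE = min over thresholds of (P_FA + P_MD) / 2; the website's exact interpolation rules are not documented here. Results are fractions, whereas the leaderboard displays percentages, and the function names are hypothetical.

```python
import math
from typing import Iterator, Sequence, Tuple


def error_rates(cover_scores: Sequence[float],
                stego_scores: Sequence[float]) -> Iterator[Tuple[float, float]]:
    """Yield (P_FA, P_MD) for every threshold t; an image is called stego if score >= t."""
    # Every distinct score is a candidate threshold; +inf gives P_FA = 0, P_MD = 1.
    thresholds = sorted(set(cover_scores) | set(stego_scores)) + [math.inf]
    for t in thresholds:
        p_fa = sum(s >= t for s in cover_scores) / len(cover_scores)
        p_md = sum(s < t for s in stego_scores) / len(stego_scores)
        yield p_fa, p_md


def min_pe(cover, stego) -> float:
    """minPE: minimum over thresholds of (P_FA + P_MD) / 2."""
    return min(0.5 * (fa + md) for fa, md in error_rates(cover, stego))


def md_at_fa(cover, stego, max_fa: float = 0.05) -> float:
    """Lowest P_MD reachable while keeping P_FA <= max_fa (MD005 for 0.05)."""
    return min(md for fa, md in error_rates(cover, stego) if fa <= max_fa)


def fa_at_md(cover, stego, max_md: float = 0.5) -> float:
    """Lowest P_FA reachable while keeping P_MD <= max_md (FP50 for 0.5)."""
    return min(fa for fa, md in error_rates(cover, stego) if md <= max_md)


covers = [0.1, 0.2, 0.3, 0.4]
stegos = [0.35, 0.6, 0.7, 0.8]
print(min_pe(covers, stegos), md_at_fa(covers, stegos), fa_at_md(covers, stegos))
# 0.125 0.25 0.0
```

Each threshold rescans all scores, so the cost is quadratic in the number of images; that is acceptable for illustration, but a single sweep over the sorted scores would be preferable for a full test set.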

Hall of Fame

The ALASKA#1 challenge is now over. We have been working hard to propose this updated and enlarged dataset "into the wild". While a second challenge will start very soon, we reproduce below the hall of fame of the first challenge (until the ALASKA#2 challenge is over).

Users / Teams | PMD005 (from website) | PMD005 (over all results) | minPE (Minimal Error Rate) | FP50
Binghamton University24.3725.2014.480.71
Shenzhen University5051.6025.205.86
318896000954.9353.826.337.67
375790798 / veyron_gz53.3554.225.787.67

Acknowledgements

As indicated in the Material section, you are free to download and use, for any non-commercial purpose, all datasets made available on this website. This includes in particular the raw and processed image datasets as well as all conversion scripts, so that you can create any custom dataset that fits your needs. We however kindly ask you to credit our (enormous) work either by referring to the ALASKA website or, preferably, by citing one of the associated papers:
The ALASKA Steganalysis Challenge: A First Step Towards Steganalysis "into the wild", published in the 7th ACM IH&MMSec conference.
The ALASKA contest has been, in part, inspired by the BOSS competition and has been jointly and proudly organized by:
  • Rémi Cogranne, from Troyes University of Technology.
  • Quentin Giboulot, from Troyes University of Technology (PhD student of Rémi Cogranne and Patrick Bas, who did not really choose to be part of the organization but enjoyed it).
  • Patrick Bas, from École Centrale de Lille.
We would like to thank all the individuals who helped us organize this contest, mainly (but not exclusively):
  • Antoine Prudhomme, for creating the present website.
  • Julien Flamant, Jean-Baptiste Gobin, Bertrand De La Morlais, Florent Pergoud, Luc Rodrigues, Pascal Royer and Emile Touron for kindly providing some of their raw images, as well as Dirk Borghys for the joint work on assessing the impact of raw image development on steganalysis performance (which gave birth to the ALASKA dataset and challenge).
  • The computer resources department of Troyes University of Technology, who helped us with all their advice and suggestions.