About ALASKA#2
ALASKA#2 is the follow-up steganalysis challenge, held a year and a half after ALASKA#1 and almost a decade after BOSS. Its main aim is to offer a large and heterogeneous dataset of photographic images to address the difficulties of moving steganalysis "from research labs" towards "into the wild".
While BOSS opened the door to steganalysis with high-dimensional feature sets and dedicated decision methods, ALASKA#1 aimed to push the research community towards "practical" steganalysis, using highly heterogeneous datasets to mimic the diversity found on the Internet. The goal of ALASKA#2 is two-fold.
First, it aims to offer the community a much larger, updated dataset of photographic images.
On the one hand, we believe that the BOSS dataset does not faithfully reflect the high diversity of media found "in the real world". Indeed, its images are rather homogeneous: they come from a small set of 8 DSLR cameras, were all processed from raw files in exactly the same way (without any image processing tools), and are all uncompressed, grayscale, of the same size, etc.
In contrast, the ALASKA#2 dataset offers a much larger set of 80,000 images, from more than 40 cameras (including smartphones, tablets, and low-end cameras up to high-end full-frame DSLRs), processed in a realistic and highly heterogeneous way. We believe this will discourage methods tailored to a homogeneous dataset, as observed recently for BOSS, and help researchers design far more general and robust steganographic and steganalysis methods.
Second, we have recently observed the use of Deep Learning techniques for steganalysis and steganography.
While those are very promising and are starting to offer results that exceed everything previously known, it is questionable whether those methods are tailored to a specific dataset. More importantly, Deep Learning methods require a much larger dataset for training, hence the interest of a very large dataset such as ALASKA#2.
Finally, as for ALASKA#1, a challenge will be organized between May and July 2020.
This contest will be organized through the Kaggle Competitions website.
Since this challenge will start within a few days, we can now provide more details.
(i) The total cash prize is $25,000, to be shared between the three best performers.
(ii) A special session at IEEE WIFS 2020 will be organized so that the best competitors have the opportunity to present their solutions. An additional $3,000 is offered by IEEE as travel grants.
(iii) The competition consists of a testing set of 5,000 images and four training sets of 75,000 images. Images are in color and compressed using JPEG with three different quality factors (95, 90 and 75). Three different embedding algorithms are used, namely J-UNIWARD, UERD and J-MiPOD, with an average payload of 0.4 bpnzAC (bits per non-zero AC DCT coefficient); the payload differs for every image, such that the most complex images carry a higher payload.
(iv) The competition ends on July 20, 2020 (11:59 PM UTC) and the paper submission deadline is July 31, 2020.
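To make the payload unit in (iii) concrete, here is a minimal sketch of how a relative payload in bpnzAC translates into a message length in bits. It assumes the quantized DCT coefficients are already available as 8x8 blocks; the function names and the toy blocks are hypothetical and are not taken from the challenge material. Note that 0.4 bpnzAC is only the average: the actual payload varies from image to image.

```python
def count_nonzero_ac(blocks):
    """Count non-zero AC coefficients in a list of 8x8 quantized DCT blocks.

    Each block is a list of 8 rows of 8 integers. Position [0][0] holds the
    DC coefficient, which is excluded from the count.
    """
    total = 0
    for block in blocks:
        for r, row in enumerate(block):
            for c, coeff in enumerate(row):
                if (r, c) != (0, 0) and coeff != 0:
                    total += 1
    return total


def payload_bits(blocks, rate_bpnzac):
    """Message length in bits for a relative payload expressed in bpnzAC."""
    return rate_bpnzac * count_nonzero_ac(blocks)


# Toy data (hypothetical): one flat block and one textured block.
flat = [[0] * 8 for _ in range(8)]
flat[0][0] = 50                       # DC only, so no usable coefficient
textured = [[0] * 8 for _ in range(8)]
textured[0][0] = 40                   # DC, ignored
for (r, c), value in {(0, 1): 3, (1, 0): -2, (1, 1): -1, (0, 2): 4, (2, 2): 1}.items():
    textured[r][c] = value

nz = count_nonzero_ac([flat, textured])
print(nz, payload_bits([flat, textured], 0.4))
```

Because smooth images have few non-zero AC coefficients, the same relative payload yields far fewer embedded bits in them than in textured images.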
Timeline
Material
- First, and perhaps most important, we provide datasets. The main dataset is made of a set of 80,000 raw images (NOTE: it will be updated by the end of 2023!); the images are numbered by digital still camera model, as described in this simple text file.
Besides, for easy use by the community, we also provide several processed datasets:
Uncompressed color and grayscale image datasets (of size 512x512 and 256x256, for easy use in Deep Learning, and of various sizes).
JPEG compressed image datasets (with quality factors 100, 95, 90, 85, 80, 75 and various QF).
- Second, we also provide the scripts that were used to convert the raw images into JPEG format and to generate the various datasets from the raw image files. Those Python scripts mainly rely on the following libraries: numpy (version 1.14.5), Pillow, the Python Imaging Library (version 5.2), and the open-source raw image processing program RawTherapee (version 5.7).
- For any question regarding the image datasets or the conversion scripts, contact us at alaska@utt.fr.
- While all the materials are available under the Creative Commons BY-NC-ND license for use in any research work, we kindly ask you to credit our (enormous) work by either referring to the ALASKA website URL or, more relevantly, by simply citing one of the associated papers:
The ALASKA Steganalysis Challenge: A First Step Towards Steganalysis "into the wild", published in the 7th ACM IH&MMSec conference.
- However, this license explicitly forbids (without prior authorization from the authors) the use of any material for commercial purposes and the distribution of any material built upon the material provided, especially if you remix and transform the dataset.
Uncompressed Grayscale images datasets:
Size 512x512, PGM Size 256x256, PGM Various size, PGM
Uncompressed COLOR images datasets:
Size 512x512, PPM Size 256x256, PPM Various size, PPM
JPEG Compressed, Grayscale, Size 512x512 images datasets:
QF100 QF95 QF90 QF85 QF80 QF75 Various QF
JPEG Compressed, Grayscale, Size 256x256 images datasets:
QF100 QF95 QF90 QF85 QF80 QF75 Various QF
JPEG Compressed, Color, Size 512x512 images datasets:
QF100 QF95 QF90 QF85 QF80 QF75 Various QF
JPEG Compressed, Color, Size 256x256 images datasets:
QF100 QF95 QF90 QF85 QF80 QF75 Various QF
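For readers who want to load the uncompressed grayscale datasets without extra dependencies, the following is a minimal sketch of a PGM reader using only the Python standard library. It assumes the files are binary ("P5") PGM with 8-bit samples, which is not stated on this page; the function name `read_pgm` is hypothetical.

```python
def read_pgm(data):
    """Parse a binary (P5) PGM image given as bytes.

    Returns (width, height, maxval, pixels), where pixels is a list of rows.
    Assumption: 8-bit samples only (maxval < 256).
    """
    pos = 0
    tokens = []
    # The header holds four whitespace-separated tokens: magic number, width,
    # height and maxval. Comments run from '#' to the end of the line.
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    if tokens[0] != b"P5":
        raise ValueError("not a binary PGM file")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    if maxval > 255:
        raise ValueError("16-bit PGM is not handled in this sketch")
    pos += 1  # exactly one whitespace byte separates header and pixel data
    raster = data[pos:pos + width * height]
    if len(raster) != width * height:
        raise ValueError("truncated pixel data")
    pixels = [list(raster[r * width:(r + 1) * width]) for r in range(height)]
    return width, height, maxval, pixels


# Toy 3x2 image built in memory (not a file from the dataset).
sample = b"P5\n# toy image\n3 2\n255\n" + bytes([0, 128, 255, 10, 20, 30])
width, height, maxval, pixels = read_pgm(sample)
print(width, height, maxval, pixels)
```

To read a file from disk, pass its contents as bytes, for example `read_pgm(open(path, "rb").read())`, where `path` points to one of the downloaded images.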
Rules
- Official submissions, counting for the final ranking, were only possible through the Kaggle Competitions website.
Now that the official steganalysis challenge is over, you can use this website to submit your solution and get it evaluated, but only for fun (the official ranking is closed).
- The total cash prize is $25,000, to be shared between the three best performers.
The ranking of submissions is made by weighting the area under the ROC curve to focus on detection accuracy at low false-alarm probability.
The winners will be asked to provide the source code of their detectors to receive their prize.
We strongly encourage the top-scoring teams to submit a paper to WIFS 2020; a $3,000 travel grant will be offered by the IEEE for accepted papers.
- All information about the timeline, the assessment of solutions, and the material is provided on the Kaggle Competitions website.
The present website is mostly intended to provide the ALASKAv2 datasets, which can be used for any research and education purposes, for long-term storage. Registered users can also submit their proposal and get it evaluated for benchmarking purposes, but this is an unofficial ranking; to do so, however, you must create an account with a valid email address, which will be used to display your username.
Those accounts will only be used for statistical purposes, and communications will be made only regarding the ALASKA challenge.
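The ranking criterion mentioned in the rules above, a weighted area under the ROC curve, can be sketched as follows. This is an illustration only: the band limits on the true-positive rate (0, 0.4, 1) and the weights (2, 1) are assumptions made for the example, and the authoritative definition is the one given on the Kaggle Competitions website. All function names are hypothetical.

```python
def roc_points(labels, scores):
    """ROC curve as a list of (fpr, tpr) points; label 1 = stego, 0 = cover."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(order):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(order) or order[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def band_area(x0, y0, x1, y1, lo, hi):
    """Area of clip(y, lo, hi) - lo under one linear ROC segment."""
    if x1 == x0:
        return 0.0
    # Split the segment where it crosses lo or hi, so that each piece stays
    # linear after clipping and the trapezoid rule is exact.
    cuts = [0.0, 1.0]
    if y1 != y0:
        for level in (lo, hi):
            t = (level - y0) / (y1 - y0)
            if 0.0 < t < 1.0:
                cuts.append(t)
    cuts.sort()
    area = 0.0
    for t0, t1 in zip(cuts, cuts[1:]):
        ya = min(max(y0 + t0 * (y1 - y0), lo), hi) - lo
        yb = min(max(y0 + t1 * (y1 - y0), lo), hi) - lo
        area += 0.5 * (ya + yb) * (t1 - t0) * (x1 - x0)
    return area


def weighted_auc(labels, scores, tpr_bands=(0.0, 0.4, 1.0), weights=(2.0, 1.0)):
    """Weighted AUC; the default bands and weights are illustrative assumptions."""
    pts = roc_points(labels, scores)
    total = norm = 0.0
    for lo, hi, w in zip(tpr_bands, tpr_bands[1:], weights):
        total += w * sum(band_area(x0, y0, x1, y1, lo, hi)
                         for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
        norm += w * (hi - lo)
    return total / norm


# A detector that separates the two classes perfectly (toy scores).
print(weighted_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
```

With these illustrative weights, the part of the ROC curve below a true-positive rate of 0.4 counts twice as much as the rest, so a detector that reaches high detection rates only at the cost of many false alarms is penalized more than under the plain AUC.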
Submit an answer
You must be authenticated in order to submit your steganalysis results.
Leaderboard
The current leaderboard is the following. Note that you can click on column headers to sort submissions according to the various scores. However, only the first one, MD005 (missed-detection rate at a false-alarm rate of 0.05), is used for the "official" ranking.
Submission Date | Missed Detection at 5% False Alarm Rate | minPE (Minimal Error Rate) | FP50 (False Positive rate at 50% Missed Detection) |
---|---|---|---|
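The three leaderboard scores can be computed from detector outputs along the following lines. This is a minimal sketch, not the evaluation code used by this website: it assumes that a higher score means "more likely stego", that minPE is the minimal average of the false-alarm and missed-detection rates under equal priors, and that rates are returned as fractions rather than percentages. The function names are hypothetical.

```python
def error_rates(cover_scores, stego_scores, threshold):
    """Return (false-alarm rate, missed-detection rate) for one threshold.

    An image is flagged as stego when its score is strictly above the threshold.
    """
    p_fa = sum(s > threshold for s in cover_scores) / len(cover_scores)
    p_md = sum(s <= threshold for s in stego_scores) / len(stego_scores)
    return p_fa, p_md


def leaderboard_metrics(cover_scores, stego_scores):
    """Return (MD005, minPE, FP50) over all distinct decision thresholds."""
    thresholds = sorted(set(cover_scores) | set(stego_scores))
    # Add one threshold below every score, so "flag everything" is included.
    thresholds = [thresholds[0] - 1.0] + thresholds
    rates = [error_rates(cover_scores, stego_scores, t) for t in thresholds]
    # MD005: lowest missed-detection rate with a false-alarm rate of at most 5%.
    md005 = min(p_md for p_fa, p_md in rates if p_fa <= 0.05)
    # minPE: minimal average error, assuming equal priors (an assumption).
    min_pe = min(0.5 * (p_fa + p_md) for p_fa, p_md in rates)
    # FP50: lowest false-alarm rate with a missed-detection rate of at most 50%.
    fp50 = min(p_fa for p_fa, p_md in rates if p_md <= 0.5)
    return md005, min_pe, fp50


# Toy scores (hypothetical): 20 cover images and 20 stego images.
cover = list(range(20))        # scores 0..19
stego = list(range(5, 25))     # scores 5..24
print(leaderboard_metrics(cover, stego))
```

Multiplying the returned fractions by 100 gives values on the same percentage scale as the tables on this page.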
Hall of Fame
The ALASKA#1 challenge is now over. We have been working hard to propose this updated and enlarged dataset "into the wild". While a second challenge will start very soon, we reproduce below the hall of fame of the first challenge (until the second challenge, ALASKA#2, is over).
Users / Teams | PMD005 (from Website) | PMD005 (over all results) | minPE (Minimal Error Rate) | FP50 |
---|---|---|---|---|
Binghamton University | 24.37 | 25.20 | 14.48 | 0.71 |
Shenzhen University | 50 | 51.60 | 25.20 | 5.86 |
3188960009 | 54.93 | 53.8 | 26.33 | 7.67 |
375790798 / veyron_gz | 53.35 | 54.2 | 25.78 | 7.67 |
Acknowledgements
- The ALASKA Steganalysis Challenge: A First Step Towards Steganalysis "into the wild", published in the 7th ACM IH&MMSec conference.
- Rémi Cogranne, from Troyes University of Technology.
- Quentin Giboulot, from Troyes University of Technology (PhD student of Rémi Cogranne and Patrick Bas, who did not really choose to be part of the organization but enjoyed it).
- Patrick Bas, from École Centrale de Lille.
- Antoine Prudhomme, for creating the present website.
- Julien Flamant, Jean-Baptiste Gobin, Bertrand De La Morlais, Florent Pergoud, Luc Rodrigues, Pascal Royer and Emile Touron for kindly providing some of their raw images, as well as Dirk Borghys for the joint work on assessing the impact of raw image development on steganalysis performance (which gave birth to the ALASKA dataset and challenge).
- The computer resources department of Troyes University of Technology, who helped us with all their advice and suggestions.