Create a database of hyperspectral imagery for use in Machine/Deep Learning

This is one of the articles from the forum Best VISION Application, added by a registered user.

Challenging task: Make hyperspectral imaging mainstream

Idea: Create a large database of hyperspectral imagery for use in Machine/Deep Learning competitions

Proposer: Igor Carron, Nuit Blanche, http://www.linkedin.com/in/IgorCarron

The author also posted this idea on Nuit Blanche.

Background

Machine Learning is the field concerned with creating, training and using algorithms dedicated to making sense of data. These algorithms take advantage of training data (images, videos) to improve at tasks such as detection and classification. In recent years, we have witnessed spectacular growth in this field thanks to the joint availability of large datasets originating from the internet and the attendant efforts to curate and label those images and videos.

Numerous labeled datasets such as CIFAR [1] and ImageNet [2] routinely permit algorithms of increasing complexity to be developed and to compete in state-of-the-art classification contests. For instance, the rise of deep learning algorithms dates from an entry that broke the state-of-the-art classification results in the “ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry” [3]. More recent results of this heated competition were shown at the NIPS conference last week, where teams at Microsoft Research produced breakthroughs in classification with an astounding 152-layer neural network [4]. This intense competition between highly capable teams at universities and large internet companies is only possible because large amounts of training data are being made available.

Image or even video processing for hyperspectral imagery cannot follow the development path that conventional image processing took over the past 40 years. That development was performed at considerable expense by companies and governments alike and eventually yielded standards such as JPEG, GIF, JPEG 2000, MPEG, etc. Because such funding is no longer available, we need to find other ways of improving and making sense of this new imaging modality.

Technically, since hyperspectral imagery is still a niche market, most analysis performed in this field runs the risk of being seen as an outgrowth of normal imagery: i.e., substandard tools such as JPEG or labor-intensive computer vision tools are being used to classify and use this imagery without much thought given to the additional structure of the spectral information. More sophisticated tools such as advanced matrix factorizations (NMF, PCA, Sparse PCA, dictionary learning, etc.) in turn focus on the spectral information but seldom use the spatial information. Both approaches suffer from not investigating more fully the inherent robust structure of this imagery.
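To make the point about matrix factorizations concrete, here is a minimal sketch, assuming a hyperspectral cube stored as a NumPy array of shape (rows, cols, bands). The function name `spectral_pca` and the synthetic random cube are illustrative assumptions, not part of any existing dataset or toolchain. It applies plain PCA to the per-pixel spectra, then shuffles the pixel positions to show that the spectral factorization does not depend on where each pixel sits in the scene.

```python
import numpy as np


def spectral_pca(cube, n_components):
    """PCA over the spectral axis of a (rows, cols, bands) cube.

    Hypothetical helper for illustration: every pixel is treated as an
    independent spectrum, so the spatial layout plays no role.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(rows * cols, bands)      # one spectrum per row
    centered = pixels - pixels.mean(axis=0)
    cov = centered.T @ centered / (pixels.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = centered @ eigvecs[:, order]          # project spectra on top components
    return eigvals[order], scores.reshape(rows, cols, n_components)


rng = np.random.default_rng(0)
cube = rng.random((32, 32, 50))                    # synthetic stand-in, not real data
variances, score_maps = spectral_pca(cube, 3)

# Scramble the pixel positions: the scene is destroyed, the spectra are not.
flat = cube.reshape(-1, cube.shape[2])
shuffled = flat[rng.permutation(flat.shape[0])].reshape(cube.shape)
shuffled_variances, _ = spectral_pca(shuffled, 3)

# The explained variances agree up to floating-point rounding.
same_spectrum = np.allclose(variances, shuffled_variances)
print(same_spectrum)
```

Because the factorization gives the same variances for the scrambled cube, it cannot be exploiting any spatial structure, which is the limitation described above.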

For hyperspectral imagery to become mainstream, algorithms for its compression and day-to-day use have to take advantage of the current very active and highly competitive development of Machine Learning algorithms. In short, creating large and rich hyperspectral imagery datasets beyond what is currently available ([5-8]) is central for this technology to grow out of its niche markets and become central in our everyday lives.

The proposal

In order to make hyperspectral imagery mainstream, I propose to use a XIMEA camera to shoot imagery and video of different objects and locations, and to label these datasets.

The datasets will then be made available on the internet for use by parties interested in running classification competitions based on them (Kaggle, academic competitions, etc.).

As a co-organizer of the Paris Machine Learning meetup, I also intend to enlist some of the folks in the group (with close to 3000 members, it is one of the largest Machine Learning meetups in the world [9]) to help in enriching this dataset.

The dataset should be available from servers, probably colocated at a university or some non-profit organization (to be identified). A report presenting the dataset should eventually be academically citable.

References
[1] Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009, https://www.cs.toronto.edu/~kriz/cifar.html
[2] Imagenet dataset
[3] ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[4] Microsoft researchers win ImageNet computer vision challenge, Next at Microsoft
[5] https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html
[6] T. Skauli and J. Farrell. “A collection of hyperspectral images for imaging systems research”. In Proceedings of the SPIE Electronic Imaging 2013
[7] Foster, D.H., Amano, K., Nascimento, S.M.C., & Foster, M.J. (2006). Frequency of metamerism in natural scenes. Journal of the Optical Society of America A, 23, 2359-2372. Hyperspectral images of natural scenes
[8] Parraga CA, Brelstaff G, Troscianko T, Moorhead IR, Journal of the Optical Society of America 15 (3): 563-569, 1998 or G. Brelstaff, A. Párraga, T. Troscianko and D. Carr, SPIE. Vol. 2587. Geog. Inf. Sys. Photogram. and Geolog./Geophys. Remote Sensing, 150-159, 1995
[9] Paris Machine Learning meetup, Paris Machine Learning Applications Group (Paris), Meetup