Ifsttar PhD subject

 


Detailed form :

Title : Discovery and modelling of interactions between users and transport infrastructure through the analysis of video surveillance streams.

Main host Laboratory - Referent Advisor COSYS - LEOST  -  AMBELLOUIS Sébastien      tél. : +33 320438493 
Director of the main host Laboratory COCHERIL Yann  -  
PhD Speciality Image and signal processing, learning/modelling
Axis of the performance contract 3 - COP2017 - Planning and protecting regions
Main location Lille-Villeneuve d'Ascq
Doctoral affiliation UNIVERSITE DES SCIENCES ET TECHNOLOGIE DE LILLE 1
PhD school SCIENCES POUR L'INGENIEUR (SPI)
Planned PhD supervisor LECOEUCHE Stéphane  -  MINES DOUAI  -  URIA
Planned financing External doctoral contract   - Ecole des Mines de Douai

Abstract

Automatically solving transportation problems has become an active research subject. In this PhD project, we address a specific challenge in this domain: anomaly detection and tracking. Our ultimate goal is to build a flexible and effective framework that performs well on various public datasets. Our research applies and improves previously successful approaches to achieve better results. We consider two scenarios, leading to the two methods described in the following parts: (1) segmentation and tracking of vehicles and road users by future prediction, using classical hand-crafted generative methods based on optical flow estimation; (2) anomaly detection by future prediction, using multi-channel deep generative frameworks and supervised learning.

Our first study evaluates the performance of the classical hand-crafted generative approach in future prediction and its capacity to improve segmentation and tracking. Strong deep learning detectors such as Mask R-CNN have recently led to an effective approach to the tracking problem: tracking-by-detection. This very fast type of tracker considers only the Intersection-over-Union (IOU) between bounding boxes to match objects, without any other visual information. However, the IOU tracker's lack of visual information, combined with the detection failures of CNN detectors, creates fragmented trajectories. We propose an enhanced tracker based on tracking-by-detection and optical flow estimation for vehicle tracking scenarios. Our solution generates new detections or segmentations by translating the results of CNN detectors backward and forward along optical flow vectors, which fills the gaps in trajectories. The qualitative results show that our solution achieves stable performance with different types of flow estimation methods. We then match the generated results with the fragmented trajectories using SURF features. The DAVIS dataset is used to evaluate the best way to generate new detections. Finally, the entire process is tested on the DETRAC dataset. The qualitative results show that our method significantly improves the fragmented trajectories. For future work, we plan to apply the CGAN streams of the second work to the first task, in order to propose a new competitive future prediction process for segmentation and tracking.
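The IOU matching and flow-based gap filling described above can be illustrated with a minimal sketch. It is not the tracker of this project: the function names, the greedy matching strategy, the 0.5 threshold and the use of a single (dx, dy) displacement per box (for instance the mean optical flow inside the box) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shift_box(box: Box, flow: Tuple[float, float]) -> Box:
    """Translate a box by a (dx, dy) displacement.

    The displacement stands in for an optical flow summary of the box
    region (hypothetical choice); a negative flow moves the box backward.
    """
    dx, dy = flow
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def match(tracks: List[Box], detections: List[Box],
          threshold: float = 0.5) -> List[Optional[int]]:
    """Greedily assign to each track the unused detection with the best IOU.

    Returns, for each track, the index of its detection, or None when no
    detection reaches the threshold (a gap in the trajectory).
    """
    used = set()
    result: List[Optional[int]] = []
    for track_box in tracks:
        best, best_iou = None, threshold
        for j, det in enumerate(detections):
            if j in used:
                continue
            score = iou(track_box, det)
            if score >= best_iou:
                best, best_iou = j, score
        if best is not None:
            used.add(best)
        result.append(best)
    return result
```

When `match` returns `None` for a track, a substitute detection could be generated with `shift_box(last_box, flow)` so that the trajectory is not interrupted; how the flow is summarised and how the generated boxes are validated are left open here.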

Despite the moderate success of the first work, classical approaches have significant limitations for our main task: anomaly detection. The low frequency of abnormal events leads to an imbalanced scenario, and the features of abnormal events usually do not follow any spatial or temporal relationship. It is also difficult to pre-define the structure or class of abnormal events. Facing these challenges, most state-of-the-art (SOTA) anomaly detection methods are based on networks that reconstruct apparent motion and appearance, and use the error between generated and real information as detection features. These approaches achieve promising results using only normal samples for training. In this thesis, our contributions are twofold. On the one hand, we propose a flexible multi-channel framework to generate multi-type frame-level features. On the other hand, we study how supervised learning can improve detection performance. The multi-channel framework is based on four Conditional GANs (CGANs) that take various types of appearance and motion information as input and produce predicted information as output. These CGANs provide a better feature space to represent the distinction between normal and abnormal events. The difference between the generated and ground-truth information is then encoded by the Peak Signal-to-Noise Ratio (PSNR). We propose to classify these features in a classical supervised scenario by building a small training set with some abnormal samples taken from the original test set of the dataset. A binary Support Vector Machine (SVM) is applied for frame-level anomaly detection. Finally, we use Mask R-CNN as a detector to perform object-centric anomaly localization. Our solution is extensively evaluated on the Avenue, Ped1, Ped2 and ShanghaiTech datasets. Our experimental results demonstrate that PSNR features combined with a supervised SVM perform better than the error maps computed by previous methods.
We achieve SOTA performance for frame-level AUC on Avenue, Ped1 and ShanghaiTech. In particular, on the most challenging dataset, ShanghaiTech, a supervised model outperforms the SOTA unsupervised strategy by up to 9%. Furthermore, several promising directions are in progress: building a new dataset for semi-supervised anomaly detection that contains both normal and abnormal samples in its training set, and applying a one-class SVM to propose an end-to-end framework.
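As an illustration of how prediction errors can be turned into frame-level features, the sketch below computes one PSNR value per channel, using the standard definition PSNR = 10 * log10(MAX^2 / MSE). The flat pixel lists, the [0, 1] intensity range and the function names are assumptions for this example, and the classifier itself is not shown.

```python
import math
from typing import List, Sequence


def psnr(predicted: Sequence[float], actual: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio between a predicted and a real frame.

    Frames are flat sequences of intensities in [0, max_value]
    (an assumed representation). Higher values mean a better prediction.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def frame_features(predictions: Sequence[Sequence[float]],
                   targets: Sequence[Sequence[float]]) -> List[float]:
    """One PSNR per channel, giving a feature vector for a single frame.

    Each entry of `predictions` is assumed to be the output of one
    generator, paired with its ground truth in `targets`.
    """
    return [psnr(p, t) for p, t in zip(predictions, targets)]
```

A frame whose content is poorly predicted yields low PSNR values, so vectors of this kind, one per frame, are the sort of input a binary SVM could be trained on. An exact prediction gives an infinite PSNR, which would need to be capped before classification.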

Keywords : Incremental Learning, Bayesian Inference, Data Mining, Dictionary learning, Image processing, Railway user security, VRU security, Activity and interaction discovery and modelling
Applications closed