Ifsttar PhD subject

 


Detailed form:

Title: Multi-source data fusion for mobility analysis

Main host laboratory - Referent advisor: COSYS - GRETTIA - COME Etienne - Tel.: +33 181668718
Director of the main host laboratory: OUKHELLOU Latifa
Laboratory 2 - Referent advisor: COSYS - LICIT - FURNO Angelo
PhD speciality: Applied mathematics - Computer science
Axis of the performance contract: 1 - COP2017 - Efficient transport and safe travel
Main location: Marne-la-Vallée
Doctoral affiliation: UNIVERSITE PARIS-EST
PhD school: MATHEMATIQUES ET SCIENCES ET TECHNOLOGIES DE L'INFORMATION ET DE LA COMMUNICATION (MSTIC)
Planned PhD supervisor: OUKHELLOU Latifa - Université Gustave Eiffel - COSYS - GRETTIA
Planned PhD co-supervisor: EL FAOUZI Nour-Eddin - Université Gustave Eiffel - COSYS - LICIT-ECO7
Planned financing: Doctoral contract - Ifsttar

Abstract

This thesis focuses on generating and using safe and accurate data to describe the mobility of people in an urban environment. The work centers on Origin-Destination (OD) matrices obtained from mobile phone data, which describe population flows between the zones of a city. These data are characterized by huge volumes, which call for lightweight processing solutions, and by a high variety, which implies a privacy risk for outliers.

In a first part, we develop an algorithm that efficiently guarantees the k-anonymization of such OD matrices via generalization and suppression. Our method enforces a hard constraint on the number of trips that can be suppressed, in order to preserve the representativeness of the data.
The spatial generalization is formalized as a knapsack problem with a dependency tree, whose dual can be solved efficiently using the Some Breakpoints Algorithm.
We also study how to solve a relaxation of the problem, which guarantees a maximum level of generalization rather than a maximum number of suppressed trips. We compare our approaches against an extensive benchmark of state-of-the-art anonymization methods on a collection of large-scale OD matrices.
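The interplay between suppression and generalization can be illustrated with a deliberately simplified sketch. It assumes an OD matrix stored as a dictionary of trip counts keyed by (origin, destination) zone pairs, and a hypothetical zone hierarchy given as a list of child-to-parent mappings. Cells with fewer than k trips are suppressed; if the suppressed trips exceed the budget, every zone is generalized one level up and the check is repeated. This uniform, greedy generalization is an illustrative stand-in, not the knapsack formulation or the Some Breakpoints Algorithm mentioned above, and the function names and data are hypothetical.

```python
from collections import defaultdict

def suppress(od, k, max_suppressed):
    """Drop OD cells with fewer than k trips.

    Returns (kept_cells, suppressed_trips), or None if the number of
    suppressed trips exceeds the budget max_suppressed.
    """
    kept, suppressed = {}, 0
    for pair, count in od.items():
        if count >= k:
            kept[pair] = count
        else:
            suppressed += count
    if suppressed > max_suppressed:
        return None
    return kept, suppressed

def generalize(od, parent):
    """Merge every zone into its parent zone and sum the trips of merged cells."""
    merged = defaultdict(int)
    for (origin, dest), count in od.items():
        merged[(parent[origin], parent[dest])] += count
    return dict(merged)

def anonymize(od, k, max_suppressed, hierarchy):
    """Return the finest uniform generalization level at which suppressing
    the cells with fewer than k trips stays within the suppression budget."""
    current = od
    # Level 0 is the original zoning; each hierarchy entry maps zones one level up.
    for level, parent in enumerate([None] + hierarchy):
        if parent is not None:
            current = generalize(current, parent)
        result = suppress(current, k, max_suppressed)
        if result is not None:
            kept, suppressed = result
            return level, kept, suppressed
    raise ValueError("no generalization level satisfies the suppression budget")

# Toy OD matrix (made-up counts) and a one-level zone hierarchy.
od = {("a1", "a1"): 10, ("a1", "a2"): 2, ("a2", "a1"): 3,
      ("a2", "b1"): 1, ("b1", "b1"): 12, ("b1", "a1"): 4}
hierarchy = [{"a1": "A", "a2": "A", "b1": "B"}]

level, kept, suppressed = anonymize(od, k=5, max_suppressed=5, hierarchy=hierarchy)
print(level, kept, suppressed)
```

On this toy input, suppression at the finest level would remove 10 of the 32 trips, exceeding the budget of 5. After merging zones a1 and a2 into A and b1 into B, only the cells (A, B) and (B, A) fall below k, so 5 trips are suppressed and the cells (A, A) with 15 trips and (B, B) with 12 trips are released.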

In a second part, we propose a two-step approach to generate more realistic synthetic travel demand from dynamic OD matrices.
In the first step, we calibrate the temporal distribution of the trips made during the day by formalizing it as a hierarchical population problem. In the second step, we draw activity locations using the OD matrices as transition probabilities in a probabilistic graphical model. We illustrate a pitfall in the estimation of such a model when basic agenda constraints are implemented, such as the requirement that all "home" locations of an agenda be the same. These added dependencies create cycles in the graph, which invalidate the direct use of the OD matrices as maximum likelihood estimators. Instead, we propose a heuristic adaptation to estimate the parameters of the model. We then implement a variety of approaches corresponding to different trade-offs between matching the OD matrices and matching the household travel surveys (HTS). This allows us to give a quantitative measure of the discrepancies between the OD matrices and the HTS, which are known to exist but are hard to measure because the two sources do not describe the same objects.
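The pitfall can be reproduced on a deliberately small case. The sketch below assumes a single-activity tour (home, then one activity, then home) over three zones, a made-up row-normalized OD matrix P used as transition probabilities, and a made-up distribution pi of home zones. Conditioning on the return trip ending in the home zone multiplies each (home, activity) pair by the probability of the way back, so the first-trip OD distribution implied by the model no longer equals the OD matrix that was plugged in. This only illustrates why the OD matrices cannot be used directly as parameters; it is not the heuristic estimation procedure proposed in the thesis, and all names and numbers are hypothetical.

```python
# Hypothetical 3-zone example (zones 0, 1, 2); all numbers are made up.
# P[i][j]: probability that a trip starting in zone i ends in zone j
# (a row-normalized OD matrix used as transition probabilities).
P = [[0.2, 0.5, 0.3],
     [0.4, 0.2, 0.4],
     [0.1, 0.6, 0.3]]
# pi[h]: share of agents whose home is zone h.
pi = [0.5, 0.3, 0.2]

def naive_first_leg(pi, P):
    """Joint (home, activity) distribution if the OD rows are used as-is."""
    n = len(pi)
    return [[pi[h] * P[h][a] for a in range(n)] for h in range(n)]

def constrained_first_leg(pi, P):
    """Joint (home, activity) distribution for a home -> activity -> home tour,
    conditioned on the return trip ending in the same home zone."""
    n = len(pi)
    # Unnormalized weight: start at h, go to a, then come back from a to h.
    w = [[pi[h] * P[h][a] * P[a][h] for a in range(n)] for h in range(n)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]

naive = naive_first_leg(pi, P)
constrained = constrained_first_leg(pi, P)
for h in range(3):
    print([round(x, 3) for x in naive[h]], [round(x, 3) for x in constrained[h]])
```

On these numbers, the share of (home 0, activity 0) pairs drops from 0.1 under the naive model to about 0.057 once the return constraint is imposed, so the fitted model no longer reproduces the OD matrix it was built from.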

This work takes place in the context of a recent multiplication of the data sources available for transportation studies. In particular, passive sources such as mobile phone data collect travelers' information without any input on their part and mostly without their knowledge. They offer invaluable insights into the dynamics of travel demand thanks to their unequaled penetration rate, but they are also an ethical liability because of their monitoring potential. By guaranteeing a foolproof anonymization of these data and illustrating their use in travel demand synthesis, we aim to address the privacy problem they raise while leveraging them to produce a realistic, exhaustive overview of urban transportation.

Keywords: Mobility, multi-source data, data fusion, machine learning, data mining, artificial intelligence
Applications closed