Social networking sites are one of the most popular online services of our time. In consequence, social network owners are now in possession of an enormous volume of information in the form of social graphs, i.e. graphs where the vertices represent users and edges represent relations. The analysis of social graphs provides useful information for social scientists or other analysts who benefit from this data to extract global statistics and conduct behavioral studies. However, the sensitive nature of the personal information contained in the network (for example, the existence of relations between users, religious beliefs, etc.) raises important privacy concerns that need to be addressed before the information can be released.
This project focused on a family of privacy attacks on published social graphs, which are called active re-identification attacks. A re-identification attack allows to map the nodes of the published graph to the real users they represent. That allows the attacker to obtain potentially private information such as the existence of relations, community co-affiliation, etc. In particular, an active attacker is one with the capacity to enrol fake users (or sybils) in the network. The sybils interact with the targeted victims and create unique connection fingerprints which are later used to re-identify the victims in the published graph.
So far, the threat posed by active adversaries had only been studied in the scenario where one static view of the social graph is published. However, analysts need dynamic social graphs to conduct many tasks, e.g. analyzing the speed at which rumors or fake news spread. Consequently, the privacy risks in dynamic graph publication need to be better understood. In light of this need, this project contributed a foundational step in this direction. The core of the study was the extension of the active attack model to the scenario where snapshots of a dynamic social graph are periodically published. Under the extended adversary model, we proposed several new attack strategies allowing the adversary to re-identify users along the sequence of snapshots, increase the number of targeted victims over time, and use information from one snapshot to correct re-identification errors in the previous ones. We experimentally showed the increased capabilities of active attackers in the new scenario. As an auxiliary tool for these experiments, we developed a library of periodically released dynamic graph simulators, which can be easily extended with new network growth models, attacks, and perturbation mechanisms. The new attacks introduced in this work, as well as the experimental setting developed within it, will serve as an evaluation environment on which new privacy-preserving periodical graph publication methods will be designed and their effectiveness against active attackers will be tested.
This work contributed to the FNR-funded Junior-CORE project PrivDA, and was conducted by Ms. Ema Këpuska as part of her Master's thesis, under the co-direction of Prof. Sjouke Mauw, Dr. Xihui Chen, and myself. Ms. Këpuska successfully defended her thesis on September 5th, 2019. The thesis report can be found here and the library developed within the thesis can be found here.
The pervasive use of mobile technologies, GPS-equipped vehicles, etc., has resulted in the availability of massive amounts of trajectory data. Trajectory databases offer a vast application potential for researchers, enterprises, public administrations, among others. For example, transport authorities can use trajectory databases for designing better public transportation networks, optimizing routes and resource consumption, which in turn benefits the environment by reducing pollution and improves the quality of life of the population by reducing traffic jams, commute times, etc. Despite their usefulness, the use of these databases raise important privacy concerns due to the sensitive nature of location information. For example, most persons will probably have no objection with others knowing that they pass by the central train station on their every-day commutes, but most would prefer not to disclose information regarding whether, and how often, they visit some place of worship, or health institution, etc.
In view of these concerns, it is important to properly sanitise trajectory data before its use. Numerous anonymisation techniques have been proposed for trajectory data. Within this project, we focused on techniques based on the notion of k-anonymity. A dataset satisfying k-anonymity guarantees that every record is indistinguishable from at least other k - 1 different records, ensuring that the probability of an entry being uniquely identifiable is at most 1/k. In particular, we focused on techniques based on clustering trajectories by proximity, aligning sets of points along clustered trajectories, and applying micro-aggregation on these sets of points. As a result of this project, we obtained three new trajectory clustering algorithms and a new micro-aggregation method for trajectory obfuscation.
This project was conducted by Mr. Xhulio Zekaj as part of his Master's thesis, under the co-direction of Prof. Sjouke Mauw and myself. Mr. Zekaj successfully defended his thesis on February 4th, 2019.