Student Projects

photo

Extending graph autoencoders to model link prediction in heterogeneous graphs

Graph autoencoders have been successfully used to model various types of graphs (social networks, knowledge graphs, etc.) in order to conduct analysis tasks such as link prediction and vertex classification. Existing models focus on homogeneous graphs, i.e. graphs where no semantics is assigned to nodes and relations. However, real-life graphs contain much richer information. For example, recommendation networks contain nodes representing persons (the recommenders) and nodes representing products or brands, as well as person-product relations such as recommend, like or dislike, and person-person relations such as trust or distrust. Additionally, nodes and/or edges in these networks contain specific features such as the number of stars given to a product, the text associated to reviews, etc.

The goal of this project was to propose extensions of graph autoencoders for heterogeneous social networks and knowledge graphs, with a focus on link prediction as the end task. In particular, we proposed a new graph autoencoder featuring an enhanced variant of the relational graph convolutional network (R-GCN) model as the encoder. The proposed enhancement consisted in equipping R-GCNs with an attention mechanism allowing it to better discriminate between different types of nodes and relations.

The outcomes of this project, in particular those concerning social graphs, contributed to FNR-funded project PrivDA. This work was conducted by Ms. Guina Weng as part of her Master's thesis, under the co-direction of Prof. Sjouke Mauw, Dr. Xihui Chen and myself. Ms. Weng successfully defended her thesis on December 16th, 2021.

Active re-identification attacks on periodically released social graphs

Social networking sites are one of the most popular online services of our time. In consequence, social network owners are now in possession of an enormous volume of information in the form of social graphs, i.e. graphs where the vertices represent users and edges represent relations. The analysis of social graphs provides useful information for social scientists or other analysts who benefit from this data to extract global statistics and conduct behavioral studies. However, the sensitive nature of the personal information contained in the network (for example, the existence of relations between users, religious beliefs, etc.) raises important privacy concerns that need to be addressed before the information can be released.

This project focused on a family of privacy attacks on published social graphs, which are called active re-identification attacks. A re-identification attack allows to map the nodes of the published graph to the real users they represent. That allows the attacker to obtain potentially private information such as the existence of relations, community co-affiliation, etc. In particular, an active attacker is one with the capacity to enrol fake users (or sybils) in the network. The sybils interact with the targeted victims and create unique connection fingerprints which are later used to re-identify the victims in the published graph.

So far, the threat posed by active adversaries had only been studied in the scenario where one static view of the social graph is published. However, analysts need dynamic social graphs to conduct many tasks, e.g. analyzing the speed at which rumors or fake news spread. Consequently, the privacy risks in dynamic graph publication need to be better understood. In light of this need, this project contributed a foundational step in this direction. The core of the study was the extension of the active attack model to the scenario where snapshots of a dynamic social graph are periodically published. Under the extended adversary model, we proposed several new attack strategies allowing the adversary to re-identify users along the sequence of snapshots, increase the number of targeted victims over time, and use information from one snapshot to correct re-identification errors in the previous ones. We experimentally showed the increased capabilities of active attackers in the new scenario. As an auxiliary tool for these experiments, we developed a library of periodically released dynamic graph simulators, which can be easily extended with new network growth models, attacks, and perturbation mechanisms. The new attacks introduced in this work, as well as the experimental setting developed within it, will serve as an evaluation environment on which new privacy-preserving periodical graph publication methods will be designed and their effectiveness against active attackers will be tested.

This work contributed to the FNR-funded project PrivDA, and was conducted by Ms. Ema Këpuska as part of her Master's thesis, under the co-direction of Prof. Sjouke Mauw, Dr. Xihui Chen, and myself. Ms. Këpuska successfully defended her thesis on September 5th, 2019. The work conducted within this student project was the basis of a paper presented at ESORICS 2020.

Improving micro-aggregation based trajectory anonymisation

The pervasive use of mobile technologies, GPS-equipped vehicles, etc., has resulted in the availability of massive amounts of trajectory data. Trajectory databases offer a vast application potential for researchers, enterprises, public administrations, among others. For example, transport authorities can use trajectory databases for designing better public transportation networks, optimizing routes and resource consumption, which in turn benefits the environment by reducing pollution and improves the quality of life of the population by reducing traffic jams, commute times, etc. Despite their usefulness, the use of these databases raises important privacy concerns due to the sensitive nature of location information. For example, most persons will probably have no objection with others knowing that they pass by the central train station on their every-day commutes, but most would prefer not to disclose information regarding whether, and how often, they visit some place of worship, or health institution, etc.

In view of these concerns, it is important to properly sanitise trajectory data before its use. Numerous anonymisation techniques have been proposed for trajectory data. Within this project, we focused on techniques based on the notion of k-anonymity. A k-anonymous dataset guarantees that every record is indistinguishable from at least other k - 1 different records, ensuring that the probability of an entry being uniquely identifiable is at most 1/k. In particular, we devised techniques implementing the following three-step process:

  1. Cluster trajectories by proximity
  2. Align sets of points along clustered trajectories
  3. Apply micro-aggregation operators on each set of aligned points

The outcomes of the project were three new trajectory clustering algorithms and a new micro-aggregation operator for sets of points.

This project was conducted by Mr. Xhulio Zekaj as part of his Master's thesis, under the co-direction of Prof. Sjouke Mauw and myself. Mr. Zekaj successfully defended his thesis on February 4th, 2019.