Held in conjunction with KDD'19
Aug 5, 2019 - Anchorage, Alaska
15th International Workshop on
Mining and Learning with Graphs

## Introduction

There is a great deal of interest in analyzing data that is best represented as a graph. Examples include the WWW, social networks, biological networks, communication networks, transportation networks, energy grids, and many others. These graphs are typically multi-modal, multi-relational and dynamic. In the era of big data, the importance of being able to effectively mine and learn from such data is growing, as more and more structured and semi-structured data is becoming available. The workshop serves as a forum for researchers from a variety of fields working on mining and learning from graphs to share and discuss their latest findings.
There are many challenges involved in effectively mining and learning from this kind of data, including:

• Understanding the different techniques applicable, including graph mining algorithms, network embeddings, graphical models, latent variable models, matrix factorization methods and more.
• Dealing with the heterogeneity of the data.
• The common need for information integration and alignment.
• Handling dynamic and changing data.
• Addressing each of these issues at scale.

Traditionally, a number of subareas have contributed to this space: communities in graph mining, learning from structured data, statistical relational learning, inductive logic programming, and, moving beyond subdisciplines in computer science, social network analysis, and, more broadly network science.

## Schedule

### TBD All accepted papers will be presented in the poster session. Based on scheduling constraints, some papers are selected as contributed talks or spotlight presentations.

Morning Sessions
8:45 amOpening Remarks & Best Paper Announcement
8:55 am
Keynote 1
Lise Getoor
The Power of Relational Learning
9:30 amCoffee Break
10:00 am
Contributed Talk
Improving Robustness to Attacks Against Vertex Classification
10:15 am
Contributed Talk
Node Embedding via Adaptive Similarities
10:30 am
Keynote 2
Austin Benson
Higher-order Link Prediction
11:05 am Poster Spotlights
11:40 am
Sponsors Overview
11:55 am Lunch Break + Poster Setup
Afternoon Sessions
1:05 pm
Keynote 3
Lada Adamic
Reflections of Social Networks
1:40 pm
Contributed Talk
Discovering Robustly Connected Subgraphs with Simple Descriptions
1:55 pm
Keynote 4
Vagelis Papalexakis
Tensor Decompositions for Multi-aspect Graph Analytics and Beyond
2:30 pm Coffee Break (+ Poster Session)
3:00 pm
Keynote 5
Huan Liu
Big Social Media Data and Challenges for KDD
3:35 pm
Contributed Talk
Graph-Based Recommendation with Personalized Diffusion
3:50 pm Closing Remarks
4:00 pm Poster Session

## Keynote Speakers

#### Vagelis Papalexakis

Assistant Professor
UC Riverside

#### Lada Adamic

Computational Social Scientist
Facebook

#### Austin Benson

Assistant Professor
Cornell

#### Huan Liu

Professor
Arizona State University

Professor
UC Santa Cruz

## Accepted Papers

### All accepted papers will present a poster, spotlights and talks are marked with and

The Sparse + Low Rank trick for Matrix Factorization-Based Graph Algorithms
Nathan De Lara

Abstract:Matrix factorization is a central block in many graph algorithms for embedding, clustering and many other tasks. This block is usually the computational bottleneck of these algorithms and a naive implementation can prevent them from scaling to large datasets. However, the matrices to factorize often have a closed-form "sparse~+~low rank" structure. In this paper, we show how to adapt state-of-the-art matrix factorization techniques to this class of matrices. We demonstrate that the method is highly competitive with respect to the naive implementation and that it comes at a very small extra cost compared to the decomposition of the sparse component alone.
@inproceedings{mlg2019_1,
title={The Sparse + Low Rank trick for Matrix Factorization-Based Graph Algorithms},
author={Nathan De Lara},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Deep Reinforcement Learning-based Approach to Tackle Competitive Influence Maximization
Tzu-Yuan Chung, Khurshed Ali and Chih-Yu Wang

Abstract:Competitive Influence Maximization (CIM) problem studies the competition among multiple parties where each party aims to maximize their profit while competing against other parties. Recently, Reinforcement-Learning based models have been proposed to address the CIM problem. However, such models are unscalable and incapable of handling changes in the network structure. Motivated by the recent success of Deep Reinforcement Learning models and their capability to handle complex problems, we propose a novel Deep Reinforcement learning based framework (DRIM) to address the multi-round competitive influence maximization problem. DRIM framework considers the community structure of the social network for budget allocation and feature extraction with deep Q network in order to reduce the computational time of seed selection. The proposed framework employs the quota-based ϵ-greedy policy to explore the optimality of influence maximization strategies and budget allocation for each community. Experimental results show that the proposed DRIM framework performs better than the state-of-art algorithms to tackle the multi-round CIM problem.
@inproceedings{mlg2019_3,
title={Deep Reinforcement Learning-based Approach to Tackle Competitive Influence Maximization},
author={Tzu-Yuan Chung, Khurshed Ali and Chih-Yu Wang},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Opinion Dynamics with Backfire Effect and Biased Assimilation
Xi Chen, Panayiotis Tsaparas, Jefrey Lijffijt and Tijl De Bie

Abstract:The democratization of AI tools for content generation, combined with unrestricted access to mass media for all (e.g. through microblogging and social media), makes it increasingly hard for people to distinguish fact from fiction. This raises the question of how individual opinions evolve in such a networked environment without grounding in a known reality. The dominant approach to studying this problem uses simple models from the social sciences on how individuals change their opinions when exposed to their social neighborhood, and applies them on large social networks. We propose a novel model that incorporates two known social phenomena: (i) Biased Assimilation: the tendency of individuals to adopt other opinions if they are similar to their own; (ii) Backfire Effect: the fact that an opposite opinion may further entrench someone in their stance, making their opinion more extreme instead of moderating it. To the best of our knowledge this is the first DeGroot-type opinion formation model that captures the Backfire Effect. A thorough theoretical and empirical analysis of the proposed model reveals intuitive conditions for polarization and consensus to exist, as well as the properties of the resulting opinions.
@inproceedings{mlg2019_4,
title={Opinion Dynamics with Backfire Effect and Biased Assimilation},
author={Xi Chen, Panayiotis Tsaparas, Jefrey Lijffijt and Tijl De Bie},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Analysis of Core and Truss Decompositions on Real-World Networks
Penghang Liu and A. Erdem Sarıyüce

Abstract:Finding the dense region in a graph is a crucial problem in network analysis. Core decomposition and truss decomposition address this problem from two different perspectives. The core decomposition is a vertex-driven approach that gives each vertex a core number based on the degree, while the truss decomposition is an edge- driven approach that gives each edge a truss number based on the triangles count. Some previous works explored the common patterns in real-world networks through a vertex-driven approach. Our ongoing research aims to explore the pervasive patterns and anomalies in real-world networks from both the vertex and edge perspective. We introduce an analysis of truss decomposition and its relation to core decomposition in various types of real-world networks. We first investigate the characteristics of the core and truss degeneracy of real-world networks as well as random graphs and check how the clique counts relate to those properties. Then we analyze the interplay between core and truss decomposition by checking the truss numbers (and triangle counts) of edges with respect to the core numbers (and degrees) of their end points.
@inproceedings{mlg2019_5,
title={Analysis of Core and Truss Decompositions on Real-World Networks},
author={Penghang Liu and A. Erdem Sarıyüce},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Image classification using topological features automatically extracted from graph representation of images
Liping Yang, Diane Oyen and Brendt Wohlberg

Abstract:The performance of machine learning methods is strongly dependent on the data representation (features) to which they are applied. For drawings in particular, we cannot rely on texture, color or shading information; there is little information present in a drawing beyond the spatial relationships and topology. A topological graph is an intuitive and powerful data structure and data representation framework that can capture the topological relations and spatial arrangement among entities present in images. In this paper, we use topological features automatically extracted from graph representations of images for image classification. Our approach is simple, intuitive, and generic. We compare our method against a traditional feature descriptor, histogram of oriented gradients (HOG) on the MNIST data set. The results demonstrate the effectiveness of our graph approach, especially when applied to small sets of training data. In addition, our method is very fast to train, and also much less sensitive to hyperparameters, requiring little hyperparameter fine tuning.
@inproceedings{mlg2019_7,
title={Image classification using topological features automatically extracted from graph representation of images},
author={Liping Yang, Diane Oyen and Brendt Wohlberg},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

A Time-Aware Inductive Representation Learning Strategy for Heterogeneous Graphs
Bo Yan, Matthew Walker and Krzysztof Janowicz

Abstract:Graphs are versatile data structures that have permeated a large number of application fields, such as biochemistry, knowledge graphs, and social networks. As a result, different graph representation learning models have been proposed as effective approaches to represent graph components in downstream machine learning tasks, such as node classification and recommendation. However, most representation learning models in graphs do not natively work on heterogeneous graphs and consequently are not able to learn embeddings for different relations in the graph. In this paper, we extend and improve existing models by enabling an edge-based transformation procedure in order to learn embeddings for different relations in heterogeneous graphs. In addition, we show that by incorporating a sequential model to learn more expressive representations, temporal dynamics in social networks can be captured. Finally, we examine our model within the context of two very disparate heterogeneous graphs, a knowledge graph dataset and a professional social network dataset, to illustrate our point and show the effectiveness of our approach. By learning edge-based transformations, our model yields a Mean Reciprocal Rank score that is more than 4 times higher than the homogeneous counterpart for the knowledge graph dataset. By incorporating the temporal dynamics, our model improves the HITS@1 score by more than 15% compared with the baseline model for the professional social network dataset.
@inproceedings{mlg2019_9,
title={A Time-Aware Inductive Representation Learning Strategy for Heterogeneous Graphs},
author={Bo Yan, Matthew Walker and Krzysztof Janowicz},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

When to Remember Where You Came from: Node Representation Learning in Higher-order Networks
Caleb Belth, Fahad Kamran, Donna Tjandra and Danai Koutra

Abstract:For trajectory data (e.g., flight itineraries) that tend to have beyond first-order (i.e., non-Markovian) dependencies, higher-order networks have been shown to accurately capture details lost with a standard (aggregate) network representation. At the same time, representation learning has shown success on a wide range of network tasks, removing the need to hand-craft features for these tasks. In this work, we propose a node representation learning framework called EVO or Embedding Variable Orders, which captures non-Markovian dependencies by combining work on higher-order networks with work on node embeddings. We show that EVO outperforms baselines in tasks where high-order dependencies are likely to matter, demonstrating the benefits of considering high-order dependencies in node embeddings. We also provide insights into when it does or does not help to capture these dependencies. To the best of our knowledge, this is the first work on representation learning for higher-order networks.
@inproceedings{mlg2019_10,
title={When to Remember Where You Came from: Node Representation Learning in Higher-order Networks},
author={Caleb Belth, Fahad Kamran, Donna Tjandra and Danai Koutra},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

GroupINN: Grouping-based Interpretable Neural Network for Classification of Limited, Noisy Brain Data
Yujun Yan, Jiong Zhu, Marlena Duda, Eric Solarz, Chandra Sripada and Danai Koutra

Abstract:Mapping the human brain, or understanding how certain brain regions relate to specific aspects of cognition, has been and remains an active area of neuroscience research. Functional magnetic resonance imaging (fMRI) data—in the form of images, time series or graphs—are central in this research, but pose many challenges in phenotype prediction tasks (e.g., noisy, small training samples). Standardly employed handcrafted models and newly proposed neural network methods pose limitations in the expressive power and interpretability, respectively, in this context. In this work, focusing on functional graphs, a modality that partially handles some challenges of fMRI data, we propose a grouping-based interpretable neural network model, GroupINN, that effectively classifies cognitive performance from noisy fRMI-derived brain networks with 85% fewer model parameters than baseline deep models, while also identifying the most predictive brain sub-networks within several task-specific contexts. Our method incorporates the idea of node grouping into the design of the neural network. That way, unlike other methods that employ clustering as a pre-processing step to reorder nodes, GroupINN learns the node grouping and extracts graph features jointly. Experiments on task-based fMRI datasets show that our method is 2.6−69× faster than other neural network-based methods, while achieving comparable or better accuracy and providing interpretability.
@inproceedings{mlg2019_12,
title={GroupINN: Grouping-based Interpretable Neural Network for Classification of Limited, Noisy Brain Data},
author={Yujun Yan, Jiong Zhu, Marlena Duda, Eric Solarz, Chandra Sripada and Danai Koutra},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

The Boundary Coefficient: a Vertex Measure for Visualizing and Finding Structure in Weighted Graphs
Robin Vandaele, Yvan Saeys and Tijl De Bie

Abstract:Graphs have emerged as powerful representations of data, from obvious examples such as social networks, to proximity graphs of high-dimensional metric data. Many of such real-world data sets share a common property: they have a well-hidden and much simpler graph-structured core, from which all data points emerge. Uncovering these core structures, in this paper termed backbones, often offers great insight into these data sets. However, standard methods for identifying these are computationally inefficient, sensitive to outliers, and lead to topological bias, prioritizing low-weight edges in dense regions, instead of spreading out smoothly across the topology underlying the graph. Furthermore, for high-dimensional metric data, standard methods for dimensionality reduction often fail to reveal the hidden topology. We resolve these issues by introducing the boundary coefficient (BC), a powerful vertex measure for locating core structure in data sets with an underlying graph-structured topology. Combining the BC with the newly proposed concept of f-pines, we propose a generally applicable method for revealing these structures in such data. We evaluate our method on a number of artificial and real-life data sets, demonstrating its wide range of applicability, superior effectiveness, robustness against noise, and scalability.
@inproceedings{mlg2019_13,
title={The Boundary Coefficient: a Vertex Measure for Visualizing and Finding Structure in Weighted Graphs},
author={Robin Vandaele, Yvan Saeys and Tijl De Bie},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Sensor Fusion and Structured Prediction for Cyberattack Event Networks
Alex Memory and Graham Mueller

Abstract:Early detection of cyberattacks – such as data breaches or ransomware – is critical to mitigate their effects. Despite advances in automated cyberattack sensors, many attacks are still detected many days or months after they occur. We propose a new approach using statistical relational learning to fuse cyberattack sensor outputs and generate attack predictions. Leveraging the graphical structures of both sensor outputs and cyberattack events themselves, we achieve higher accuracy than individual sensors by reasoning collectively over both sensors and attacks. Our predictions also are more useful to analysts because they are structured objects containing details of the predicted attacks. We conduct an extensive empirical evaluation of our approach using a database of real cyberattacks against a US corporation. We show that, relative to a sensors-only baseline, our approach increases accuracy by up to seven percent and doubles the lift of high-confidence predictions.
@inproceedings{mlg2019_14,
title={Sensor Fusion and Structured Prediction for Cyberattack Event Networks},
author={Alex Memory and Graham Mueller},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Unsupervised Construction of Knowledge Graphs From Text and Code
Kun Cao and James Fairbanks

Abstract:The scientific literature is a rich source of information for data mining with conceptual knowledge graphs; the open science movement has enriched this literature with complementary source code that implements scientific models. To exploit this new resource, we construct a knowledge graph using unsupervised learning methods to identify conceptual entities. We associate source code entities to these natural language concepts using word embedding and clustering techniques. Practical naming conventions for methods and functions tend to reflect the concept they implement. We take advantage of this specificity by presenting a novel process for joint clustering text concepts that combines word-embeddings, nonlinear dimensionality reduction, and clustering techniques to assist in understanding, organizing, and comparing software in the open science ecosystem. With our pipeline, we aim to assist scientists in building on existing models in their discipline when making novel models for new phenomena. By combining source code and conceptual information, our knowledge graph enhances corpus-wide understanding of scientific literature.
@inproceedings{mlg2019_16,
title={Unsupervised Construction of Knowledge Graphs From Text and Code},
author={Kun Cao and James Fairbanks},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Temporal Link Prediction in Dynamic Networks
Hemant Kasat, Sanket Markan, Manish Gupta and Vikram Pudi

Abstract:Link Prediction is an important task for evolutionary analysis of dynamic networks, where the goal is to predict links over time based on sequence of previous graph snapshots. Given a sequence of previous snapshots, we propose a temporal link function, SiameseLSTM, to predict probability of link formation for any pair of nodes in near future. We assume that nodes lie in a temporal latent space, gradually move as the network evolves over time and co-evolve with their neighbors in near future. The proposed model uses LSTM to project the temporal latent space embeddings of nodes to a hidden state latent space optimized for the downstream link prediction task. We use a Siamese adaption of LSTM for the hidden state embeddings to follow Structural Homophily : two nodes which are close to each other in the latent space interact with one another more frequently than two faraway nodes. We empirically show that our model outperforms state of the art algorithms for link prediction when evaluated on real world dynamic networks. We also show how varying number of previous snapshots used to exploit historical information affects the performance of the model to predict future links.
@inproceedings{mlg2019_22,
title={Temporal Link Prediction in Dynamic Networks},
author={Hemant Kasat, Sanket Markan, Manish Gupta and Vikram Pudi},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Improving Robustness to Attacks Against Vertex Classification
Benjamin Miller, Mustafa Çamurcu, Alexander Gomez, Kevin Chan and Tina Eliassi-Rad

Abstract:Vertex classification—the problem of identifying the class labels of nodes in a graph—has applicability in a wide variety of domains. Examples include classifying subject areas of papers in citation networks or roles of machines in a computer network. Recent work has demonstrated that vertex classification using graph convolutional networks is susceptible to targeted poisoning attacks, in which both graph structure and node attributes can be changed in an attempt to misclassify a target node. This vulnerability decreases users’ confidence in the learning method and can prevent adoption in high-stakes contexts. This paper presents work in progress aiming to make vertex classification robust to these types of attacks. We investigate two aspects of this problem: (1) the classification model and (2) the method for selecting training data. Our alternative classifier is a support vector machine (with a radial basis function kernel), which is applied to an augmented node feature-vector obtained by appending the node’s attributes to a Euclidean vector representing the node based on the graph structure. Our alternative methods of selecting training data are (1) to select the highest-degree nodes in each class and (2) to iteratively select the node with the most neighbors minimally connected to the training set. In the datasets on which the original attack was demonstrated, we show that changing the training set can make the network much harder to attack. To maintain a given probability of attack success, the adversary must use far more perturbations; often a factor of 2–4 over the random training baseline. Even in cases where success is relatively easy for the attacker, we show that the classification and training alternatives allow classification performance to degrade much more gradually, with weaker incorrect predictions for the attacked nodes.
@inproceedings{mlg2019_23,
title={Improving Robustness to Attacks Against Vertex Classification},
author={Benjamin Miller, Mustafa Çamurcu, Alexander Gomez, Kevin Chan and Tina Eliassi-Rad},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

On-Device Algorithms for Public-Private Data with Absolute Privacy
Alessandro Epasto, Hossein Esfandiari and Vahab Mirrokni

Abstract:Motivated by the increasing need to preserve privacy in digital devices, we introduce the on-device public-private model of computation. Our motivation comes from social-network based recommender systems in which the users want to receive recommendations based on the information available on their devices, as well as the suggestions of their social contacts, without sharing such information or the social network with the central recommendation system. Our model allows us to solve many algorithmic problems while providing absolute (deterministic) guarantees of the privacy of on-device data and the user's social network. In fact, we ensure that the private data and private contacts are never revealed to the central system. Our restrictive model of computation presents several interesting algorithmic challenges because any computation based on the private information and the private social network must be performed on local devices of limited capabilities. Despite these challenges, under realistic assumptions of inter-device communication, we show several efficient algorithms for fundamental data mining and machine learning problems, ranging from k-means clustering to heavy hitters. We complement this analysis with strong impossibility results for efficient private algorithms without allowing inter-device communication. In our experimental evaluation, we show that our private algorithms provide results almost as accurate as those of the non-private ones while speeding up the on-device computations by orders of magnitude.
@inproceedings{mlg2019_24,
title={On-Device Algorithms for Public-Private Data with Absolute Privacy},
author={Alessandro Epasto, Hossein Esfandiari and Vahab Mirrokni},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Is a Single Embedding Enough? Learning Node Representations that Capture Multiple Social Contexts
Alessandro Epasto and Bryan Perozzi

Abstract:Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph -- a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to $90\%$. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.
@inproceedings{mlg2019_25,
title={Is a Single Embedding Enough? Learning Node Representations that Capture Multiple Social Contexts},
author={Alessandro Epasto and Bryan Perozzi},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

A distributable convex approach for graph structure discovery
Chao Han, Nouf Albarakati, Xi Hang Cao and Zoran Obradovic

Abstract:Discovering the interdependent structure among variables plays a key role in knowledge discovery in many real-world applications. Given a sequence of $p$ real-valued variables, the task is to estimate the entire graph structure with $p^2$ pairwise relationships. This problem is computationally challenging since the number of unknown relationships to estimate grows quadratically with respect to the number of variables. In order to solve this problem, many methods have been proposed to cast the structure discovery as an inverse covariance estimation problem by modeling the high dimensional data using a Gaussian graphical model. They focus on how to efficiently estimate the entire precision matrix by developing more advanced optimization algorithms in a sequential manner. A number of methods are also developed to select neighborhood or perform structure learning on categorical data, which are out of this study. A tuning-insensitive approach was proposed to estimate the precision matrix of Gaussian Graphical model in a distributed manner. But it over-parametrized the problem to achieve the tuning-insensitivity. In this study, we proposed a novel framework to discover the underlying graph structure in a distributed manner with a straightforward parametrization. The idea is to decompose the structure discovery task into multiple sub-tasks, such that a column of the precision matrix is estimated in a sub-task. We also proved the distributability and the convexity of the global task. Additionally, we empirically demonstrated the effectiveness and efficiency of our proposed framework by conducting extensive experiments with comparison to a number of state-of-the-art methods on synthetic datasets. Our case study on a real-world application is also demonstrated to be reliable. The preliminary work we presented here is novel, but it is still in progress as many extensions can be developed based on the proposed distributed framework.
@inproceedings{mlg2019_26,
title={A distributable convex approach for graph structure discovery},
author={Chao Han, Nouf Albarakati, Xi Hang Cao and Zoran Obradovic},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Discovering Interesting Cycles in Graphs
Florian Adriaens, Cigdem Aslay, Tijl De Bie, Aristides Gionis and Jefrey Lijffijt

Abstract:Cycles in graphs often signify interesting processes. For example, cyclic trading patterns can indicate inefficiencies or economic dependencies in trade networks, cycles in food webs can identify fragile dependencies in ecosystems, and cycles in financial transaction networks can be an indication of money laundering. Identifying such interesting cycles, which can also be constrained to contain a given set of query nodes, although not extensively studied, is thus a problem of considerable importance. In this paper, we introduce the problem of discovering interesting cycles in graphs. We first address the problem of quantifying the extent to which a given cycle is interesting for a particular analyst. We then show that finding cycles according to this interestingness measure is related to the longest cycle and maximum mean-weight cycle problems (in the unconstrained setting) and to the maximum Steiner cycle and maximum mean Steiner cycle problems (in the constrained setting). We show that the problems of finding the most interesting cycle and Steiner cycle are both NP-hard, and are NP-hard to approximate within a constant factor in the unconstrained setting, and within a factor polynomial in the input size for the constrained setting. We also show that the latter inapproximability result implies a similar result for the maximum Steiner cycle and maximum mean Steiner cycle problems. Motivated by these hardness results, we propose a number of efficient heuristic algorithms. Through extensive experiments, we verify the effectiveness of proposed methods and demonstrate their practical utility on real-world use cases.
@inproceedings{mlg2019_27,
title={Discovering Interesting Cycles in Graphs},
author={Florian Adriaens, Cigdem Aslay, Tijl De Bie, Aristides Gionis and Jefrey Lijffijt},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Explainable Subgraphs with Surprising Densities: A Subgroup Discovery Approach
Junning Deng, Bo Kang, Jefrey Lijffijt and Tijl De Bie

Abstract:The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form `the subgroup of all individuals with a certain properties X are often (or rarely) friends with individuals in another subgroup defined by properties Y', in comparison to what is expected. Such rules present potentially actionable and generalizable insight into the graph. We present a method that finds node subgroup pairs between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. Additionally, the interestingness is quantified subjectively, to contrast with prior information an analyst may have about the connectivity. This view immediately enables iterative mining of such patterns.This is the first method aimed at graph connectivity relations between different subgroups. Our method generalizes prior work on dense subgraphs induced by a subgroup description. Although this setting has been studied already, we demonstrate for this special case considerable practical advantages of our subjective interestingness measure with respect to a wide range of (objective) interestingness measures.
@inproceedings{mlg2019_28,
title={Explainable Subgraphs with Surprising Densities: A Subgroup Discovery Approach},
author={Junning Deng, Bo Kang, Jefrey Lijffijt and Tijl De Bie},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

MixHop: Higher-Order Graph Convolutional Architecturesvia Sparsified Neighborhood Mixing
Amol Kapoor, Sami Abu-El-Haija, Bryan Perozzi, Aram Galstyan, Greg Ver Steeg, Hrayr Harutyunyan, Kristina Lerman and Nazanin Alipourfard

Abstract:Existing popular methods for semi-supervisedlearning with Graph Neural Networks (such as theGraph Convolutional Network) provably cannotlearn a general class of neighborhood mixing rela-tionships. To address this weakness, we propose anew model, MixHop, that can learn these relation-ships, including difference operators, by repeat-edly mixing feature representations of neighborsat various distances. MixHop requires no addi-tional memory or computational complexity, andoutperforms on challenging baselines. In addition,we propose sparsity regularization that allows usto visualize how the network prioritizes neighbor-hood information across different graph datasets.Our analysis of the learned architectures revealsthat neighborhood mixing varies per datasets.
@inproceedings{mlg2019_29,
title={MixHop: Higher-Order Graph Convolutional Architecturesvia Sparsified Neighborhood Mixing},
author={Amol Kapoor, Sami Abu-El-Haija, Bryan Perozzi, Aram Galstyan, Greg Ver Steeg, Hrayr Harutyunyan, Kristina Lerman and Nazanin Alipourfard},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Latent Network Summarization
Di Jin, Ryan Rossi, Danai Koutra, Eunyee Koh, Sungchul Kim and Anup Rao

Abstract:Node representation learning is prevalent thanks to its success in a variety of downstream tasks. However, for real-world graphs with billions of nodes, handling dense node embeddings comes with computational and storage challenges, as embeddings often require space that is orders of magnitude larger than the original graph. In this work, we introduce the problem of latent network summarization that aims to learn a compact, latent representation of the graph structure with dimensionality that is independent of the input graph size (i.e., #nodes and #edges), while retaining the ability to derive node representations on the fly. To solve this problem, we propose Multi-LENS, an inductive multi-level latent network summarization approach that leverages a set of relational operators and relational functions (compositions of operators) to capture the structure of egonets and higher-order subgraphs, respectively. The structure is stored in low-rank, size-independent structural feature matrices, which along with the relational functions comprise our latent network summary. Multi-LENS is general and naturally supports both homogeneous and heterogeneous graphs with or without directionality, weights, attributes or labels. Extensive experiments on large real graphs show 2 − 89% improvement in AUC for link prediction, while requiring 79−2152× less output storage space than baseline embedding methods. As application areas, we show the effectiveness of Multi-LENS summaries in detecting anomalies and events in the Enron email communication graph and Twitter co-mention graph.
@inproceedings{mlg2019_31,
title={Latent Network Summarization},
author={Di Jin, Ryan Rossi, Danai Koutra, Eunyee Koh, Sungchul Kim and Anup Rao},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Graph Embeddings at Scale
C. Bayan Bruss, Anish Khazane, Jonathan Rider, Richard Serpe, Saurabh Nagrecha and Keegan Hines

Abstract:Graph embedding is a popular algorithmic approach for creating vector representations for individual vertices in networks. Training these algorithms at scale is important for creating embeddings that can be used for classification, ranking, recommendation and other common applications in industry. While industrial systems exist for training graph embeddings on large datasets, many of these distributed architectures are forced to partition copious amounts of data and model logic across many worker nodes. In this paper, we propose a distributed infrastructure that completely avoids graph partitioning, dynamically creates size constrained computational graphs across worker nodes, and uses highly efficient indexing operations for updating embeddings that allow the system to function at scale. We show that our system can scale to handle the open-source Friendster network (68 million vertices) and on an internal heterogeneous graph (50 million vertices) with a differing number of worker nodes in order to measure performance against two key quantitative metrics: link-prediction accuracy and rate of convergence. We conclude this work by analyzing how a greater number of worker nodes actually improves our system's performance on the aforementioned metrics and discuss our next steps for rigorously evaluating the embedding vectors produced by our system.
@inproceedings{mlg2019_32,
title={Graph Embeddings at Scale},
author={C. Bayan Bruss, Anish Khazane, Jonathan Rider, Richard Serpe, Saurabh Nagrecha and Keegan Hines},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

End-to-end learning and optimization on graphs
Bryan Wilder, Eric Ewing, Bistra Dilkina and Milind Tambe

Abstract:Real-world applications often combine learning and optimization problems on graphs. For instance, our objective may be to cluster the graph in order to detect meaningful communities (or solve other common graph optimization problems such as maxcut, vertex cover, and so on). However, graphs or related attributes are often only partially observed, introducing learning problems such as link prediction which must be solved prior to optimization. We propose an approach to integrate a differentiable proxy for common graph optimization problems into training of machine learning models for tasks such as link prediction. This allows the model to focus specifically on the downstream task that its predictions will be used for. Experimental results show that our end-to-end system obtains better performance on example optimization tasks than can be obtained by combining state of the art link prediction methods with expert-designed graph optimization algorithms.
@inproceedings{mlg2019_35,
title={End-to-end learning and optimization on graphs},
author={Bryan Wilder, Eric Ewing, Bistra Dilkina and Milind Tambe},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Covariant Compositional Networks for Learning Graphs
Truong Son Hy, Shubhendu Trivedi, Horace Pan, Brandon Anderson and Risi Kondor

Abstract:We propose Covariant Compositional Networks (CCNs), a novel neural network architecture for learning on graphs. CCNs use tensor representations for vertex features which can then be manipulated with permutation covariant tensor operations as opposed to the standard symmetric operations used in other graph neural network models. These permutation covariant operations allow us to build more expressive graph representations while still maintaining permutation invariance. For learning small-scale molecular graphs, we investigate the efficacy of CCNs in estimating Density Functional Theory (DFT), a widely used but expensive approach to compute the electronic structure of matter. We obtain promising results in for this task and outperform other graph learning models on the Harvard Clean Energy Project and QM9 molecular datasets.
@inproceedings{mlg2019_36,
title={Covariant Compositional Networks for Learning Graphs},
author={Truong Son Hy, Shubhendu Trivedi, Horace Pan, Brandon Anderson and Risi Kondor},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Heterogeneous Graphlets
Ryan Rossi, Nesreen Ahmed, Aldo Carranza, David Arbour, Anup Rao, Sungchul Kim and Eunyee Koh

Abstract:In this work, we generalize the notion of network motifs (graphlets) to heterogeneous networks by introducing the notion of a small induced typed subgraph called typed graphlet. Typed graphlets generalize graphlets to rich heterogeneous networks as they explicitly capture the higher-order typed connectivity patterns in such networks. To address this problem, we describe a general framework for counting the occurrences of such typed graphlets. The proposed algorithms leverage a number of combinatorial relationships for different typed graphlets. For each edge, we count a few typed graphlets, and with these counts along with the combinatorial relationships, we obtain the exact counts of the other typed graphlets in o(1) constant time. Notably, the worst-case time complexity of the proposed approach matches the best known untyped algorithm. In addition, the approach lends itself to an efficient lock-free and asynchronous parallelization. The experiments confirm the approach is orders of magnitude faster and more space-efficient than existing methods. Unlike existing methods that take hours on small networks, the proposed approach takes only seconds on large networks with millions of edges. This gives rise to new opportunities and applications for typed graphlets on large real-world networks.
@inproceedings{mlg2019_38,
title={Heterogeneous Graphlets},
author={Ryan Rossi, Nesreen Ahmed, Aldo Carranza, David Arbour, Anup Rao, Sungchul Kim and Eunyee Koh},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Node Embedding via Adaptive Similarities
Dimitris Berberidis and Georgios B. Giannakis

Abstract:Node embedding is the task of extracting informative and descriptive features over the nodes of a graph. The importance of node embedding for graph analytics as well as learning tasks, such as node classification, link prediction, and community detection, has led to a growing interest and a number of recent advances. Nonetheless, node embedding faces several major challenges. Practical embedding methods have to deal with real-world graphs that arise from different domains, with inherently diverse underlying processes as well as similarity structures and metrics. On the other hand, similar to principal component analysis in feature vector spaces, node embedding is an inherently unsupervised task. Lacking metadata for validation, practical schemes motivate standardization and limited use of tunable hyperparameters. Finally, node embedding methods must be scalable in order to cope with large-scale real-world graphs of networks with ever-increasing size. The present work puts forth an adaptive node embedding framework that adjusts the embedding process to a given underlying graph, in a fully unsupervised manner. This is achieved by leveraging the notion of a tunable node similarity matrix that assigns weights on multihop paths. The design of multihop similarities ensures that the resultant embeddings also inherit interpretable spectral properties. The proposed model is thoroughly investigated, interpreted, and numerically evaluated using stochastic block models. Moreover, an unsupervised algorithm is developed for training the model parameters effieciently. Extensive node classification, link prediction, and clustering experiments are carried out on many real-world graphs from various domains, along with comparisons with state-of-the-art scalable and unsupervised node embedding alternatives. The proposed method enjoys superior performance in many cases, while also yielding interpretable information on the underlying graph structure.
@inproceedings{mlg2019_40,
title={Node Embedding via Adaptive Similarities},
author={Dimitris Berberidis and Georgios B. Giannakis},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

A Fast Approximate Algorithm for k-Median Problem on a Graph
Keisuke Todo, Atsuyoshi Nakamura and Mineichi Kudo

Abstract:We propose a fast approximate algorithm for k-median problem on a graph, which is a problem of finding a set S of k vertices that minimizes the length sum of the shortest paths from all the vertices to their nearest vertex in S. Starting from an initial set S of k vertices, our algorithm iteratively updates S so as to improve the shortest-path-length sum for S. In each iteration, the algorithm calculates the shortest-path forest whose roots are vertices in S and replace S with S' that are the centers of the component tree in the forest. According to our experiments using pmed datasets, our algorithm is significantly faster than CPLEX and achieves better approximation ratio than the degree-centrality or betweenness-centrality based methods.
@inproceedings{mlg2019_41,
title={A Fast Approximate Algorithm for k-Median Problem on a Graph},
author={Keisuke Todo, Atsuyoshi Nakamura and Mineichi Kudo},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Graph-Based Recommendation with Personalized Diffusions
Athanasios N. Nikolakopoulos, Dimitris Berberidis, George Karypis and Georgios B Giannakis

Abstract:The present work introduces PerDif; a novel framework for learning personalized diffusions over item-to-item graphs for top-n recommendation. PerDif learns the teleportation probabilities of a time-inhomogeneous random walk with restarts capturing a user-specific underlying item exploration process. Such approach can lead to significant improvements in recommendation accuracy, while also providing useful information about the users in the system. Per-user fitting can be performed in parallel and very efficiently even in large-scale settings. A comprehensive set of experiments on real-world datasets demonstrate the scalability as well as the qualitative merits of the proposed framework. PerDif achieves high recommendation accuracy, outperforming state-of-the-art competing approaches—including several recently proposed methods relying on deep neural networks.
@inproceedings{mlg2019_43,
title={Graph-Based Recommendation with Personalized Diffusions},
author={Athanasios N. Nikolakopoulos, Dimitris Berberidis, George Karypis and Georgios B Giannakis},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Forecasting Social Interactions from Dynamic Graphs: A Case Study of Twitter, GitHub, and YouTube
Prasha Shresha, Suraj Maharjan, Dustin Arendt and Svitlana Volkova

Abstract:Most of the existing graph analytics focuses on learning from static rather than dynamic graphs using hand-crafted network features or recently emerged graph embeddings learned independently from a downstream predictive task, solving predictive rather than forecasting tasks directly. To address these limitations, we propose (1) a novel task -- forecasting over dynamic graphs, and (2) a novel deep learning, multi-task, node-aware attention model that focuses on forecasting social interactions, going beyond recently emerged approaches for learning dynamic graph embeddings. Our model relies on graph convolutions and recurrent layers to forecast social interactions in large samples of {\it real-world dynamic graphs\footnote{We use the term {\it graph} in this study, even though it is common to refer to these structures as {\it social networks}, to avoid any ambiguity with neural network terminology.}} -- Twitter, GitHub, and YouTube up to seven days in advance. Our model can successfully forecast (a) retweets and mentions of a specific news source on Twitter (focusing on deceptive and credible news sources), (b) user-repository interactions on GitHub (focusing on cryptocurrency ecosystems), (c) comments to a specific video on YouTube within the next day with mean absolute error less than 2\% and $R^2$ exceeding 0.78. We demonstrate that learning from connectivity information over time in combination with node embeddings yields better forecasting results than when we use a two-step approach with Node2Vec and DeepWalk. Moreover, by evaluating model generalizability across three social platforms with different types of interactions we provide novel insights on how the size of the training and forecasting windows, and graph topological properties influence forecasting performance.
@inproceedings{mlg2019_45,
title={Forecasting Social Interactions from Dynamic Graphs: A Case Study of Twitter, GitHub, and YouTube},
author={Prasha Shresha, Suraj Maharjan, Dustin Arendt and Svitlana Volkova},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

A Process Graph Pattern Mining Algorithm for Discovering Structured Information Control Nets
Kyoung-Sook Kim, Dinh-Lam Pham and Kwang-Hoon Kim

Abstract:This paper develops an algorithm that is able to discover graph process patterns from process enactment event logs, implements the algorithm as a process graph mining system, and carries out a couple of operational experiments with process event log datasets. The core is the rho-Algorithm that ought to be a novel approach for rediscovering all the structured process graph patterns, such as linear (sequential), disjunctive (selective-OR), conjunctive (parallel-AND), and repetitive (iterative-LOOP) process patterns, of the information control nets from an XES-formatted process enactment event log dataset. We prove that the proposed process graph pattern mining algorithm is able to complete the process rediscovery functionalities successfully through fulfilling a series of experimental studies. Additionally, we discuss those future issues of the process graph pattern discovery and rediscovery.
@inproceedings{mlg2019_48,
title={A Process Graph Pattern Mining Algorithm for Discovering Structured Information Control Nets},
author={Kyoung-Sook Kim, Dinh-Lam Pham and Kwang-Hoon Kim},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Discovering Robustly Connected Subgraphs with Simple Descriptions
Janis Kalofolias, Mario Boley and Jilles Vreeken

Abstract:We study the problem of discovering robustly connected subgraphs that have simple descriptions. That is, our aim is to discover sets of nodes for which the induced subgraph is not only difficult to shatter into disconnected components, but, for which the nodes can also be selected from the entire graph with just a simple conjunctive query on the vertex attributes. As many subgraphs do not have a simple description, first mining robust subgraphs, and then post-hoc discovering their description leads to suboptimal results. Instead, we hence propose to optimise over describable subgraphs only. To do so efficiently, we propose a non-redundant iterative deepening approach, which we equip with a linear-time tight optimistic estimator that allows us to prune large parts of the search space. Through extensive empirical evaluation we show that our method can consider large real-world graphs, and discovers not only easily interpretable but also meaningful subgraphs.
@inproceedings{mlg2019_49,
title={Discovering Robustly Connected Subgraphs with Simple Descriptions},
author={Janis Kalofolias, Mario Boley and Jilles Vreeken},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

Is PageRank All You Need for Scalable Graph Neural Networks?
Aleksandar Bojchevski, Johannes Klicpera, Bryan Perozzi, Martin Blais, Amol Kapoor, Michal Lukasik and Stephan Günnemann

Abstract:Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, efficiently utilizing them on web-scale data remains a challenge despite related advances in research. Most recently proposed scalable GNNs rely on an expensive recursive message-passing procedure to propagate information through the graph. We circumvent this limitation by leveraging connections between GNNs and personalized PageRank and we develop a model that incorporates multi-hop neighborhood information in a single (non-recursive) step. Our work-in-progress approach PPRGo is significantly faster than multi-hop models while maintaining state-of-the-art prediction performance. We demonstrate the strengths and scalability of our approach on graphs orders of magnitude larger than typically considered in the literature.
@inproceedings{mlg2019_50,
title={Is PageRank All You Need for Scalable Graph Neural Networks?},
author={Aleksandar Bojchevski, Johannes Klicpera, Bryan Perozzi, Martin Blais, Amol Kapoor, Michal Lukasik and Stephan Günnemann},
booktitle={Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG)},
year={2019}
}

## Call for Papers

This workshop is a forum for exchanging ideas and methods for mining and learning with graphs, developing new common understandings of the problems at hand, sharing of data sets where applicable, and leveraging existing knowledge from different disciplines. The goal is to bring together researchers from academia, industry, and government, to create a forum for discussing recent advances graph analysis. In doing so, we aim to better understand the overarching principles and the limitations of our current methods and to inspire research on new algorithms and techniques for mining and learning with graphs.

To reflect the broad scope of work on mining and learning with graphs, we encourage submissions that span the spectrum from theoretical analysis to algorithms and implementation, to applications and empirical studies. As an example, the growth of user-generated content on blogs, microblogs, discussion forums, product reviews, etc., has given rise to a host of new opportunities for graph mining in the analysis of social media. We encourage submissions on theory, methods, and applications focusing on a broad range of graph-based approaches in various domains.

Topics of interest include, but are not limited to:

• Theoretical aspects:
• Computational or statistical learning theory related to graphs
• Theoretical analysis of graph algorithms or models
• Sampling and evaluation issues in graph algorithms
• Analysis of dynamic graphs
• Algorithms and methods:
• Graph mining
• Probabilistic and graphical models for structured data
• Heterogeneous/multi-model graph analysis
• Network embedding models
• Statistical models of graph structure
• Combinatorial graph methods
• Semi-supervised learning, active learning, transductive inference, and transfer learning in the context of graph
• Applications and analysis:
• Analysis of social media
• Analysis of biological networks
• Knowledge graph construction
• Large-scale analysis and modeling

We welcome many kinds of papers, such as, but not limited to:

• Novel research papers
• Demo papers
• Work-in-progress papers
• Visionary papers (white papers)
• Appraisal papers of existing methods and tools (e.g., lessons learned)
• Relevant work that has been previously published
• Work that will be presented at the main conference

Authors should clearly indicate in their abstracts the kinds of submissions that the papers belong to, to help reviewers better understand their contributions.
All papers will be peer reviewed, single-blinded. Submissions must be in PDF, no more than 8 pages long — shorter papers are welcome — and formatted according to the standard double-column ACM Proceedings Style.
The accepted papers will be published on the workshop’s website and will not be considered archival for resubmission purposes.
Authors whose papers are accepted to the workshop will have the opportunity to participate in a spotlight and poster session, and some set will also be chosen for oral presentation.

For paper submission, please proceed to the submission website.

Please send enquiries to chair@mlgworkshop.org.

To receive updates about the current and future workshops and the Graph Mining community, please join the Mailing List, or follow the Twitter Account.

## Important Dates

Paper Submission Deadline: May 12, 2019

Author Notification: June 7, 2019

Camera Ready: June 22, 2019

Workshop: August 5, 2019

## Workshop Organizers

#### Shobeir Fakhraei

Research Scientist
U. of Southern California (ISI)

#### Aude Hofleitner

Research Scientist Manager
Facebook

#### Danai Koutra

Assistant Professor
University of Michigan Ann Arbor

#### Julian McAuley

Assistant Professor
University of California San Diego

#### Bryan Perozzi

Research Scientist
Google Research

#### Tim Weninger

Assistant Professor
University of Notre Dame

## Program Committee

Zhongfei Zhang (Binghamton University)
Jan Ramon (INRIA)
Neil Shah (Snap Research)
Stefan Wrobel (Fraunhofer IAIS and Univ. of Bonn)
Yizhou Sun (University of California, Los Angeles)
Xin-Zeng Wu (Information Sciences Institute)
Kristian Kersting (TU Darmstadt)
Ali Pinar (Sandia National Laboratories)
Christian Bauckhage (Fraunhofer)
Xiang Ren (University of Southern California)
Leto Peel (Universite catholique de Louvain)
Ana Paula Appel (IBM Research Brazil)
Stefano Leucci (Max Planck Institute für Informatik)
Bailey Fosdick (Colorado State University)
Larry Holder (Washington State University)

Alessandro Epasto (Google Research)
Leman Akoglu (Carnegie Mellon University)
Jundong Li (Arizona State University)
Aris Anagnostopoulos (Sapienza University of Rome)
Acar Tamersoy (Symantec Research Labs)
Sucheta Soundarajan (Syracuse University)
Emilio Ferrara (University of Southern California)
Hocine Cherifi (University of Burgundy)
Yinghui Wu (Washington State University)
Ivan Brugere (Salesforce Research)
William Hamilton (McGill University)
Stratis Ioannidis (Northeastern University)
David Gleich (Purdue University)
Hanghang Tong (Arizona State University)