Fan-Yun (Willy) Sun
CS Ph.D. student at Stanford
fanyun [at]
Another me.
I love movies and sports, especially basketball and table tennis.


I am a third-year Computer Science Ph.D. student at Stanford advised by the amazing Nick Haber and Jiajun Wu. My research interests are (graph) relational learning and semantic scene understanding. My ultimate goal is to enable (virtual) embodied AI agents to interact seamlessly with their physical surroundings and with other agents.

Previously, I had the privilege of working with Dan Yamins, Jure Leskovec, and Stefano Ermon during my first year at Stanford; Jian Tang at MILA; and Shou-De Lin at NTU. If you would like to learn more about me, here is my [ CV ] (updated Oct. 2021). Feel free to reach out to me at fanyun [at]!

Timeline & Experiences

2022 June -
Research Scientist Intern @ Nvidia
2020 Sep. -
CS Ph.D. student @ Stanford
2019 Jan. - 2019 May
Visiting Researcher @ MILA
First-authored two papers on (graph) ML.
See publications.
Supervisor: Prof. Jian Tang
2018 Mar. - 2018 Sep.
ML Engineer Intern @ Appier
Worked on recommendation algorithms.
Supervisor: Prof. Hsuan-tien Lin
2018 Jan. - 2018 Feb.
Research Intern @ WorldQuant
Conducted quantitative research in finance.
2017 - 2018
Microsoft Student Partner @ Microsoft
Workshop lecturer.
2017 July - 2017 Aug.
Software Engineering Intern @ Google
Android Taipei Team. [ poster ]
Supervisor: Logan Chien, Hung-ying Tyan
2015 - 2019
Undergraduate student & researcher @ NTU
Computer Science Department
Machine Discovery and Social Network Mining Laboratory & Multimedia Indexing, Retrieval, and Analysis Lab
- (multi-agent) reinforcement learning
- computer vision
- time series prediction
He was Born


Partial-View Object View Synthesis via Filtering Inversion
We propose a framework that combines the strengths of generative modeling and network finetuning to generate photorealistic 3D renderings of real-world objects from sparse and sequential RGB inputs.
Interaction Modeling with Multiplex Attention
We present a forward prediction model that uses a multiplex latent graph to represent multiple independent types of interactions and uses attention to account for relations of different strengths. We also propose a progressive training strategy for the model.
NeurIPS 2022 New Orleans, LA
Physion: Evaluating Physical Prediction from Vision in Humans and Machines
We present a visual and physical prediction benchmark that measures how well ML algorithms predict real-world physics, and we demonstrate how the benchmark can identify areas for improvement in physical understanding.

NeurIPS 2021, Datasets and Benchmarks Track
Equivariant Neural Network for Factor Graphs
In this paper, we identify factor graph isomorphism and introduce two neural network-based inference models that take advantage of such inductive biases.
InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization
This paper studies learning the representations of whole graphs in both unsupervised and semi-supervised scenarios using mutual information maximization.
ICLR 2020 (Spotlight) Addis Ababa, Ethiopia
vGraph: A Generative Model for Joint Community Detection and Node Representation Learning
In current literature, community detection and node representation learning are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. We also show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.
NeurIPS, 2019 Vancouver, BC
A Regulation Enforcement Solution for Multi-agent Reinforcement Learning
In this paper, we propose a framework to solve the following problem: in a decentralized environment, given that not all agents comply with regulations at first, can we develop a mechanism that makes it in the self-interest of non-compliant agents to comply after all? We use empirical game-theoretic analysis to justify our method.
AAMAS, 2019 Montreal, QC
Designing Non-greedy Reinforcement Learning Agents with Diminishing Reward Shaping
This paper addresses an issue in multi-agent RL that arises when agents possess varying capabilities. We introduce a simple method to train non-greedy agents at nearly no extra cost. Our model achieves non-homogeneous equality, needs only local information, and is cost-effective, generalizable, and configurable.
AAAI/ACM Conference on AI, Ethics, and Society 2018 (Oral) New Orleans, LA
A Memory-Network Based Solution for Multivariate Time-Series Forecasting
Inspired by Memory Networks for question-answering tasks, we propose a deep learning-based model named Memory Time-series Network (MTNet) for time series forecasting. Additionally, the attention mechanism we designed makes MTNet interpretable.
ANS: Adaptive Network Scaling for Deep Rectifier Reinforcement Learning Models
This work provides a thorough study on how reward scaling can affect performance of deep reinforcement learning agents. We also propose an Adaptive Network Scaling framework to find a suitable scale of the rewards during learning for better performance. We conducted empirical studies to justify the solution.
Organ At Risk Segmentation with Multiple Modality
In real-world scenarios, doctors often use multiple modalities. In this paper, we propose using a Generative Adversarial Network to perform CT-to-MR transformation, synthesizing MR images instead of aligning the two modalities. The synthesized MR images can be trained jointly with CT to achieve better performance.

Selected Other Projects

Neural Network as Neural Network Input (2019)
In this paper, we extended the realm of graph benchmark datasets to the computation graphs of neural networks. We scraped TensorFlow model files using the GitHub API, annotated them with associated metadata, and benchmarked them using existing graph classification methods.
[ pdf ]
Intelligent Conversational Bot of TV / Movie (2017)
Designed and implemented an AI chatbot for TV and movies. Work involved crawling data, training an RNN-NLU for language understanding, and experimenting with an RL-based dialogue tracker and a seq2seq model for natural language generation.
Collaborative Editing of Vim Enhanced (2018)
Enhanced CoVim by eliminating the need for public IPs or a central server (using TCP hole punching).
E-monitor (2017)
Main idea: an Android app integrated with an IoT device to monitor electronic devices in real time. To register an electronic device, simply scan the QR code attached to it. This project includes a hardware implementation.