João Pereira

Amsterdam

The Netherlands

I am a Senior Data Scientist at adidas in the Global Digital Data Science team in Amsterdam, where I am the technical lead for two demand forecasting products. Prior to joining adidas, I completed my Engineering Doctorate (EngD) in Data Science at Eindhoven University of Technology, where I worked on projects for companies like ASML (2x), Van Lanschot, TE Connectivity, Heijmans, and FIOD. Before the EngD, I conducted deep learning research on anomaly detection in time series data and completed my Master’s and Bachelor’s degrees in Electrical & Computer Engineering at Instituto Superior Técnico (Portugal). During my studies, I was fortunate to spend a semester at Université Catholique de Louvain (Belgium). I am broadly interested in machine learning and its many applications, and I am passionate about turning cutting-edge research into products.

In my free time, I enjoy working out, meeting friends, traveling, and cooking new dishes.

Feel free to reach out to me with any queries!

News

Sep 22, 2023	Our adidas X AWS paper on "Probabilistic Demand Forecasting with Graph Neural Networks" was presented at ECML-PKDD 2023!
Jun 20, 2023	Published a new Medium article: “Fine-tune MPT-7B on Amazon SageMaker”
Mar 20, 2023	I joined the AWS weekly series Build on Generative AI.
Mar 3, 2023	My Medium article on “Fast and scalable hyperparameter tuning and cross-validation in AWS SageMaker” is out!
Jan 15, 2023	I took up a new role at adidas as Senior Data Scientist!
May 20, 2020	I gave a talk on "Anomaly Detection with Variational Autoencoders".

Selected Publications

Probabilistic Demand Forecasting with Graph Neural Networks

Kozodoi, Nikita, Zinovyeva, Liza, Valentin, Simon and 2 more authors

In ECML-PKDD 2023 International Workshop on Machine Learning for Irregular Time Series 2023

@inproceedings{Kozodoi2023,
  author = {Kozodoi, Nikita and Zinovyeva, Liza and Valentin, Simon and Pereira, João and Agundez, Rodrigo},
  title = {Probabilistic Demand Forecasting with Graph Neural Networks},
  year = {2023},
  url = {https://www.amazon.science/publications/probabilistic-demand-forecasting-with-graph-neural-networks},
  booktitle = {ECML-PKDD 2023 International Workshop on Machine Learning for Irregular Time Series},
  citation_count_index = {6}
}

Unsupervised Anomaly Detection in Energy Time Series Data Using Variational Recurrent Autoencoders with Attention 162 citations

Pereira, João, and Silveira, Margarida

In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) Dec 2018

In the age of big data, time series are being generated in massive amounts. In the energy field, smart grids are enabling unprecedented data acquisition with the integration of sensors and smart devices. In the context of renewable energies, there has been an increasing interest in solar photovoltaic energy generation. These installations are often integrated with smart sensors that measure the energy production. The amount of data collected motivates the development of smart monitoring systems that can detect anomalous behaviour in these systems, trigger alerts and enable maintenance operations. In this paper, we propose a generic, unsupervised and scalable framework for anomaly detection in time series data, based on a variational recurrent autoencoder. Furthermore, we introduce attention in the model, by means of a variational self-attention mechanism (VSAM), to improve the performance of the encoding-decoding process. Afterwards, we perform anomaly detection based on the probabilistic reconstruction scores provided by our model. Our results on solar energy generation time series show the ability of the proposed approach to detect anomalous behaviour in time series data, while providing structured and expressive representations. Since it does not need labels to be trained, our methodology enables new applications for anomaly detection in energy time series data and beyond.
@inproceedings{Pereira2018ICMLA,
  author = {Pereira, João and Silveira, Margarida},
  title = {Unsupervised Anomaly Detection in Energy Time Series Data Using Variational Recurrent Autoencoders with Attention},
  booktitle = {2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)},
  year = {2018},
  month = dec,
  publisher = {IEEE},
  pages = {1275--1282},
  doi = {10.1109/ICMLA.2018.00207},
  url = {https://ieeexplore.ieee.org/document/8614232},
  citation_count_index = {0}
}

Learning Representations from Healthcare Time Series Data for Unsupervised Anomaly Detection 74 citations

Pereira, João, and Silveira, Margarida

In 2019 IEEE International Conference on Big Data and Smart Computing (BigComp) Feb 2019

The amount of time series data generated in Healthcare is growing very fast and so is the need for methods that can analyse these data, detect anomalies and provide meaningful insights. However, most of the data available is unlabelled and, therefore, anomaly detection in this scenario has been a great challenge for researchers and practitioners. Recently, unsupervised representation learning with deep generative models has been applied to find representations of data, without the need for big labelled datasets. Motivated by their success, we propose an unsupervised framework for anomaly detection in time series data. In our method, both representation learning and anomaly detection are fully unsupervised. In addition, the training data may contain anomalous data. We first learn representations of time series using a Variational Recurrent Autoencoder. Afterwards, based on those representations, we detect anomalous time series using Clustering and the Wasserstein distance. Our results on the publicly available ECG5000 electrocardiogram dataset show the ability of the proposed approach to detect anomalous heartbeats in a fully unsupervised fashion, while providing structured and expressive data representations. Furthermore, our approach outperforms previous supervised and unsupervised methods on this dataset.
@inproceedings{Pereira2019BigComp,
  author = {Pereira, João and Silveira, Margarida},
  title = {Learning Representations from Healthcare Time Series Data for Unsupervised Anomaly Detection},
  booktitle = {2019 IEEE International Conference on Big Data and Smart Computing (BigComp)},
  year = {2019},
  month = feb,
  publisher = {IEEE},
  pages = {1--7},
  doi = {10.1109/BIGCOMP.2019.8679157},
  url = {https://ieeexplore.ieee.org/document/8679157},
  citation_count_index = {1}
}

EngD Thesis

FIOD Image Intelligence: An Application for Large-Scale Object Detection and Analysis

Pereira, João

Feb 2021

@phdthesis{Pereira2021EngDThesis,
  title = {FIOD Image Intelligence: An Application for Large-Scale Object Detection and Analysis},
  school = {Eindhoven University of Technology},
  author = {Pereira, João},
  year = {2021},
  type = {{EngD} thesis},
  note = {Permanently Confidential.},
}