About me


I am a Data Scientist at adidas in the Digital Data Science team in Amsterdam, where I develop demand forecasting models and productionize them in highly scalable, cloud-native data pipelines. Prior to joining adidas, I completed my Professional Doctorate (PDEng) in Data Science at Eindhoven University of Technology, where I worked on data science projects for multiple companies (ASML, Van Lanschot, TE Connectivity, Heijmans, FIOD). Before the PDEng, I conducted deep learning research focusing on anomaly detection in time series data and completed my Master's and Bachelor's degrees in Electrical & Computer Engineering. I am broadly interested in machine learning and its endless applications, and I am passionate about turning cutting-edge research into products. Here you can find a list of my previous projects and here a list of publications. Feel free to contact me!

News

  • March 2021 I joined adidas as a Data Scientist in Amsterdam.
  • May 2020 Gave a talk on "Anomaly Detection with Variational Autoencoders" (Video & Slides)
  • January 2020 Started my final project with the Belastingdienst on object detection.
  • December 2019 Gave a workshop on deep learning for the PDEng Data Science. Check out the materials (slides & code) here.
  • October 2019 We introduce Solar Scan! (blog post)
  • July 2019 I'll be a lab monitor at the Lisbon Machine Learning School - LxMLS'19
  • June 2019 Participating in the DeepFake Detection Challenge - Presentation and Code

Experience


Data Scientist

March 2021 - Now

adidas

    In the Digital Data Science team, my main responsibilities have been:
  • Developing production-ready machine learning models that forecast demand for thousands of adidas products (PyTorch).
  • Building and maintaining highly scalable, robust, cloud-native data pipelines deployed on AWS (SageMaker Training, Inference, Processing, and Pipelines; EMR; Step Functions; Lambda).
  • Integrating MLOps components into the forecasting products (such as automated model monitoring).
  • Optimizing big data processing workflows (Spark, Hadoop).
  • Implementing software engineering best practices (unit and integration testing, linting, CI pipelines, code reviews).
  • Working in an international team responsible for creating data products that improve trading for the adidas eCom business (.com and apps).

Data Scientist

Jan. 2020 - Jan. 2021

FIOD-Belastingdienst

    In my 12-month project with FIOD-Belastingdienst, I worked on the following tasks in collaboration with FIOD's data science team (CoDE):
  • Developed custom object detection and image classification models using TensorFlow and deployed them as REST APIs using Flask and Docker.
  • Created a web application for end users, FIOD Image Intelligence, built with JavaScript, NGINX, D3.js, and Mapbox.

Professional Doctorate Candidate in Data Science

Jan. 2019 - Jan. 2021

Jheronimus Academy of Data Science (JADS)

    In the PDEng in Data Science, I worked on 7 projects for several companies: ASML (2 projects), Van Lanschot, TE Connectivity, Heijmans. Check the projects here.
  • Coached/supervised students and professionals.

Lab Monitor (Volunteer)

July 2019

Lisbon Machine Learning School (LxMLS)

  • Part of the organizing team of LxMLS'19 as a lab monitor. I helped organize the school and helped students solve the exercises during the lab sessions, which involved implementing machine learning algorithms such as naive Bayes, hidden Markov models, conditional random fields, recurrent neural networks, and reinforcement learning. The school focuses mostly on Natural Language Processing (NLP).

Student Researcher

Jan. 2018 - Dec. 2018

Institute for Systems and Robotics (ISR-Lisbon)

  • Within the Signal and Image Processing group of ISR, my research focused on deep learning and anomaly detection in time series data.
  • Proposed an approach for time series anomaly detection based on variational autoencoders, recurrent neural networks, and attention mechanisms. This approach is unsupervised, making it suitable for applications where obtaining labels is expensive or time-consuming (e.g., fraud detection, medical diagnosis, fault detection).
  • Publications

Intern

July 2016, July 2017

EDP Group

  • Developed a data science project on load forecasting using neural networks, within the Planning Department of EDP Distribuição.
  • Worked on the design of medium/low-voltage networks within the Network Studies team.

Education

Professional Doctorate (PDEng) in Data Science

2019 - 2021

Eindhoven University of Technology

Master's and Bachelor's in Electrical and Computer Engineering

Sept. 2013 - Nov. 2018

Instituto Superior Técnico

Exchange Student

Sept. 2016 - Jan. 2017

Université Catholique de Louvain

Projects

Object Detection and Image Classification

Object Detection, Image Classification, Neural Networks
TensorFlow, Flask, Python, JavaScript

Solar Scan

Solar Potential, Image Segmentation, CNNs
Python, JavaScript, C#, Google Cloud Platform, Microsoft SQL, GIS, Leaflet

Prediction of Occupancy in Working Spaces

Neural Networks, Embeddings, Regression
TensorFlow, Python

Detection of Suspicious Cases of Money Laundering

Graph Analysis, Community Detection
Microsoft Azure, PowerBI

Unsupervised Particle Identification in Reticle Images

Image Processing, Peak Detection
OpenCV, Scikit-image

Recommendation System for Product Cross-Sell

Matrix Factorization, SVD
Python, Plotly & Dash

DeepFake Detection

Video Classification, CNNs, RNNs
TensorFlow, Python, JavaScript

Peak Power Forecasting in Electrical Substations

Neural Networks, Time Series Forecasting
Matlab

Analyzing the Part Lifecycle in the Supply Chain Process at ASML

Process Mining
Disco, Python & Dash

Anomaly Detection in Solar Generation Time Series

Time Series Data, Variational Autoencoders, Neural Networks, Unsupervised Learning
TensorFlow, Python

Publications


Unsupervised Anomaly Detection in Energy Time Series Data using Variational Recurrent Autoencoders
Conference Paper
Oral Presentation
João Pereira, Margarida Silveira
Orlando, Florida, USA

17th IEEE International Conference on Machine Learning and Applications (ICMLA'18)

Solar Energy Generation, Photovoltaic Panels, Deep Learning, Time Series Data, Variational Autoencoders, Reconstruction Probability, Recurrent Neural Networks, Variational Self-Attention Mechanism

Learning Representations from Healthcare Time Series Data for Unsupervised Anomaly Detection
Conference Paper
Oral Presentation
João Pereira, Margarida Silveira
Kyoto, Japan

2019 IEEE International Conference on Big Data and Smart Computing (BigComp'19)

Electrocardiogram, Deep Learning, Heartbeats, Variational Autoencoders, Wasserstein Distance, Latent Space Detection

Unsupervised Anomaly Detection in Time Series Data Using Deep Learning
M.Sc. Thesis
João Pereira
Lisbon, Portugal

Instituto Superior Técnico, University of Lisbon

Anomaly Detection, Time Series Data, Variational Autoencoders, Recurrent Neural Networks

Unsupervised Representation Learning and Anomaly Detection in ECG Sequences
Journal Paper
João Pereira, Margarida Silveira

International Journal of Data Mining and Bioinformatics (IJDMB)

Electrocardiogram, Heartbeats, Variational Autoencoders, Wasserstein Distance, Latent Space Detection

Reviewing and Program Committees

  • Journals: IEEE Access; Wireless Communications and Mobile Computing (Wiley).
  • Conferences: International Conference on Artificial Intelligence, Information Processing, and Cloud Computing; International Conference on Time Series and Forecasting; ALLDATA2020.

Talks


"Anomaly Detection with Variational Autoencoders"

May 2020

Deep Learning Sessions Lisbon

Slides

"On Deep Learning - An Overview"

Oct. - Nov. 2019

Jheronimus Academy of Data Science

Slides

Get in Touch!

mail@joao-pereira.pt