The Self-Supervised Learning Paradigm in Computer Vision
Abstract
In the evolving landscape of machine learning, self-supervised learning has recently emerged as a compelling paradigm, challenging the conventional dependence on vast labeled datasets. This presentation aims to provide an introduction to self-supervised learning in the field of computer vision. To that end, we will trace the evolution of self-supervised learning from the early pioneering approaches that paved the way for its adoption to more recent state-of-the-art techniques. Our focus will be on uncovering the fundamental principles and core ideas that underpin these diverse self-supervised learning methods for visual data.
Bio
Nikos Komodakis holds the position of Assistant Professor at the Computer Science Department, University of Crete. He is also affiliated with the Institute of Applied & Computational Mathematics, FORTH, and is a researcher at the Archimedes Center for Research in Artificial Intelligence. Previously, he served as an Associate Professor at Ecole des Ponts ParisTech and held roles as a research scientist at the French National Centre for Scientific Research (CNRS) and a visiting Professor at Ecole Normale Superieure de Cachan. His research focuses on Computer Vision/Image Analysis, Machine Learning, and Artificial Intelligence. Furthermore, he contributes as an Editorial Board Member/Associate Editor for journals like Computer Vision and Image Understanding, International Journal of Computer Vision, and Computational Intelligence Journal.
Equivariance in Learning for Perception
Abstract
Equivariant representations are crucial in various scientific and engineering domains because they encode the inherent symmetries present in physical and biological systems, thereby providing a more natural and efficient way to model them. In the context of machine learning and perception, equivariant representations ensure that the output of a model changes in a predictable way in response to transformations of its input, such as 2D or 3D rotation or scaling. In this talk, we will show a systematic way to achieve equivariance by design, and how such an approach can improve efficiency in both training data and model capacity. We will present examples on spherical networks, equivariant representations for point clouds, and a novel definition of convolution and attention on light fields.
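Formally, a map f is equivariant to a group of transformations g if f(g·x) = g·f(x). As a minimal illustration (our own toy example, not one from the talk), circular 1D convolution, written as cross-correlation as in most deep learning libraries, is equivariant to cyclic shifts, whereas a position-dependent elementwise weighting is not. Cyclic shifts are an assumption chosen so that the property holds exactly, with no boundary effects; the function names are hypothetical.

```python
from typing import List


def circular_conv(x: List[int], k: List[int]) -> List[int]:
    """Circular cross-correlation: y[i] = sum_j k[j] * x[(i + j) mod n]."""
    n = len(x)
    return [sum(k[j] * x[(i + j) % n] for j in range(len(k))) for i in range(n)]


def shift(x: List[int], s: int) -> List[int]:
    """Cyclic shift by s positions: shift(x, s)[i] = x[(i - s) mod n]."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def positional_scale(x: List[int], w: List[int]) -> List[int]:
    """Weights tied to absolute positions, so this map is NOT shift-equivariant."""
    return [wi * xi for wi, xi in zip(w, x)]


def is_shift_equivariant(f, x: List[int]) -> bool:
    """Check f(shift(x, s)) == shift(f(x), s) for every cyclic shift s."""
    return all(f(shift(x, s)) == shift(f(x), s) for s in range(len(x)))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    k = [1, -1, 2]
    w = [1, 2, 3, 4, 5]
    print(is_shift_equivariant(lambda v: circular_conv(v, k), x))     # True
    print(is_shift_equivariant(lambda v: positional_scale(v, w), x))  # False
```

Building equivariance in by design, as the talk proposes, generalizes this idea beyond translations, for example to 2D or 3D rotations.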
Bio
Kostas Daniilidis is the Ruth Yalom Stone Professor of Computer and Information Science at the University of Pennsylvania, where he has been on the faculty since 1998. He is an IEEE Fellow. He was the director of the GRASP laboratory from 2008 to 2013, Associate Dean for Graduate Education from 2012 to 2016, and Faculty Director of Online Learning from 2013 to 2017. He obtained his undergraduate degree in Electrical Engineering from the National Technical University of Athens in 1986 and his PhD in Computer Science from the University of Karlsruhe in 1992, under the supervision of Hans-Hellmut Nagel. He received the Best Conference Paper Award at ICRA 2017. He co-chaired ECCV 2010 and 3DPVT 2006. His most cited works have been on event-based vision, equivariant learning, 3D human pose, and hand-eye calibration.
Deep Learning and Computer Vision for Surface Anomaly Detection
Abstract
In recent years, the data-driven, learning-based approach has begun to penetrate the more conservative engineering discipline of machine vision, particularly in tackling the critical challenge of visual inspection and surface defect detection. The deep learning approach offers a promising alternative to traditional, hand-engineered solutions by providing a more general, efficient, and cost-effective strategy for developing, deploying, and maintaining machine vision systems for surface defect and anomaly detection. In this talk, we will explore this new development paradigm, presenting a variety of deep learning approaches, ranging from fully supervised methods to models trained with mixed supervision and unsupervised methods. Our discussion will cover a spectrum of deep learning architectures, including autoencoder-like reconstructive models, discriminative methods that leverage synthetic data generation both in image and feature space, and the application of diffusion models. The talk will therefore aim to cover a wide range of deep learning paradigms, demonstrated through the specific challenge of surface anomaly detection.
Bio
Danijel Skočaj is a full professor at the University of Ljubljana, Faculty of Computer and Information Science, where he leads the Visual Cognitive Systems Laboratory. His main research interests are in the fields of computer vision, pattern recognition, deep learning, and cognitive robotics, with a current focus on developing data-driven, deep-learning methods for practical challenges involving visual information processing, such as surface anomaly detection. He has led or collaborated on a variety of projects in these areas, including EU projects, national research projects, and industry-funded applied projects. Through these research and development projects, he facilitates the transfer of research findings into practical applications. He is also interested in the ethical aspects of artificial intelligence, machine learning, and robotics, and the impact of these technologies on society. He has served as the president of the IEEE Slovenia Computer Society and the Slovenian Pattern Recognition Society, demonstrating a broader commitment to bringing the advancements and the potential of these fields closer to the general public.
Can Large Language Models Reason and Plan?
Abstract
Large Language Models (LLMs) are on track to reverse what seemed like an inexorable shift of AI from explicit to tacit knowledge tasks. Trained as they are on everything ever written on the web, LLMs exhibit "approximate omniscience"--they can provide answers to all sorts of queries, but with nary a guarantee. This could herald a new era for knowledge-based AI systems--with LLMs taking the role of (blowhard?) experts. But first, we have to stop confusing the impressive form of the generated knowledge for correct content, and resist the temptation to ascribe reasoning, planning, self-critiquing etc. powers to approximate retrieval by these n-gram models on steroids. We have to focus instead on LLM-Modulo techniques that complement the unfettered idea generation of LLMs with careful vetting by model-based AI systems. In this talk, I will reify this vision and attendant caveats in the context of the role of LLMs in planning tasks.
Bio
Subbarao Kambhampati is a professor of computer science at Arizona State University. Kambhampati studies fundamental problems in planning and decision making, motivated in particular by the challenges of human-aware AI systems. He is a fellow of the Association for the Advancement of Artificial Intelligence, the American Association for the Advancement of Science, and the Association for Computing Machinery, and was an NSF Young Investigator. He served as the president of the Association for the Advancement of Artificial Intelligence, a trustee of the International Joint Conference on Artificial Intelligence, the chair of AAAS Section T (Information, Communication and Computation), and a founding board member of the Partnership on AI. Kambhampati’s research, as well as his views on the progress and societal impacts of AI, has been featured in multiple national and international media outlets. He can be followed on Twitter @rao2z.
The Power of Graph Learning
Abstract
Graph neural networks (GNNs) have become a prominent technique for graph learning tasks such as vertex and graph classification, link prediction, and graph regression. It was recently shown that classical GNNs have limited expressive power. This has led to the proposal of a plethora of new, more expressive graph learning architectures. In this course, we will present a systematic investigation into the expressive power of these different architectures, using techniques from areas such as graph algorithms, logic, and query languages. The goal is to introduce various ways of boosting the expressive power of GNNs and to provide techniques for estimating the expressive power of GNNs.
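One standard way to make "limited expressive power" concrete: without distinguishing node features, a standard message-passing GNN cannot separate any pair of graphs that the 1-dimensional Weisfeiler-Leman (1-WL) colour-refinement test fails to separate. The sketch below is an illustration we add, not material from the course. It runs 1-WL on a 6-cycle and on two disjoint triangles. The two graphs are not isomorphic (one is connected, the other is not), yet every node in both has degree 2, so 1-WL assigns them identical colour histograms. The fixed number of rounds is a simplifying assumption; refinement stabilizes after at most n rounds on n nodes. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph


def wl_colour_histogram(graph: Graph, rounds: int = 3) -> Counter:
    """1-WL colour refinement; returns the multiset of final node colours.

    Colours are kept as nested tuples (not compressed to integers), so the
    histograms of different graphs are directly comparable.
    """
    colours: Dict[int, tuple] = {v: () for v in graph}  # uniform start colour
    for _ in range(rounds):
        colours = {
            # new colour = (own colour, sorted multiset of neighbour colours)
            v: (colours[v], tuple(sorted(colours[u] for u in graph[v])))
            for v in graph
        }
    return Counter(colours.values())


def cycle(n: int) -> Graph:
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    hexagon = cycle(6)
    two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    # Non-isomorphic, but 1-WL cannot tell them apart:
    print(wl_colour_histogram(hexagon) == wl_colour_histogram(two_triangles))  # True
    # Different degree sequences, so 1-WL separates them after one round:
    print(wl_colour_histogram(path) == wl_colour_histogram(star))  # False
```

The more expressive architectures the course surveys are, roughly speaking, ways of escaping this 1-WL ceiling.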
Bio
Floris Geerts is a professor at the University of Antwerp, Belgium. Previously, he was a senior research fellow at the University of Edinburgh and a postdoctoral researcher at the University of Helsinki. He received his PhD in 2001 from the University of Hasselt, Belgium. His research interests include the theory and practice of databases, the study of data quality, and, more recently, the interaction between linear algebra, relational databases and graph neural networks. He has written a book on data quality and published over 130 technical papers. His awards include three best paper awards, the PODS Alberto O. Mendelzon Test-of-Time award, an ACM SIGMOD Research Highlight Award and an ICLR outstanding paper award. He is an ACM Distinguished Member, was program chair of PODS and ICDT and general chair of EDBT/ICDT, and is currently the general chair of PODS. He served on the editorial boards of ACM TODS and IEEE TKDE, and was editor of various proceedings and special journal issues in the area of databases.
Complex Event Recognition
Abstract
Complex Event Recognition (CER) refers to the activity of detecting patterns in streams of continuously arriving “event” data from (geographically) distributed sources. CER is a key ingredient of many contemporary Big Data applications that require the processing of such event streams in order to obtain timely insights and implement reactive and proactive measures. Examples of such applications include the recognition of human activities in video content, emerging stories and trends on the Social Web, traffic and transport incidents in smart cities, error conditions in smart energy grids, violations of maritime regulations, cardiac arrhythmias and epidemic spread. In each application, CER makes it possible to make sense of streaming data, react accordingly, and prepare counter-measures. In this course, we will present formal methods for CER, as they have been developed in the artificial intelligence community. To illustrate the reviewed approaches, we will use the domain of maritime situational awareness.
Bio
Alexander Artikis is an Associate Professor at the University of Piraeus, in Athens, Greece. He is also a Senior Research Associate at the Institute of Informatics & Telecommunications of the National Centre for Scientific Research (NCSR) Demokritos, in Athens, Greece, where he leads the Complex Event Recognition lab (https://cer.iit.demokritos.gr). Alexander holds a PhD from Imperial College London on the topic of multi-agent systems, and his research interests lie in the area of Artificial Intelligence. He has published over 100 papers in related journals and conferences. According to Google Scholar, his h-index is 38. Alexander has been developing complex event recognition techniques in the context of several EU-funded Big Data projects, and he was the scientific coordinator of some of them. Furthermore, Alexander has been serving as a member of the (senior) programme committees of several international conferences, including AAAI, IJCAI, ECAI, AAMAS, KR, VLDB and CIKM. In 2020, he co-organised the Dagstuhl seminar on the “Foundations of Composite Event Recognition”.
Generative AI in Computer Vision
Abstract
Diffusion models have revolutionized the field of Computer Vision. Conditional diffusion models are powerful generative models that can leverage various types of conditional information, such as class labels, segmentation masks, or text captions. This course will cover the basics of diffusion models and focus on recent advances in text-to-image generation, with an emphasis on modern algorithms such as Stable Diffusion.
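The basic mechanism fits in a few lines. In the DDPM formulation of Ho et al. (2020), a fixed forward process gradually corrupts data with Gaussian noise, and it admits the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where eps ~ N(0, I) and alpha_bar_t is the cumulative product of (1 - beta_s). A network is then trained to predict eps from x_t and, in conditional models, from the conditioning signal, such as a text embedding. The sketch below covers only this forward process, on a toy three-pixel "image". The linear beta schedule (1e-4 to 0.02 over 1000 steps) follows the original DDPM paper; it is one common choice, not the one every model uses. Stable Diffusion, for instance, runs the process in the latent space of an autoencoder rather than on pixels. Function names are hypothetical.

```python
import math
import random
from typing import List


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_1..beta_T, linearly spaced (DDPM-style schedule)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def q_sample(x0: List[float], t: int, alpha_bar: List[float],
             noise: List[float]) -> List[float]:
    """Sample x_t ~ q(x_t | x_0) in closed form (t is 0-based here)."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = cumulative_alphas(linear_beta_schedule())
    x0 = [1.0, -1.0, 0.5]                    # a toy "image" of three pixels
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # the noise a denoiser learns to predict
    for t in (0, 499, 999):
        xt = q_sample(x0, t, alpha_bar, eps)
        print(t, round(alpha_bar[t], 5), [round(v, 3) for v in xt])
```

Because sqrt(alpha_bar_t)^2 + sqrt(1 - alpha_bar_t)^2 = 1, this process preserves unit variance for unit-variance data. By the last step, x_t is essentially pure noise, which is where sampling starts.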
Bio
Vicky Kalogeiton (F) has been an early-career tenured Assistant Professor in the VISTA team of École Polytechnique since 2020. She received her M.Sc. degree in Computer Science from DUTh, Greece, in 2013. She obtained her PhD in Computer Vision from the University of Edinburgh and Inria Grenoble, advised by Prof. Vittorio Ferrari and Dr. Cordelia Schmid. In 2019, she joined the University of Oxford as a research fellow to work with Prof. Andrew Zisserman. Parts of her work won the honorable mention award at ACCV 2022, the best paper award at ICCV-W 2021, the best poster award at the University Grenoble Alpes in 2017, and the best thesis award of DUTh in 2013. She has served regularly as Area Chair for Computer Vision conferences since 2021 and has been an Associate Editor for CMBBE since 2017. She has received outstanding reviewer awards five times at top vision conferences, as well as an outstanding Area Chair award at ACCV 2022. Furthermore, she is the recipient of numerous awards, including the ANR JCJC 2022 for junior researchers in France, the Archimedes 2023, and Microsoft Academic gifts in 2022-2025. Her research expertise lies in multimodality in computer vision with deep learning along three axes: generative AI (text-to-image generation), video understanding using text and audio, and multimodal medical applications. She regularly publishes papers in the most prestigious conferences and journals of her field.
Computationally Efficient Learning under Noisy Data
Abstract
Machine Learning algorithms are notoriously fragile in the presence of noise or errors in the dataset. In this tutorial, we will examine the computational challenges of learning in noisy settings. We will consider popular theoretical models of noise, from adversarial to purely random, and examine how state-of-the-art techniques can provably tolerate large amounts of errors in the data. We will focus on the fundamental problem of learning a perceptron, i.e. binary classification with a linear separator, and discuss how these techniques extend to learning more complex models such as multi-layer neural networks.
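As a reference point, here is the classical Perceptron algorithm (Rosenblatt), which updates the weight vector as w <- w + y·x only on misclassified examples. This is a sketch we add for orientation, not one of the noise-tolerant algorithms the tutorial covers. The classical algorithm is guaranteed to converge only when the data are linearly separable, so a few adversarially flipped labels can already prevent convergence; that fragility is what motivates the robust techniques discussed above. The toy dataset, the `max_epochs` cap, and the function names are illustrative assumptions.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(X: List[Sequence[float]], y: List[int],
                     max_epochs: int = 100) -> List[float]:
    """Classical Perceptron; labels are +1/-1, and the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            xa = list(xi) + [1.0]            # append a constant feature for the bias
            if yi * dot(w, xa) <= 0:         # misclassified (or on the boundary)
                w = [wj + yi * xj for wj, xj in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:                    # a full clean pass: data are separated
            break
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    return 1 if dot(w, list(x) + [1.0]) > 0 else -1


if __name__ == "__main__":
    X = [(2, 2), (3, 1), (1, 3), (-1, -1), (-2, 0), (0, -2)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_perceptron(X, y)
    print(w)                                  # [2.0, 2.0, 1.0]
    print([predict(w, x) for x in X] == y)    # True
```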
Bio
Christos Tzamos is an Associate Professor in the Department of Informatics and Telecommunications at the University of Athens. His research interests lie at the interface of the Theory of Computation with Economics and Game Theory, Machine Learning, Statistics and Probability Theory. Prior to his current role, Christos was an Assistant Professor in the Computer Sciences Department at UW-Madison and a postdoctoral researcher at Microsoft Research. Christos holds a PhD in Computer Science from MIT and received the George M. Sprowls award for the best thesis at MIT. He is also the recipient of a Simons Foundation award, an NSF CAREER award, the best paper award at EC 2013 and the best paper award at NeurIPS 2019.
Self-designing AI Systems
Abstract
As we increasingly need to apply AI in ever more diverse and critical applications with higher accuracy, AI systems have become the bottleneck. AI systems power data management and computation for the various steps involved in the development and deployment of AI pipelines: from data exploration and data engineering to feature engineering, ML engineering, model training, and model inference at scale. AI systems today are confronted with ever-larger datasets and models, which makes the development and deployment of AI solutions increasingly expensive in terms of both time and cloud cost. With state-of-the-art systems, it takes several months or even more than a year to develop a single AI solution. We pinpoint the problem to the fact that AI systems today are “fixed-design”. However, the performance of AI systems depends on the exact data, model, hardware, and available cloud budget of the specific AI solution being developed each time. Knowing the right system design for any given scenario is a notoriously hard problem: there is a massive space of possible designs, and no single design is perfect across all data, model, and hardware contexts. In addition, building a new system may take several years for any given (fixed) design. As a result, modern AI development relies on off-the-shelf, fixed systems that end up doing excessive computation and data movement, leading to slow, lower-quality AI. We will discuss our quest for the first principles of AI system design. We will show that it is possible to reason about the massive design space of AI systems. This allows us to create self-designing AI systems that can take drastically different shapes to optimize for the data, models, hardware, and available cloud budget, using a grammar for systems. These shapes include data structures, algorithms, models, and overall system designs that are discovered automatically and do not (always) exist in the literature or industry, yet they can be more than 10x faster.
We will show performance examples of up to 1000x faster data management, up to 10x smaller data footprint, and up to 10x faster neural network training in AI pipelines. This tutorial will provide background on AI systems for data management and deep learning. We will then introduce the self-designing systems concept in detail and show how it can lead to automatically generating the right AI system for each context. We will see examples from deep learning and Image AI.
Bio
Stratos Idreos is a Gordon McKay Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences. He leads DASlab, the Data Systems Laboratory at Harvard. His research focuses on building a grammar for systems, with the goal of making the design of workload- and hardware-conscious systems dramatically easier, or even fully automatic in many cases, for diverse applications including relational data analytics, NoSQL, machine learning, and blockchain. For his doctoral work on Database Cracking, Stratos was awarded the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award and the 2011 ERCIM Cor Baayen award. In 2015, he received the IEEE TCDE Rising Star Award from the IEEE Technical Committee on Data Engineering for his work on adaptive data systems. In 2020, he received the ACM SIGMOD Contributions award for his work on reproducible research, and in 2022 he received the ACM SIGMOD Test of Time Award for his work on raw data processing. Stratos was PC Chair of ACM SIGMOD 2021 and IEEE ICDE 2022; he is the founding editor of the ACM/IMS Journal of Data Science and the chair of the ACM SoCC Steering Committee.
Exploring the Intersection of Voting Theory and AI
Abstract
This tutorial will explore the potential and challenges of using neural networks to learn and improve voting rules. Given the opacity of modern AI methods, we will discuss ways to ensure that the induced voting rules are transparent and interpretable, providing a safety net for their widespread adoption. We will first give a brief introduction to the field of social choice theory, defining common voting rules, their normative properties, and the incentives they provide for strategic manipulation. Then, we will focus on recent research spanning three directions, studying the ability of neural networks to learn how to: (i) adopt the principles behind a voting rule given examples of its choices, (ii) design novel voting rules that adhere to certain democratic ideals, and (iii) manipulate elections.
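To make "common voting rules" and "normative properties" concrete, here is a small sketch (our own example, not one from the tutorial) of two classical scoring rules, plurality and Borda, together with a check for a Condorcet winner, i.e. a candidate who beats every other candidate in pairwise majority contests. On the 7-voter profile below, plurality elects a, while Borda elects b, who is also the Condorcet winner. This shows that plurality violates Condorcet consistency. Alphabetical tie-breaking is an arbitrary assumption, and all names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# A preference profile: (number of voters, ranking from best to worst).
Profile = List[Tuple[int, List[str]]]


def plurality_scores(profile: Profile) -> Counter:
    """Each voter gives one point to their top-ranked candidate."""
    scores: Counter = Counter()
    for count, ranking in profile:
        scores[ranking[0]] += count
    return scores


def borda_scores(profile: Profile) -> Counter:
    """With m candidates, position p (0 = best) earns m - 1 - p points."""
    scores: Counter = Counter()
    for count, ranking in profile:
        m = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += count * (m - 1 - position)
    return scores


def winner(scores: Counter) -> str:
    """Highest score wins; ties are broken alphabetically (an arbitrary choice)."""
    return min(scores, key=lambda c: (-scores[c], c))


def condorcet_winner(profile: Profile) -> Optional[str]:
    """Return the candidate who beats every other in pairwise majority, if any."""
    candidates = sorted(profile[0][1])

    def prefers(c: str, d: str) -> int:
        # number of voters ranking c above d
        return sum(n for n, r in profile if r.index(c) < r.index(d))

    for c in candidates:
        if all(prefers(c, d) > prefers(d, c) for d in candidates if d != c):
            return c
    return None


if __name__ == "__main__":
    profile: Profile = [(3, ["a", "b", "c"]),
                        (2, ["b", "c", "a"]),
                        (2, ["c", "b", "a"])]
    print(winner(plurality_scores(profile)))  # a
    print(winner(borda_scores(profile)))      # b
    print(condorcet_winner(profile))          # b
```

A learned voting rule can be evaluated against properties like this one by checking its choices on many such profiles.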
Bio
Zoi Terzopoulou is a junior professor at the Saint-Etienne School of Economics, in France. Her research lies at the intersection of Economics and Computer Science, with a focus on formal models of computational social choice that capture realistic conditions of collective decision making. She was previously a postdoctoral researcher at the University of Paris-Dauphine, funded by an individual European grant. Before that, she received her PhD from the University of Amsterdam in the Netherlands, where she also obtained an MSc degree in Logic. She holds a BSc degree in Mathematics from the University of Athens, in Greece. She has published extensively in highly esteemed venues in both Computer Science and Economics, notably the IJCAI, AAAI, and AAMAS conferences, the journal Social Choice and Welfare, and the Journal of Mathematical Economics.
Generative Models for Robot Control
Abstract
In this tutorial, we will present today's generative models of language, vision and action and discuss their connections to robotics: learning robot policies from optimal and sub-optimal demonstrations, learning to plan at test time, guiding intelligent exploration, and learning from human explanations and feedback during deployment.
Bio
Katerina Fragkiadaki is the JPMorgan Chase Associate Professor in the Machine Learning Department at Carnegie Mellon University. She received her undergraduate diploma in Electrical and Computer Engineering from the National Technical University of Athens. She received her Ph.D. from the University of Pennsylvania and was subsequently a postdoctoral fellow at UC Berkeley and Google Research. Her work focuses on combining forms of common sense reasoning, such as spatial understanding and 3D scene understanding, with deep visuomotor learning. The goal of her work is to enable few-shot learning and continual learning for perception, action and language grounding. Her group develops methods for computer vision for mobile agents, 2D and 3D visual parsing, 2D-to-3D perception, vision-language grounding, learning of object dynamics, navigation and manipulation policies. Pioneering innovations of her group’s research include 2D-to-3D geometry-aware neural networks for 3D understanding from 2D video streams, analogy-forming networks for memory-augmented few-shot visual parsing, and language grounding in 2D and 3D scenes with bottom-up and top-down attention. Her work has received a best Ph.D. thesis award, an NSF CAREER award, an AFOSR Young Investigator award, a DARPA Young Investigator award, and Google, TRI, Amazon, UPMC and Sony faculty research awards. She is a program chair for ICLR 2024.
Advancements in Self-Supervised Learning for Speech Technologies
Abstract
Self-supervised learning has revolutionized speech technologies, driving significant advancements in speech recognition, speaker identification, and emotion detection. This talk will explore key architectures like Wav2Vec and HuBERT, which use large-scale unlabelled audio data to learn powerful representations, setting new performance benchmarks. We will also discuss emerging trends such as the use of discrete units, which enhance the naturalness and intelligibility of generated speech. These units enable finer-grained manipulation of speech features and facilitate the application of methods inspired by large language models, leading to more sophisticated speech synthesis and direct speech-to-speech translation. Finally, the talk will cover methods for fine-tuning these models for specific downstream tasks, aiming for strong performance and adaptability across applications.
Bio
Themos Stafylakis is an elected Associate Professor at the Department of Informatics of the Athens University of Economics and Business, the Head of Machine Learning and Voice Biometrics at Omilia, and an Affiliated Researcher with the Archimedes Unit of Athena RC. He has served as a post-doc, Marie-Curie Fellow, and visiting scholar at the Computer Research Institute of Montreal (Canada), the University of Nottingham (UK), and the University of Brno (Czechia), respectively. His research interests are speech and speaker recognition, multimodal machine learning models for speech and language, and dialog systems.
Diffusion Models in Medical Imaging and Analysis
Abstract
There has been an explosion of developments in generative models in machine learning (including Variational Auto-Encoders or VAEs, Generative Adversarial Networks or GANs, and Normalizing Flows or NFs) that enable us to generate high-quality, realistic synthetic data such as high-dimensional images, volumes, or tensors. Recently, a new (or rather renewed) breed of generative models, diffusion models, has shown an impressive ability to generate high-quality imaging data. Applications of diffusion models in medical image analysis are already appearing in image reconstruction, denoising, anomaly detection, segmentation, data generation, and causality. This tutorial presents an overview of generative modelling, focusing on diffusion models (theory and learning tricks). We will discuss applications in the medical imaging field and review existing open challenges. It builds on the highly successful, sold-out tutorials at MICCAI 2023 and ISBI 2024.
Bio
Sotirios A. Tsaftaris is currently Chair (Full Professor) in Machine Learning and Computer Vision at the University of Edinburgh. He also holds the Canon Medical/Royal Academy of Engineering Research Chair in Healthcare AI. He is the Director of the EPSRC-funded AI Hub for Causality in Healthcare AI with Real Data (CHAI). He is an ELLIS Fellow of the European Lab for Learning and Intelligent Systems (ELLIS), within Edinburgh’s ELLIS Unit. Since 2023, he has been a visiting researcher with Archimedes RC, a research centre of excellence in AI in Athens, Greece. Between 2016 and 2023, he was a Turing Fellow with the Alan Turing Institute.
Ms Nefeli Gkouti is a PhD student at Archimedes RU / Athena RC and at the National and Kapodistrian University of Athens, supervised by Prof. Yannis Panagakis and Prof. Sotirios Tsaftaris. Her main research interests include causality and interpretability in machine learning, representation learning, generative models, and their application in healthcare.
Artificial Intelligence for Earth Observation - ESA Φ-lab
Abstract
The mission of the Φ-lab of the European Space Agency (ESA) is to accelerate the future of Earth Observation (EO). We will look at current research at the Φ-lab, located at ESA’s ESRIN centre in Italy. We will present current projects, focusing on Artificial Intelligence for Earth monitoring, satellite data, and Foundation Models.
Bio
Nikolaos Dionelis received his MEng degree (an integrated Master’s including Bachelor’s-level study) in Electrical and Electronic Engineering from Imperial College London, United Kingdom, in 2015, and his PhD in Signal Processing from Imperial College London in 2019. From 2019 to 2023, he worked as a Postdoctoral Research Associate in Machine Learning at the University of Edinburgh and the University Research Collaboration in Signal Processing, conducting research on robust generative neural networks. Nikolaos has experience in deep generative models, discriminative classifiers, Out-of-Distribution (OoD)/anomaly detection, object-of-interest detection and classification, and novelty detection in the real Open World setting. His methods are based on semi- and self-supervised learning, contrastive similarity learning, and representation learning. He also uses few-shot learning techniques, data augmentation, probability density estimation, and confidence assignment and assessment methods. He joined the Φ-lab team at the European Space Agency (ESA) in 2023 as a Research Fellow to contribute to Earth Observation (EO) science and remote sensing research projects related to deep learning, computer vision, and signal processing. Nikolaos aims to develop innovative Artificial Intelligence (AI) solutions, including deep generative models and discriminative classifiers, to quantify and assess the impact of climate change more accurately and to support climate change prediction applications.
The European AI-on-Demand Platform for AI Researchers
Abstract
This tutorial will provide a comprehensive introduction to the AI-on-Demand (AIoD) platform, a central pillar of the European AI infrastructure and technical ecosystem. AIoD is designed to facilitate collaborative and reproducible AI research by aggregating metadata on AI methods and datasets, and by providing researchers with tools for collaborative development, experimentation, and knowledge sharing, all while adhering to Open Science principles. This resource endeavours to provide an environment that encourages scientific excellence and ethical AI development within the European research community.
Bio
Iraklis Klampanos, Ph.D., is a Principal Researcher at the National Centre for Scientific Research (NCSR) “Demokritos”. He is head of the Intelligent Data-Intensive Systems group of the Institute of Informatics and Telecommunications. His research includes machine-learning-based data engineering, intelligent data-intensive systems and applications, human-in-the-loop AI for scientists, and e-Science. He is involved in the design and implementation of the European AI-on-Demand platform.