Program on Wednesday, February 12

09:00am Registration and Coffee & Tea!
09:30am Opening Remarks
Eric Xing (MBZUAI & Carnegie Mellon University)
10:10am Statistical Methods for Assessing the Factual Accuracy of Large Language Models
Emmanuel Candès (Stanford University)
We present new statistical methods for obtaining validity guarantees on the output of large language models (LLMs). These methods enhance conformal prediction techniques to filter out unreliable claims and remove hallucinations while providing a finite-sample guarantee on the error rate of what is presented to the user. This error rate is adaptive in the sense that it depends on the prompt, preserving the utility of the output by not removing too many claims. We demonstrate performance on real-world examples. This is joint work with John Cherian and Isaac Gibbs.
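As a rough illustration of the filtering idea described in the abstract, the sketch below shows a basic split-conformal threshold for claim filtering (in the spirit of the simpler method this work enhances). All names and the scoring setup are our own assumptions, not details from the talk: each calibration response contributes the maximum confidence score assigned to any of its incorrect claims, and the finite-sample quantile of these scores gives a retention threshold.

```python
import numpy as np

def conformal_filter_threshold(calib_false_max_scores, alpha):
    """Split-conformal threshold for claim filtering (illustrative sketch).

    calib_false_max_scores: for each calibration response, the maximum
    confidence score assigned to any *incorrect* claim in that response
    (use -inf if every claim in the response was correct).
    Retaining only claims scoring strictly above the returned threshold
    keeps all retained claims correct with probability >= 1 - alpha.
    """
    s = np.sort(np.asarray(calib_false_max_scores, dtype=float))
    n = len(s)
    # finite-sample conformal quantile: ceil((n + 1)(1 - alpha))-th order statistic
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too little calibration data to certify anything
    return float(s[k - 1])

def filter_claims(claims, scores, threshold):
    """Keep only the claims whose confidence score exceeds the threshold."""
    return [c for c, sc in zip(claims, scores) if sc > threshold]
```

Note that this fixed threshold is non-adaptive; the methods presented in the talk go further by letting the error rate depend on the prompt.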
10:50am Coffee & Tea Break
11:00am The ChatGLM's Road to AGI
Jie Tang (Tsinghua University)
Large language models have substantially advanced the state of the art in various AI tasks, such as natural language understanding, text generation, image processing, and multimodal modeling. In this talk, we will first introduce the development of AI over the past decades, in particular from the perspective of China. We will also discuss the opportunities, challenges, and risks of AGI in the future. In the second part of the talk, we will use ChatGLM, an open-source alternative to ChatGPT, as an example to explain the understanding and insights we derived during the implementation of the model.
11:40am Exploiting Knowledge for Model-based Deep Music Generation
Gaël Richard (Télécom Paris)
We will describe and illustrate the concept of hybrid (or model-based) deep learning for music generation. This paradigm refers to models that associate data-driven and model-based approaches in a joint framework by integrating our prior knowledge about the data into more controllable deep models. In the music domain, prior knowledge can relate, for instance, to the production or propagation of sound (using an acoustic or physical model) or to how music is composed or structured (using a musicological model). In this presentation, we will first illustrate the concept and potential of such model-based deep learning approaches and then describe in more detail their application to unsupervised music separation with source production models, music timbre transfer with diffusion models, and symbolic music generation with transformers using structure-informed positional encoding.
12:20pm Auditing and Mitigating Biases in (compressed) Language Models
Julien Velcin (University of Lyon)
The size of language models plays a critical role in their ability to address complex NLP tasks. However, such large models can be hard to deploy on edge devices, which creates a need to compress LLMs. Recent studies have shown that compressing pretrained models can significantly influence the way they deal with various biases, such as biases related to fairness and model calibration. In this talk, I will provide an overview of recent research conducted at the ERIC Lab as part of the DIKé project. In particular, we will see how aggressive quantization can lead to calibration errors and alter the model's confidence in its predictions. Additionally, I will discuss ongoing work on the alignment of LLMs with moral values.
01:00pm Lunch
02:00pm Intricacies of Game-theoretical LLM Alignment
Michal Valko (INRIA & Stealth Startup)
Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently, and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimisation (DPO), and Sequence Likelihood Calibration (SLiC) have emerged. In this talk, our contribution is two-fold. First, we show the equivalence between two recent alignment methods, namely Identity Policy Optimisation (IPO) and Nash Mirror Descent (Nash-MD). Second, we introduce a generalisation of IPO, named IPO-MD, that leverages the regularised sampling approach proposed by Nash-MD. This equivalence may seem surprising at first sight, since IPO is an offline method whereas Nash-MD is an online method using a preference model. However, the equivalence can be proven when we consider the online version of IPO, that is, when both generations are sampled by the online policy and annotated by a trained preference model. Optimising the IPO loss with such a stream of data becomes equivalent to finding the Nash equilibrium of the preference model through self-play. Building on this equivalence, we introduce the IPO-MD algorithm, which generates data with a mixture policy (between the online and reference policy), similarly to the general Nash-MD algorithm. We compare online-IPO and IPO-MD to different online versions of existing losses on preference data, such as DPO and SLiC, on a summarisation task.
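For readers unfamiliar with the IPO loss mentioned above, it can be written as a simple squared regression objective; the notation below is our paraphrase of the published formulation (Azar et al.), not material from the talk itself:

$$\mathcal{L}_{\mathrm{IPO}}(\pi) \;=\; \mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\left(\log\frac{\pi(y_w\mid x)\,\pi_{\mathrm{ref}}(y_l\mid x)}{\pi(y_l\mid x)\,\pi_{\mathrm{ref}}(y_w\mid x)} \;-\; \frac{1}{2\tau}\right)^{2}\right]$$

Here $\pi_{\mathrm{ref}}$ is the reference policy and $\tau$ the regularisation strength. In the online variant discussed in the talk, both generations $y_w, y_l$ are sampled from the current policy and ranked by a trained preference model.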
02:40pm Moshi: A Speech-text Foundation Model for Real-time Dialogue
Alexandre Défossez (Kyutai)
We will discuss Moshi, our recently released model. Moshi is capable of full-duplex dialogue, i.e., it can both speak and listen at any time, offering the most natural speech interaction to date. Moshi is also multimodal: in particular, it is able to leverage its inner text monologue to improve the quality of its generation. We will cover the design choices behind Moshi, in particular the efficient joint sequence modeling enabled by the RQ-Transformer, and the use of large-scale synthetic instruction data.
03:20pm Coffee & Tea Break
03:30pm Feature-Conditioned Graph Generation using Latent Diffusion Models
Giannis Nikolentzos (University of Peloponnese)
Graph generation has emerged as a crucial task in machine learning, with significant challenges in generating graphs that accurately reflect specific properties. In this talk, I will present Neural Graph Generator, our recently released model, which utilizes conditioned latent diffusion models for graph generation. The model employs a variational graph autoencoder for graph compression and a diffusion process in the latent vector space, guided by vectors summarizing graph statistics. Overall, this work represents a shift in graph generation methodologies, offering a more practical and efficient solution for generating diverse graphs with specific characteristics.
04:10pm Redefining AI Reasoning: From Self-Guided Exploration to Causal Loops, and Transformer-GNN Fusion
Martin Takáč (MBZUAI)
In this talk, we explore three intertwined directions that collectively redefine how AI systems reason about complex tasks. First, we introduce Self-Guided Exploration (SGE), a prompting strategy that enables Large Language Models (LLMs) to autonomously generate multiple “thought trajectories” for solving combinatorial problems. Through iterative decomposition and refinement, SGE delivers significant performance gains on NP-hard tasks—showcasing LLMs’ untapped potential in reasoning, logistics and resource management problems. Next, we delve into the Self-Referencing Causal Cycle (ReCall), a mechanism that sheds new light on LLMs’ ability to recall prior context from future tokens. Contrary to the common belief that unidirectional token generation fundamentally restricts memory, ReCall illustrates how “cycle tokens” create loops in the training data, enabling models to overcome the notorious “reversal curse.” Finally, we present a Transformer-GNN fusion architecture that addresses Transformers’ limitations in processing graph-structured data.
06:00pm Poster Session with Buffet at MBZUAI France Lab
To present a poster, please fill out the Google form for review.
Workshop participants are invited to join the poster session at MBZUAI France Lab.
Address: 42 Rue Notre Dame des Victoires, 75002 Paris