Introducing DVPS

Advancing Multimodal Foundation Models

DVPS (Diversibus Viis Plurima Solvo, Latin for “Through diverse paths, I solve many issues”) builds on the success of large language models by exploring the future of AI through multimodal foundation models (MMFMs).

Unlike today’s systems, which learn from second-hand representations of the world such as text, images, and video, these next-generation models are designed to learn across multiple input channels at once, including visual, auditory, linguistic, and other sensory signals, gaining a grounded understanding of the physical world. This multimodal approach enables them to interpret meaning across modalities in parallel, manage complexity, and adapt to real-world scenarios where today’s single-modal AI often fails.
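
To make this concrete, here is a minimal sketch of late-fusion multimodal learning, in which each input channel has its own encoder and the representations are combined for a single prediction. The module names, dimensions, and concatenation-based fusion are illustrative assumptions, not the DVPS architecture:

```python
# Minimal late-fusion sketch: one encoder per modality, combined for a
# single prediction. Everything here (encoders, dimensions, fusion by
# concatenation) is an illustrative assumption, not the DVPS design.
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 10):
        super().__init__()
        # One encoder per input channel (visual, auditory, linguistic).
        self.vision_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.text_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        # Fusion head: modality representations are interpreted in parallel.
        self.head = nn.Linear(3 * dim, num_classes)

    def forward(self, image, audio, text):
        z = torch.cat([self.vision_enc(image),
                       self.audio_enc(audio),
                       self.text_enc(text)], dim=-1)
        return self.head(z)

model = LateFusionModel()
logits = model(torch.randn(4, 3, 32, 32),   # image batch
               torch.randn(4, 1, 16000),    # raw audio
               torch.randn(4, 128, 300))    # token embeddings
print(logits.shape)  # torch.Size([4, 10])
```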

What Makes DVPS Models Transformative?

Beyond performance benchmarks, MMFMs offer a new approach to AI development with three key advantages (illustrated in the sketch after this list):

Label efficiency

The ability to learn from limited labelled data through transfer learning and few-shot adaptation.

Compute reusability

Leveraging pre-training on large-scale data to reduce the computational cost of developing downstream task models.

Engineering efficiency

Reducing the development effort and expertise required to create specialised models for each new task or domain.
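
The first two advantages can be shown in a few lines: freezing a pre-trained backbone and fitting only a small head on a handful of labelled examples reuses the pre-training compute and needs very little data. The backbone below is a stand-in for any pre-trained model; all names and sizes are illustrative assumptions:

```python
# Linear-probe sketch: reuse a frozen pre-trained backbone and train only
# a tiny head on 20 labelled examples (few-shot adaptation).
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # stand-in for a pre-trained model
for p in backbone.parameters():
    p.requires_grad = False  # compute reusability: backbone is never updated

head = nn.Linear(256, 5)  # only ~1.3k trainable parameters
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Label efficiency: 5 classes with just 4 labelled examples each.
x, y = torch.randn(20, 512), torch.arange(5).repeat(4)
for _ in range(100):
    opt.zero_grad()
    loss_fn(head(backbone(x)), y).backward()
    opt.step()
```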

Project Objectives

Developing scientific foundations and methodology and disseminating them to the research community;

Releasing impactful open-source assets for developers worldwide to build on;

Delivering concrete innovations from our use cases, with medical, social, and industrial benefits.

We will achieve these objectives by developing a toolkit named AutoDVPS, to be released as open-source software after being applied in three planned application domains (Cardiology, Geo-Intelligence, Language Communication) and tested in two surprise application domains. The surprise domains, introduced in the second half of the project, force our methods to generalise beyond their initial assumptions, driving innovation.

Technical details

Create and release AutoDVPS, an open-source toolkit for MMFM design, pre-training (PT), fine-tuning (FT), and modality expansion.

Enable recycling and composition of pre-trained models to reduce development time and cost.

Validate the toolkit on the 3 Planned Application Domains (ADs), covering 60% of MMFM creation steps.

Deploy the toolkit on the 2 Surprise ADs, supporting new modalities within 16 weeks of receiving the necessary components.

Publish MMFM scaling laws and ex-ante performance predictors (see the sketch after this list).

Release on GitHub and HuggingFace with documentation, tutorials, and case studies.

Demonstrate the applicability of DVPS methods to the 3 Planned ADs (Geo-Intelligence, Cardiology, Language Communication) and the 2 Surprise ADs.

Develop domain-specific MMFMs with at least 5 modalities per AD, achieving state-of-the-art results on at least 10 downstream tasks (DSTs).

Open-source all models and, where feasible, benchmark data and scripts.

Release foundation models and evaluations for 5 DST-adapted models in the 2 Surprise ADs within 20 weeks.

Combine MMFMs with LLMs to enable instructability, prompting, interactivity, and AD-specific knowledge.

Deliver 4 use cases per Planned AD at Technology Readiness Level (TRL) 5, and at TRL 4 for the Surprise ADs.
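
As a hedged illustration of the scaling-law item above: fit a saturating power law to validation losses from small-scale runs, then extrapolate to a larger budget before committing compute, which is what an ex-ante performance predictor does. The data points and functional form are illustrative assumptions, not DVPS results:

```python
# Fit loss(N) = a * N**(-b) + c on small runs, predict a large run ex ante.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

# Hypothetical validation losses measured at small model sizes.
n_params = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
loss = np.array([3.9, 3.4, 2.9, 2.6, 2.3])

(a, b, c), _ = curve_fit(power_law, n_params, loss,
                         p0=(10.0, 0.1, 1.0), maxfev=10000)
# Ex-ante prediction for a 1B-parameter model, made before training it.
print(f"predicted loss at 1e9 params: {power_law(1e9, a, b, c):.2f}")
```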

General Goals Across All ADs

Demonstrate MMFM-supported informed decision-making;

Enable cross-modality conversion to reduce cost (see the sketch after this list);

Develop new benchmarks for robustness, reliability, and safety.
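
One way to read the cross-modality conversion goal is sketched below: learn a mapper from a cheap modality’s embedding space into an expensive one’s, so the costly signal can be approximated rather than acquired at deployment time. The dimensions and paired-embedding setup are illustrative assumptions:

```python
# Cross-modality conversion sketch: map cheap-modality embeddings into the
# costly modality's embedding space using pairs seen only during training.
import torch
import torch.nn as nn

cheap_dim, costly_dim = 128, 256
mapper = nn.Sequential(nn.Linear(cheap_dim, 256), nn.ReLU(),
                       nn.Linear(256, costly_dim))
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)

# Paired embeddings from both modalities, available only at training time.
cheap, costly = torch.randn(64, cheap_dim), torch.randn(64, costly_dim)
for _ in range(200):
    opt.zero_grad()
    nn.functional.mse_loss(mapper(cheap), costly).backward()
    opt.step()
# At deployment, mapper(cheap_embedding) stands in for the costly modality.
```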

Geo-Intelligence

Develop a sensor-agnostic MMFM for Copernicus Earth observation (EO) data;

Integrate 2D–3D data and GIS modalities;

Validate models on disaster management, urban planning, ecosystem modeling, and Earth systems forecasting;

Build conversational interfaces with geospatial semantic understanding;

Create forecasting models with text interfaces validated by domain experts.

Cardiology

Create privacy-preserving, federated MMFMs (see the sketch after this list);

Use physics-informed models to improve plausibility;

Map 3D/4D data to a modality-agnostic space;

Reduce the number of required input modalities by exploiting multimodal correlations;

Generate user-optimized reports and conversational interfaces;

Develop Decision Transformers for cardiac treatment recommendations.
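
A minimal sketch of the federated training mentioned above, in the FedAvg style: each hospital updates a local copy of the model on its private data, and only the weights, never the records, are averaged centrally. The model, simulated clients, and single round are illustrative assumptions:

```python
# One FedAvg round: local updates on private data, central weight averaging.
import copy
import torch
import torch.nn as nn

def local_update(model, data, target, steps=5):
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(local(data), target).backward()
        opt.step()
    return local.state_dict()

global_model = nn.Linear(16, 1)
# Two simulated hospitals, each holding private (data, target) pairs.
clients = [(torch.randn(32, 16), torch.randn(32, 1)) for _ in range(2)]

states = [local_update(global_model, x, y) for x, y in clients]
avg = {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}
global_model.load_state_dict(avg)  # the server sees weights, not patient data
```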

Language Communication

Establish personalization metrics and track progress across education, accessibility, document editing, and translation use cases;

Improve reasoning and instruction-following by 30% using diverse document-based knowledge;

Achieve no performance degradation while using 50% or less of the task-specific data or modalities.

Open-source all DVPS tools, pre-trained modules, and results

Publish the MMFM textbook Principles and Practice of MMFM and launch a MOOC reaching over 1,500 learners

Collaborate with at least 15 EU AI initiatives

Found an industrial-academic co-innovation lab to apply project outcomes

Host annual DVPS workshops and hackathons with more than 200 participants

Provide model cards for all MMFM releases

Publish AI risk reports and a taxonomy of attacks and vulnerabilities

Host ethics and safety workshops and hackathons with civil society and domain experts

Implement red teaming, alignment methods, and safety-focused fine-tuning

Evaluate and reduce bias by at least 30% using mitigation strategies

Create and release DVPSBench to evaluate MMFM capabilities, performance, and ethical soundness

Build automated pipelines for reproducible evaluations across domains and modalities

Define multi-metric evaluation protocols for robustness, factuality, hallucination, and bias (see the sketch after this list)

Incorporate transparency tools such as watermark detection and explainable AI

Launch ModalBoard, a public MMFM leaderboard to support model selection and comparison
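
To show the shape such a protocol might take, here is a hedged sketch of a multi-metric, reproducible evaluation harness; the metric definitions, the model interface, and the seeding scheme are placeholder assumptions, since the actual DVPSBench API is not described here:

```python
# Registry of metrics run over a seeded sample so results are reproducible.
import random
from statistics import mean

METRICS = {}

def metric(name):
    def register(fn):
        METRICS[name] = fn
        return fn
    return register

@metric("accuracy")
def accuracy(pred, gold):
    return 1.0 if pred == gold else 0.0

@metric("hallucination")
def hallucination(pred, gold):
    # Placeholder heuristic: flag any token absent from the reference.
    return 0.0 if set(pred.split()) <= set(gold.split()) else 1.0

def evaluate(model, dataset, seed=0):
    rng = random.Random(seed)  # fixed seed -> reproducible subsample
    sample = rng.sample(dataset, k=min(100, len(dataset)))
    preds = [(model(x), y) for x, y in sample]
    return {name: mean(fn(p, g) for p, g in preds)
            for name, fn in METRICS.items()}

# Usage with a trivial echo "model":
report = evaluate(lambda x: x, [("the sky is blue", "the sky is blue")] * 3)
print(report)  # {'accuracy': 1.0, 'hallucination': 0.0}
```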

What we are building

AutoDVPS

An open-source toolkit for automated MMFM design, pre-training, fine-tuning, and modality expansion.

DVPSBench

A comprehensive benchmarking suite specifically designed to evaluate the performance, robustness, and ethical implications of MMFMs.

DVPS-FM

An MMFM trained on hundreds of modalities.