Design, optimize, and deploy AI systems combining text and images
This advanced training course teaches participants to design, optimize, and evaluate systems built on multimodal LLMs. It covers the full development lifecycle of an intelligent AI product: data preparation, modern architectures, multimodal pipelines, product integration, and governance.
Is it for you?
Experienced data scientists. ML engineers and AI product engineers. Technical product owners working on AI products. AI/ML architects looking to integrate multimodal LLMs.
Prerequisites
Excellent command of Python. Strong knowledge of deep learning and NLP. Familiarity with LLMs, embeddings, and Transformer architectures.
What You'll Walk Away With
- ✓ Design comprehensive multimodal pipelines
- ✓ Optimize multimodal LLMs for various use cases
- ✓ Evaluate model performance, risks, and behaviors
- ✓ Design an intelligent AI product integrating vision, text, and other modalities
- ✓ Implement robust governance and monitoring
Training content
Day 1 – Foundations of Multimodal LLMs and Advanced Pipelines
- Multimodal architectures: CLIP, Flamingo, Qwen-VL, Gemini-style models
- Vision-text alignment: embeddings, joint training, cross-attention
- Preparation and structuring of multimodal data
- Building a vision-text pipeline in Python
- Lightweight fine-tuning (LoRA, adapters) on internal data
- Workshop: integrating a visual encoder with an LLM
Lab / Exercise: Designing a multimodal classification pipeline
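As an illustration of the vision-text alignment topic, here is a minimal pure-Python sketch of a CLIP-style symmetric contrastive loss. It assumes image and text embeddings have already been produced by their respective encoders and that row i of each batch forms a matching pair; the function names and the temperature of 0.07 are illustrative choices, not course material.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: image_embs[i] and text_embs[i] are assumed to be a matching pair.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Similarity matrix: rows are images, columns are texts.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: each row should pick out its own caption.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should pick out its own image.
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

Correctly paired embeddings give a loss close to zero, while shuffled pairs give a large loss. Averaging both directions is what makes the objective symmetric.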
Key points & takeaways:
- Understanding of key architectures
- Implementation of a complete pipeline
- Modular approach for multimodal extension
Day 2 – Optimization and Evaluation of Multimodal Models
- Model optimization: quantization, distillation, caching
- Evaluation methods specific to multimodal tasks (text-to-image, VQA, grounding)
- Risk assessment: hallucinations, biases, visual inconsistencies
- Setting up a comprehensive evaluation benchmark
- A/B testing on prompts and architectures
- Workshop: stress tests and robustness audit
Lab / Exercise: Building a multimodal evaluation dashboard
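To make the quantization topic concrete, the following is a minimal sketch of symmetric per-tensor int8 post-training quantization in plain Python. It assumes the weights are a non-empty flat list of floats; real toolkits usually work on tensors and often quantize per channel, and the function names here are hypothetical.

```python
def quantize_int8(weights):
    # Symmetric scheme: the largest magnitude maps to +/-127 and zero maps to 0.
    # Assumes `weights` is non-empty.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    # Map integers back to approximate float weights.
    return [v * scale for v in q]
```

Because the scale is chosen so that no value is clipped, each reconstructed weight differs from the original by at most half a quantization step (scale / 2), in exchange for storing one byte per weight instead of four.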
Key points & takeaways:
- Advanced evaluation of multimodal LLMs
- Audit and monitoring methodology
- Risk mitigation strategies
Day 3 – From Model to Intelligent AI Product
- AI product design: multimodal UX, decision flows
- Integrating model → API → app (patterns, limitations, security)
- AI governance: logs, continuous monitoring, compliance
- Workshop: building a small multimodal AI product
- Integrating images, text, and advanced prompts
- Delivering a functional MVP
Lab / Exercise: Creating a complete multimodal MVP for a business use case
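For the logging side of AI governance, one common pattern is a structured audit record written for every model call. The sketch below assumes a hypothetical `build_audit_record` helper and illustrative field names; it stores SHA-256 hashes of the prompt and image instead of raw content, which is one way to keep calls traceable while limiting personal data in logs.

```python
import hashlib
import json
import time


def build_audit_record(model_id, prompt, image_bytes, response_text, latency_ms):
    # Hypothetical helper: one JSON line per multimodal model call.
    record = {
        "timestamp": time.time(),
        "model_id": model_id,
        # Hashes let auditors match inputs without keeping the raw prompt or image.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "response_chars": len(response_text),
        "latency_ms": round(latency_ms, 1),
    }
    # sort_keys gives a stable field order, which simplifies diffing and parsing logs.
    return json.dumps(record, sort_keys=True)
```

Each returned string can be appended to a log file or shipped to a monitoring pipeline, where latency and model identifiers support continuous monitoring.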
Key points & takeaways:
- Comprehensive view of the AI product cycle
- Responsible and governed integration
- Ability to deliver an advanced prototype
📌 Practical information
Our training sessions are offered in Montreal or Quebec City, in person or in a virtual classroom. Dates and locations are specified when you select your session below. If you have any questions, check out our FAQ.