Multimodal Foundation Models From Specialists to General Purpose Assistants Foundations and Trend

by Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao

$95.78

List Price: $99.00
Save: $3.22 (3%)
  • In Stock - Guaranteed to ship in 24 hours with Free Online tracking.
  • FREE DELIVERY by Monday, April 14, 2025 2:19:42 AM UTC

Description

This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants.
The focus encompasses five core topics, categorized into two classes: (i) a survey of well-established research areas, namely multimodal foundation models pre-trained for specific purposes, covering two topics: methods of learning vision backbones for visual understanding, and text-to-image generation; and (ii) recent advances in exploratory, open research areas, namely multimodal foundation models that aim to play the role of general-purpose assistants, covering three topics: unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs.
The target audience of the monograph is researchers, graduate students, and professionals in computer vision and vision-language multimodal communities who are eager to learn the basics and recent advances in multimodal foundation models.

Product Details

  • Brand: Now Publishers
  • Pub Date: May 6, 2024
  • ISBN-13: 9781638283362
  • ISBN-10: 1638283362
  • Language: English
  • Dimensions: 9.21 in × 0.48 in × 6.14 in
  • Weight: 1 lb