This book is a self-contained, code-free introduction to the world of Transformer networks. The first chapter introduces the foundational technical concepts of the Transformer architecture along with the intuition behind many of its design choices. It also offers deep insight into the architecture by following the data through the various stages of processing, using a simple machine translation example. The second chapter introduces the popular BERT and GPT language models built upon the Transformer architecture, and traces the progression of language models from the simple GPT-1 to the high-capacity GPT model at the heart of ChatGPT. The third chapter introduces the Vision Transformer and discusses the architectural features that enable it to process images. Chapter four explores the Swin Transformer architecture, which reduces the computational complexity of the Vision Transformer and makes it practical for large image resolutions. The final chapter analyzes network architectures that employ the Swin Transformer backbone in common vision tasks such as classification, segmentation, and image enhancement. The reader is assumed to have basic knowledge of neural networks and convolutional neural networks. The topics covered in this book should equip the reader with the fundamental tools needed to understand the latest developments in the field of Transformers.