We are presently observing a paradigm change in the design of complex SoCs, of the kind that occurs roughly every twelve years as a consequence of the exponentially increasing number of transistors on a chip. The present design discontinuity, like all previous ones, is characterized by a move to a higher level of abstraction, which is required to cope with rapidly increasing design costs. While the present paradigm change shares this move to a higher level of abstraction with all previous ones, there is also a key difference: for the first time, advances in semiconductor manufacturing do not lead to a corresponding increase in performance. At 65 nm and below, it is predicted that only a small portion of the performance increase will be attributable to shrinking geometries, while the lion's share will be due to innovative processor architectures.

To substantiate this assertion, it is instructive to look at major drivers of the semiconductor industry: wireless communications and multimedia. Both areas are characterized by an exponentially increasing demand for computational power to process the sophisticated algorithms necessary to make optimal use of the limited resource of bandwidth. This computational power cannot be provided in an energy-efficient manner by traditional processor architectures, but only by massively parallel, heterogeneous architectures.

The promise of parallelism has fascinated researchers for a long time; in the end, however, the uniprocessor has prevailed. What is different this time? In the past few years the computing industry changed course when it announced that its high-performance processors would henceforth rely on multiple cores. However, switching from sequential to modestly parallel computing makes programming much more difficult without rewarding this effort with dramatic improvements. A valid question is: why should massively parallel computing work when modestly parallel computing is not the solution? The answer is: it will work only if the multiprocessor is restricted to a particular class of applications. In wireless communications, the signal processing task can be naturally partitioned and is (almost) periodic. The first property makes it possible to employ the powerful technique of task-level parallel processing on different computational elements. The second property allows the tasks to be assigned temporally by an (almost) periodic scheduler, thus avoiding the fundamental problems associated with multithreading.

The key building blocks of such massively parallel SoCs will be clusters of application-specific instruction-set processors (ASIPs), which exploit instruction-level parallelism, data-level parallelism and instruction fusion. This book describes the automatic implementation of ASIPs from the architecture description language LISA, employing CoWare's tool suite "Processor Designer". The single most important feature of the approach presented in this book is efficient ASIP implementation while, at the same time, preserving the full architectural design space. This is achieved by introducing an intermediate representation between the architectural description in LISA and the register transfer level, which is commonly accepted as the entry point for hardware implementation. The LISA description allows architectural properties to be described explicitly, and these properties can be exploited to perform powerful architectural optimizations. The implementation efficiency of the approach has been demonstrated by numerous industrial designs.
We hope that this book will be useful to engineers and engineering managers in industry who want to learn how architectural optimizations improve the implementation efficiency of ASIPs. We also hope that it will be useful to those in academia actively engaged in this fascinating research area.