Dr.-Ing. Lars Middendorf
RG Technologies GmbH | Software Developer for a CAD System | June 2007 ⟶ April 2009
University of Potsdam | Research Assistant | May 2009 ⟶ July 2011
University of Rostock | Research Assistant | August 2011 ⟶ May 2018
Bosch Sensortec GmbH | Software Architect and Project Manager | June 2018 ⟶ Today
Consulting
Firmware Development
Hardware Design
FPGA
Compilers and Tools
Linux
Programming Languages
Flexible GPU Architecture
Despite the computational power and memory bandwidth of modern GPUs, a main limitation of these architectures is often the lack of efficient on-chip communication between different shader cores. Hence, the individual stages of the Direct3D and OpenGL rendering pipelines are programmable, but their topology and data flow remain fixed. To overcome this, a novel hardware and software architecture has been designed that supports an arbitrary number of dynamic and recursive shader stages as well as complex data flow and lightweight synchronization within the rendering pipeline. The architecture has been evaluated using an FPGA prototype with partial OpenGL support.
PCIe between FPGA and PC
The PCIe interface offers both a low-latency and a high-throughput connection to peripheral devices. In this project, a bidirectional FIFO has been implemented in system memory that can be read and written autonomously by the external device. A corresponding Linux kernel driver provides access to this FIFO via the file system and employs both interrupts and wait queues for optimal performance.
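As an illustration only, the following minimal sketch shows the kind of single-producer/single-consumer ring buffer that a FIFO in shared system memory is commonly built on. It assumes a power-of-two capacity and free-running head and tail counters; the class name `RingFifo` and its methods are hypothetical, and the sketch runs as plain user-space code rather than as a kernel driver or DMA engine.

```python
class RingFifo:
    """Single-producer/single-consumer byte FIFO over a fixed-size buffer.

    Hypothetical sketch: `buf` stands in for the shared region in system
    memory; `head` and `tail` are free-running counters that are masked
    with (size - 1) to obtain buffer offsets.
    """

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.buf = bytearray(size)
        self.head = 0  # advanced only by the producer (writer)
        self.tail = 0  # advanced only by the consumer (reader)

    def used(self) -> int:
        """Number of bytes currently stored."""
        return self.head - self.tail

    def write(self, data: bytes) -> int:
        """Copy as many bytes as fit into the FIFO; return how many were written."""
        n = min(len(data), self.size - self.used())
        for i in range(n):
            self.buf[(self.head + i) & (self.size - 1)] = data[i]
        # A shared-memory version would need a write barrier here so that
        # the data is visible before the updated head is published.
        self.head += n
        return n

    def read(self, max_len: int) -> bytes:
        """Remove and return up to max_len bytes from the FIFO."""
        n = min(max_len, self.used())
        out = bytes(self.buf[(self.tail + i) & (self.size - 1)] for i in range(n))
        self.tail += n
        return out


fifo = RingFifo(8)
print(fifo.write(b"hello"))  # 5
print(fifo.write(b"world"))  # 3, only three bytes of space were left
print(fifo.read(16))         # b'hellowor'
print(fifo.write(b"!!"))     # 2, wraps around to offset 0
print(fifo.read(16))         # b'!!'
```

In a driver setting such as the one described above, a blocked reader would additionally sleep on a wait queue and be woken from the interrupt handler once the device has advanced `head`; that part is not modelled here.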
Dynamic Task Management in Many-Core Systems
When designing the software and hardware architecture of many-core systems with hundreds of processors on a single chip, a central problem is the scheduling and binding of work items to execution units. In my execution model, tasks are self-timed and do not require explicit bookkeeping by a central scheduler, so that even dynamic and recursive tasks can be managed and synchronized by local rewriting operations on the task stream. An FPGA prototype consisting of 128 custom-designed 32-bit RISC processors, a network on chip, and a hardware scheduler has been built. The system can manage several million dynamically created threads as a token stream and allows recursive programs to scale from 1 to 128 cores.
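To give some intuition for handling recursive tasks through local rewriting of a stream, here is a toy, sequential interpreter. A recursive computation is encoded as a flat list of tokens, and rules that inspect only a few neighbouring tokens are applied until a single value remains. The token encoding (`("fib", n)`, `("val", x)`, `("add",)`) and the functions `step` and `run` are hypothetical and chosen for illustration; this is not the actual execution model, token format, or hardware scheduler of the prototype described above.

```python
def step(stream):
    """Apply one left-to-right pass of local rewrite rules.

    Returns the rewritten stream and whether any rule fired. Each rule
    only inspects a window of at most three neighbouring tokens.
    """
    out = []
    changed = False
    i = 0
    while i < len(stream):
        tok = stream[i]
        if tok[0] == "fib":
            n = tok[1]
            if n < 2:
                out.append(("val", n))  # base case becomes a value
            else:
                # recursive case spawns two child tasks and a join (postfix order)
                out.extend([("fib", n - 1), ("fib", n - 2), ("add",)])
            changed = True
            i += 1
        elif (tok[0] == "val" and i + 2 < len(stream)
              and stream[i + 1][0] == "val" and stream[i + 2][0] == "add"):
            # both operands are ready: synchronize by folding them into one value
            out.append(("val", tok[1] + stream[i + 1][1]))
            changed = True
            i += 3
        else:
            out.append(tok)  # no rule applies at this position in this pass
            i += 1
    return out, changed


def run(stream):
    """Rewrite until no rule applies; return the final stream and the number of passes in which a rule fired."""
    passes = 0
    while True:
        stream, changed = step(stream)
        if not changed:
            return stream, passes
        passes += 1


result, passes = run([("fib", 10)])
print(result)  # [('val', 55)]
```

Because every rule touches only a small window, disjoint parts of the stream could in principle be rewritten independently, which is what would make such a scheme attractive for many cores; this sketch does not exploit that and processes the stream on a single thread.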
Compiler: Hardware Synthesis of Recursive Functions
Current high-level synthesis tools based on C/C++ offer only limited support for recursion and function pointers. I presented a novel approach to high-level synthesis that maps the program onto a term rewriting system. Based on this concept, the dynamic creation of threads, parallel recursive tasks, and data-dependent branching can be supported in hardware.
Mobile Camera-Based Evaluation Method of Inertial Measurement Units
To support navigation, gesture detection, and augmented reality, modern smartphones contain inertial measurement units (IMUs) consisting of accelerometers and gyroscopes. Although the accuracy of these sensors directly affects the reliability of mobile applications, no standardized tests exist to verify the correctness of the retrieved sensor data. For this purpose, a novel benchmark has been developed that uses the phone's camera as a reference to estimate the quality of its sensor data fusion.
Dynamic Task Scheduling and Binding for Many-Core Systems through Stream Rewriting
Multi-core and many-core systems offer numerous benefits, such as reduced energy consumption and latency as well as improved throughput, for both high-performance and low-power applications. Scalability is achieved through parallelization instead of high clock rates and can therefore overcome technological limitations such as heat dissipation and power consumption. While functional units and even entire processor cores can be replicated at the expense of increased area, the utilization of the additional resources remains a more fundamental problem. Hence, besides the design of the actual hardware architecture, the programming of a many-core system also raises several challenges. Especially for irregular tasks and dynamic parallelism, the binding and scheduling of tasks to processor cores, as well as efficient communication and synchronization at the system level, have not yet been solved in general. As a consequence, most static optimizations, which rely on extensive knowledge of the behavior at design time, cannot be applied in this case. To address this, a model of computation called stream rewriting is proposed for the specification and implementation of highly concurrent applications. Several many-core systems with up to 128 general-purpose processors have been implemented and demonstrate the scalability of stream rewriting for complex examples and recursive algorithms.
Papers (Selection)
Patents
Dr. Lars Middendorf Eisenbahnstr. 5 72072 Tübingen