Software Development and Consulting

Dr.-Ing. Lars Middendorf

Skills:
  • Computer Architecture and Embedded Systems
  • Firmware Development in C/C++ and Assembly
  • FPGA Designs based on VHDL and Verilog
  • Simulation and Verification with RTL and TLM
  • Design Automation, Tools and Compiler Construction
  • Computer Graphics, Geometric Modeling, Rendering, and CAD
  • Coaching, Talks, and Consulting
    Contact:

    mail@lars-middendorf.de

    LinkedIn:

    Dr. Lars Middendorf

    Dissertation:

    Dynamic Task Scheduling and Binding for Many-Core Systems through Stream Rewriting (summa cum laude)

    Experience:

    More than 15 years of professional experience in academia, industry, and freelancing.

    Professional Experience
      • RG Technologies GmbH, Software Developer for a CAD System (June 2007 to April 2009)
      • University of Potsdam, Research Assistant (May 2009 to July 2011)
      • University of Rostock, Research Assistant (August 2011 to May 2018)
      • Bosch Sensortec GmbH, Software Architect and Project Manager (June 2018 to present)

    Skills

      • Consulting
      • Firmware Development
      • Hardware Design
      • FPGA
      • Compilers and Tools
      • Linux
      • Programming Languages

    Projects (Selection)

    Flexible GPU Architecture
    Despite the computational power and memory bandwidth of modern GPUs, a main limitation of these architectures is often the lack of efficient on-chip communication between shader cores. Hence, while the individual stages of the Direct3D and OpenGL rendering pipelines are programmable, the pipeline's topology and data flow remain fixed. To address this, a novel hardware and software architecture has been designed that supports an arbitrary number of dynamic and recursive shader stages as well as complex data flow and lightweight synchronization within the rendering pipeline. The architecture has been evaluated on an FPGA prototype with partial OpenGL support.

    PCIe between FPGA and PC
    PCIe offers a low-latency, high-throughput connection to peripheral devices. In this project, a bidirectional FIFO has been implemented in system memory that the external device can read and write autonomously. A corresponding Linux kernel driver exposes this FIFO through the file system and combines interrupts with wait queues, so that processes sleep until data is available instead of polling.
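    A FIFO shared through memory is commonly built as a ring buffer with separate producer and consumer indices. The following C sketch shows only that index logic for one direction, assuming a single producer and a single consumer. The names `ring_t`, `ring_push`, `ring_pop`, and the capacity `RING_SIZE` are hypothetical, and the sketch does not model DMA, PCIe ordering rules, or the kernel driver itself; a bidirectional channel would use one such ring per direction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; must be a power of two so wrapping is a cheap mask. */
#define RING_SIZE 8u
#define RING_MASK (RING_SIZE - 1u)
static_assert((RING_SIZE & RING_MASK) == 0u, "RING_SIZE must be a power of two");

/* Hypothetical single-producer/single-consumer ring buffer.
 * head: advanced only by the producer (next slot to fill).
 * tail: advanced only by the consumer (next slot to read).
 * Both indices run freely and are masked on access, so head - tail is the
 * fill level and a full ring is distinguishable from an empty one. */
typedef struct {
    _Atomic uint32_t head;
    _Atomic uint32_t tail;
    uint32_t data[RING_SIZE];
} ring_t;

/* Producer side: returns false if the ring is full. */
static bool ring_push(ring_t *r, uint32_t value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                          /* full */
    r->data[head & RING_MASK] = value;
    /* Release: the slot contents are visible before the new head. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_pop(ring_t *r, uint32_t *value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                          /* empty */
    *value = r->data[tail & RING_MASK];
    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

    In use, one side calls `ring_push` until it returns `false` (ring full) and the other calls `ring_pop` until it returns `false` (ring empty). A ring object with static storage duration starts out zero-initialized, which is a valid empty state. Masking free-running 32-bit indices keeps full and empty distinguishable without sacrificing a slot.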

    Dynamic Task Management in Many-Core Systems
    When designing the software and hardware architecture of many-core systems with hundreds of processors on a single chip, a central problem is the scheduling and binding of work items to execution units. In my execution model, tasks are self-timed and require no explicit bookkeeping by a central scheduler, so that even dynamic and recursive tasks can be managed and synchronized by local rewriting operations on a token stream. An FPGA prototype consisting of 128 self-designed 32-bit RISC processors, a Network on Chip, and a hardware scheduler has been built. The system can manage several million dynamically created threads as a token stream and allows recursive programs to scale from 1 to 128 cores.

    Compiler: Hardware Synthesis of Recursive Functions
    Current high-level synthesis tools based on C/C++ offer only limited support for recursion and function pointers. I presented a novel approach for high-level synthesis that maps the program into a term rewriting system. Based on this concept, dynamic creation of threads, parallel recursive tasks and data-dependent branching can be supported in hardware.

    Mobile Camera-Based Evaluation Method of Inertial Measurement Units
    To support navigation, gesture detection, and augmented reality, modern smartphones contain inertial measurement units (IMUs) consisting of accelerometers and gyroscopes. Although the accuracy of these sensors directly affects the reliability of mobile applications, no standardized tests exist to verify the correctness of the retrieved sensor data. For this purpose, a novel benchmark has been developed that uses the phone's camera as a reference to estimate the quality of its sensor data fusion.

    Dynamic Task Scheduling and Binding for Many-Core Systems through Stream Rewriting
    Multi- and many-core systems offer numerous benefits, such as reduced energy consumption and latency as well as improved throughput, for both high-performance and low-power applications. Scalability is achieved through parallelization instead of high clock rates and can therefore overcome technological limitations such as heat dissipation and power consumption. While functional units and even entire processor cores can be replicated at the cost of additional area, utilizing these additional resources efficiently remains a more fundamental problem. Hence, besides the design of the hardware architecture itself, programming a many-core system raises several challenges. Especially for irregular tasks and dynamic parallelism, the binding and scheduling of tasks to processor cores as well as efficient communication and synchronization at the system level have not yet been solved in general. As a consequence, most static optimizations, which rely on extensive knowledge of the behavior at design time, cannot be applied in this case. This work introduces a model of computation, called stream rewriting, for the specification and implementation of highly concurrent applications. Several many-core systems with up to 128 general-purpose processors have been implemented and demonstrate the scalability of stream rewriting for complex examples and recursive algorithms.

    Publications

    Papers (Selection)

    Patents

    Legal Notice (Impressum)

    Dr. Lars Middendorf
    Eisenbahnstr. 5
    72072 Tübingen