Dr.-Ing. Lars Middendorf

Skills:	Computer Architecture and Embedded Systems; Firmware Development in C/C++ and Assembly; FPGA Designs based on VHDL and Verilog; Simulation and Verification with RTL and TLM; Design Automation, Tools and Compiler Construction; Computer Graphics, Geometric Modeling, Rendering, and CAD; Coaching, Talks, and Consulting
Contact:	mail@lars-middendorf.de
LinkedIn:	Dr. Lars Middendorf
Dissertation:	Dynamic Task Scheduling and Binding for Many-Core Systems through Stream Rewriting (summa cum laude)
Experience:	More than 15 years of professional experience in academia, industry, and freelancing.

Consulting

Invited talks, presentations and surveys of embedded HW/SW architectures
Rapid prototyping and initial setup for embedded projects
Hands-on trainings, remote or on-site, small groups or one-to-one
Performance evaluation and benchmarking to support design decisions
Scientific expertise and reviews

Firmware Development

Ultra Low-Power Devices
C/C++, Assembly
ARM, Atmel
HW/SW Optimizations

Hardware Design

System-on-Chip Architectures
Application-specific Instruction Set Processors (ASIP)
Floating-Point Arithmetic and DSP Cores
On-chip Networks and Interconnects

FPGA

Xilinx and Altera
External Memory Interface
Communication: Ethernet, RS232, USB, PCIe, ...
Video Output: VGA and HDMI
Resource Optimizations
RTL Design using Verilog/VHDL

Compilers and Tools

LLVM/Clang Setup
Domain-specific Languages
Simulation and Verification

Linux

Driver and Kernel Development
Classic User Interfaces: GTK and Qt

Programming Languages

C/C++
C#
Assembly: x86/ARM/RISC-V
SystemC
VHDL
Verilog
Python
Java
MATLAB/Octave
...and many more

Flexible GPU Architecture
Despite the computational power and memory bandwidth of modern GPUs, a major limitation of these architectures is often the lack of efficient on-chip communication between shader cores. Hence, while the individual stages of the Direct3D and OpenGL rendering pipelines are programmable, the topology and data flow of the pipeline remain fixed. To overcome this, a novel hardware and software architecture has been designed that supports an arbitrary number of dynamic and recursive shader stages as well as complex data flow and lightweight synchronization within the rendering pipeline. The architecture has been evaluated using an FPGA prototype with partial OpenGL support.

PCIe between FPGA and PC
The PCIe interface offers a low-latency, high-throughput connection to peripheral devices. In this project, a bidirectional FIFO has been implemented in system memory that the external device can read and write autonomously. A corresponding Linux kernel driver provides access to this FIFO via the file system and uses both interrupts and wait queues for optimal performance.
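A FIFO shared between a device and the host is commonly organized as a ring buffer with one producer and one consumer. The following Python sketch models only that index bookkeeping under assumed conventions (power-of-two capacity, free-running head/tail counters, each index written by only one side); the class and method names are hypothetical, and DMA, PCIe transactions, interrupts, and the kernel driver itself are not modeled.

```python
class RingFifo:
    """Hypothetical single-producer/single-consumer ring buffer (sketch only).

    The producer only advances `tail` and the consumer only advances `head`,
    so neither side ever writes the other's index.
    """

    def __init__(self, capacity: int) -> None:
        # Assumption: a power-of-two capacity, so counters map to slots via a mask.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a positive power of two")
        self.capacity = capacity
        self.mask = capacity - 1
        self.buf = [None] * capacity
        self.head = 0  # total number of items read (consumer-owned)
        self.tail = 0  # total number of items written (producer-owned)

    def __len__(self) -> int:
        return self.tail - self.head

    def push(self, item) -> bool:
        """Write one item; return False if the FIFO is full."""
        if len(self) == self.capacity:
            return False
        self.buf[self.tail & self.mask] = item
        self.tail += 1  # publish the new index only after the slot is filled
        return True

    def pop(self):
        """Read one item; return None if the FIFO is empty."""
        if self.head == self.tail:
            return None
        item = self.buf[self.head & self.mask]
        self.head += 1
        return item


fifo = RingFifo(4)
print([fifo.push(i) for i in range(5)])  # [True, True, True, True, False]
print(fifo.pop(), len(fifo))             # 0 3
```

In C, head and tail would typically be unsigned integers so that wraparound on overflow keeps `tail - head` correct; on real hardware the producer must also make the slot write visible before the updated index (a memory barrier), which this Python model does not capture.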

Dynamic Task Management in Many-Core Systems
When designing the software and hardware architecture of many-core systems with hundreds of processors on a single chip, a central problem is the scheduling and binding of work items to execution units. In my execution model, tasks are self-timed and require no explicit bookkeeping by a central scheduler, so that even dynamic and recursive tasks can be managed and synchronized by local rewriting operations on a stream of tokens. An FPGA prototype has been built, consisting of 128 custom 32-bit RISC processors of my own design, a network-on-chip, and a hardware scheduler. The system can manage several million dynamically created threads as a token stream and allows recursive programs to scale from 1 to 128 cores.
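To illustrate the general idea of managing recursive tasks by local rewriting of a token stream, here is a toy Python sketch. It is not the execution model or hardware scheduler described above: the token encoding (prefix notation), the two rules, the Fibonacci workload, and all function names are assumptions chosen for illustration. Each pass scans the stream once and rewrites wherever a local pattern matches, so no central structure keeps track of outstanding tasks.

```python
from typing import List, Tuple

# Hypothetical token format: (kind, payload); "add" tokens ignore the payload.
Token = Tuple[str, int]


def rewrite_pass(stream: List[Token]) -> List[Token]:
    """Apply local rewrite rules in one left-to-right pass over the stream."""
    out: List[Token] = []
    i = 0
    while i < len(stream):
        kind, arg = stream[i]
        if kind == "fib":
            # Rule 1: expand a recursive task into an operator and two subtasks.
            if arg < 2:
                out.append(("val", arg))
            else:
                out.extend([("add", 0), ("fib", arg - 1), ("fib", arg - 2)])
            i += 1
        elif (kind == "add" and i + 2 < len(stream)
              and stream[i + 1][0] == "val" and stream[i + 2][0] == "val"):
            # Rule 2: an operator followed by two values is a complete subterm.
            out.append(("val", stream[i + 1][1] + stream[i + 2][1]))
            i += 3
        else:
            out.append(stream[i])  # no rule matches here; copy the token
            i += 1
    return out


def run(stream: List[Token]) -> List[Token]:
    """Rewrite until a fixed point is reached (no rule applies any more)."""
    while True:
        nxt = rewrite_pass(stream)
        if nxt == stream:
            return stream
        stream = nxt


print(run([("fib", 6)]))  # [('val', 8)]
```

Because each rule only inspects a token and its immediate neighbors, the stream could in principle be split and rewritten by several workers in parallel; this sketch runs sequentially.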

Compiler: Hardware Synthesis of Recursive Functions
Current high-level synthesis tools for C/C++ offer only limited support for recursion and function pointers. I presented a novel approach to high-level synthesis that maps the program onto a term rewriting system. Based on this concept, dynamic thread creation, parallel recursive tasks, and data-dependent branching can be supported in hardware.

Mobile Camera-Based Evaluation Method of Inertial Measurement Units
To support navigation, gesture detection, and augmented reality, modern smartphones contain inertial measurement units (IMUs) consisting of accelerometers and gyroscopes. Although the accuracy of these sensors directly affects the reliability of mobile applications, no standardized tests exist to verify the correctness of the retrieved sensor data. For this purpose, a novel benchmark has been developed that uses the phone's camera as a reference to assess the quality of its sensor data fusion.

Dynamic Task Scheduling and Binding for Many-Core Systems through Stream Rewriting
Multi- and many-core systems offer numerous benefits, such as reduced energy consumption and latency as well as improved throughput, for both high-performance and low-power applications. Scalability is achieved through parallelization instead of high clock rates and can therefore overcome technological limitations such as thermal and power constraints. While functional units and even processor cores can be replicated at the expense of increased area, utilizing the additional resources remains a more fundamental problem. Hence, besides the design of the actual hardware architecture, programming a many-core system also raises several challenges. Especially for irregular tasks and dynamic parallelism, the binding and scheduling of tasks to processor cores as well as efficient communication and synchronization at the system level have not yet been solved in general. As a consequence, most static optimizations, which rely on extensive knowledge of the behavior at design time, cannot be applied in this case. This work therefore proposes a model of computation, called stream rewriting, for the specification and implementation of highly concurrent applications. Several many-core systems with up to 128 general-purpose processors have been implemented and demonstrate the scalability of stream rewriting for complex examples and recursive algorithms.

Papers (Selection)

Lars Middendorf, Felix Mühlbauer, Georg Umlauf, Christophe Bobda, "Embedded Vertex Shader in FPGA", IESS 2007, pp. 155-164
Lars Middendorf, Christophe Bobda, and Christian Haubelt, "Hardware synthesis of recursive functions through partial stream rewriting" in 49th ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, CA, June 2012
Lars Middendorf, C. Zebelein, and C. Haubelt, "Dynamic Task Mapping onto Multi-Core Architectures through Stream Rewriting" in Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII), Agios Konstantinos, 2013
Lars Middendorf and C. Haubelt, "A novel graphics processor architecture based on partial stream rewriting" in Design and Architectures for Signal and Image Processing (DASIP), Cagliari, 2013
Christian Haubelt, Florian Ludwig, Lars Middendorf, Christian Zebelein, "Using Stream Rewriting for Mapping and Scheduling Data Flow Graphs onto Many-Core Architectures" in Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 2013
Lars Middendorf and C. Haubelt, "A Programmable Graphics Processor based on Partial Stream Rewriting", Computer Graphics Forum, Vol. 32, No. 7, pp. 325-334, 2013
Lars Middendorf and C. Haubelt, "System Level Synthesis of Many-Core Architectures using Parallel Stream Rewriting" in Proceedings of the 2014 Electronic System Level Synthesis Conference (ESLsyn), San Francisco, CA, 2014
Lars Middendorf and C. Haubelt, "Scheduling of Recursive and Dynamic Data-Flow Graphs using Stream Rewriting" in Proceedings of Special Edition on Data-flow Programming Models and Machines (MPP'14), Paris, France, October 2014
Lars Middendorf and C. Haubelt, "Dynamic task mapping of graphics processing applications on many-core architectures through stream rewriting" in 13th IEEE Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia), Amsterdam, 2015, pp. 1-2
Nils Büscher, Lars Middendorf, Christian Haubelt, Rainer Dorsch, Frederik Wegelin, "Improving Repeatability and Reproducibility of Mobile Tests for Inertial Measurement Units" in Proceedings of the Symposium on Engineering Interactive Computing Systems (EICS'16), pp. 149-158, ISBN: 978-1-4503-4322-0, Brussels, Belgium, June 2016
Lars Middendorf and Christian Haubelt, "Supporting Static Binding in Stream Rewriting for Heterogeneous Many-Core Architectures" in Proceedings of the International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), Lyon, September 2016
Johann-Peter Wolff, Florian Grützmacher, Rainer Dorsch, Rolf Kaack, Lars Middendorf, Christian Haubelt, "Towards Automated Prototyping of Gesture Recognition Systems for Wearable Devices using Inertial Sensors" in Proceedings of Smart Systems Integration, pp. 85-92, Barcelona, Spain, April 2019 (Best Paper Award Nominee)

Patents

Rudolf Bichler, Thomas Claus, Rainer Dorsch, Christian Haubelt, Lars Middendorf: "Adjusting and Self-Testing Inertial Sensors, and Methods", China Patent Application CN000105699696A, Stuttgart, Germany, June 2016
Rudolf Bichler, Thomas Claus, Rainer Dorsch, Christian Haubelt, Lars Middendorf: "Apparatus and Method for Calibration and Self-Test of Inertial Sensors", South Korea Patent Application KR102016072055A, South Korea, June 2016
Rudolf Bichler, Thomas Claus, Rainer Dorsch, Christian Haubelt, Lars Middendorf: "Vorrichtung zum Abgleich und Selbsttest von Inertialsensoren und Verfahren" (Device for Adjusting and Self-Testing Inertial Sensors, and Method), German Patent Application (Offenlegungsschrift) DE 10 2015 203 968 A1, Stuttgart, Germany, June 2016
Rainer Dorsch, Christian Haubelt, Thomas Claus, Rudolf Bichler, Lars Middendorf: "Device for Adjusting and Self-Testing Inertial Sensors, and Methods", US Patent Application US020160170502A1, Alexandria, USA, June 2016
Rainer Dorsch, Christian Haubelt, Lars Middendorf, Sebastian Stieber: "Verarbeitungssteuerung eines Sensorsystems" (Processing Control of a Sensor System), German Patent Application (Offenlegungsschrift) DE102017204514A1, Rostock, Germany, September 2018
Rainer Dorsch, Christian Haubelt, Lars Middendorf, Sebastian Stieber: "Processing Controller of a Sensor System", WIPO Patent Application WO2018/166698A1, September 2018
Rainer Dorsch, Christian Haubelt, Lars Middendorf, Sebastian Stieber: "Processing Control of a Sensor System", Taiwan Patent Application TW000201839606A, November 2018
Rainer Dorsch, Christian Haubelt, Lars Middendorf, Sebastian Stieber: Korea Patent Application KR102019126391A, November 2019
Rainer Dorsch, Christian Haubelt, Lars Middendorf, Sebastian Stieber: "Processing Control of a Sensor System", China Patent Application CN000110753907A, February 2020
Rainer Dorsch, Christian Haubelt, Lars Middendorf, Sebastian Stieber: "Processing Control of a Sensor System", United States Patent Application US020200125523A1, April 2020
Rainer Dorsch, Christian Haubelt, Lars Middendorf, Sebastian Stieber: "Processing Control of a Sensor System", United States Patent US000011263163B2, USA, March 2022

RG Technologies GmbH	Software Developer for a CAD System	June 2007 ⟶ April 2009
University of Potsdam	Research Assistant	May 2009 ⟶ July 2011
University of Rostock	Research Assistant	August 2011 ⟶ May 2018
Bosch Sensortec GmbH	Software Architect and Project Manager	June 2018 ⟶ Today