Human Pose Estimation

Autonomous Driving for a Human World

TLDR: Using a Jetson TX2, I implemented DensePose (arXiv paper from FAIR below) to track human posture. The pose data was then fed into the autonomous car's path-planning controller to estimate the likelihood of pedestrians entering the street given their pose. This estimate was used to make the car more responsive to humans when navigating busy streets.

Posture Recognition for Subject Awareness in Self-Driving Cars via Region-based Convolutional Neural Networks


Self-driving cars have long faced the hurdle of situational awareness and pedestrian accident prevention. This is especially true for Mobility-on-Demand (MOD) services operating within urban spaces, where the vehicle must operate seamlessly amid foot traffic. To traverse these urban spaces effectively, the car must be able to predict the future behavior of pedestrians. Current methods include trajectory mapping to predict future movement, but true situational awareness requires additional information about the pedestrians’ awareness of the car. The central thesis is that pedestrians who are aware of the car will behave drastically differently from pedestrians who are not, and that this information is vital for predicting the future behavior and trajectory of pedestrians within the environment. This research aims to develop, test, and implement a neural network architecture for classifying pedestrian posture, and then incorporate that information into the control system governing the car’s behavior.

Plan of Work

Step 1: Background Research

To better understand prior work, a background study will be conducted to survey current techniques for posture analysis. From these current practices, further study will be devoted to the architectures that look most promising for this task. A comparison of the most popular architectures, including OpenPose, DensePose, etc., will be conducted. One architecture will be chosen as the basis for this research.

Step 2: Model Implementation

Once a model has been selected, a basic implementation will be run on a Jetson TX2 module. The Jetson TX2 was chosen so that the software can be carried over to mobile robots and drones. Running the model on this hardware will give a better understanding of the architecture, its runtime, and which postures the software can recognize as distinct.
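One way to get a first read on runtime is a small timing harness around the inference call. The sketch below is a hedged illustration, not the project's actual code: benchmark and dummy_pose_model are hypothetical names, and the dummy function only stands in for whichever pose model is selected in Step 1.

```python
import time
from statistics import mean
from typing import Callable, List


def benchmark(infer: Callable[[object], object], frames: List[object], warmup: int = 2) -> dict:
    """Time `infer` on every frame and report mean latency and throughput."""
    # Warm-up calls are not timed, so one-off setup costs do not skew the result.
    for frame in frames[:warmup]:
        infer(frame)

    latencies = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        latencies.append(time.perf_counter() - start)

    avg = mean(latencies)
    fps = 1.0 / avg if avg > 0 else float("inf")
    return {"frames": len(latencies), "mean_latency_s": avg, "fps": fps}


def dummy_pose_model(frame):
    # Hypothetical stand-in for the real pose estimator (assumption):
    # it only does a little arithmetic so the harness has something to time.
    return sum(frame)


# Usage: five fake "frames", each a short list of numbers.
stats = benchmark(dummy_pose_model, [[1, 2, 3]] * 5)
print(stats["frames"], "frames timed")
```

Swapping dummy_pose_model for the real inference function would give per-frame latency on the Jetson TX2 itself, which is the number that matters for on-vehicle use.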

Step 3: Data Collection

The main hurdle to be overcome in training the posture recognition model is determining which postures denote awareness of the car, and to what extent. To find the dividing line between these states of awareness, experimental data will need to be collected. The software loaded onto the Jetson TX2 will be run on a small vehicle driven through pedestrian-heavy areas. Each subject within the vehicle’s field of vision in each frame will be labeled as either aware or not aware, and each label will then be matched with the posture data from the Jetson TX2. This will serve as a labeled training dataset for determining awareness of the vehicle. This process will be repeated until a sufficiently large dataset has been collected to train the posture recognition model.
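The pairing of per-frame labels with posture data can be sketched as a simple join on frame and subject. This is a minimal illustration under stated assumptions: the LabeledPose record, the build_dataset helper, and the (frame_id, subject_id) key are all hypothetical, and the keypoint format is a placeholder for whatever the chosen pose model emits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed pose format: one (x, y) pair per body joint.
Keypoints = List[Tuple[float, float]]
# Assumed key: (frame_id, subject_id) identifies one person in one frame.
Key = Tuple[int, int]


@dataclass(frozen=True)
class LabeledPose:
    frame_id: int
    subject_id: int
    keypoints: Keypoints
    aware: bool  # human-assigned label: aware of the vehicle or not


def build_dataset(poses: Dict[Key, Keypoints], labels: Dict[Key, bool]) -> List[LabeledPose]:
    """Join pose output with human labels; detections without a label are skipped."""
    dataset = []
    for key in sorted(poses):
        if key in labels:
            frame_id, subject_id = key
            dataset.append(LabeledPose(frame_id, subject_id, poses[key], labels[key]))
    return dataset


# Usage with made-up values: subject 2 in frame 0 has no label and is dropped.
poses = {(0, 1): [(0.1, 0.2)], (0, 2): [(0.3, 0.4)], (1, 1): [(0.5, 0.6)]}
labels = {(0, 1): True, (1, 1): False}
print(len(build_dataset(poses, labels)), "labeled examples")
```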

Step 4: Training and Validation

Using the data collected in Step 3, the posture recognition model will output the posture of each subject, which will be combined in a secondary model with other information sources, including the car’s position, image data, and lidar. The output of the secondary model will be two-fold: a binary classification (aware or not aware) and a confidence level. Together, these will serve as an additional input to the vehicle’s main decision-making algorithm.
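To make the two-part output concrete, here is a minimal sketch that uses a logistic model as a stand-in for the secondary model. The proposal does not specify the model family, so the logistic form, the feature vector, the weights, and the name classify_awareness are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify_awareness(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Return (aware, confidence) for one subject.

    `features` is a hypothetical fused vector (posture plus other sensor
    cues); `weights` and `bias` would come from training in practice.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(f * w for f, w in zip(features, weights))
    p_aware = sigmoid(z)
    aware = p_aware >= threshold
    # Confidence is the probability assigned to the predicted class.
    confidence = p_aware if aware else 1.0 - p_aware
    return aware, confidence


# Usage with made-up feature values and weights.
aware, confidence = classify_awareness([1.0, 0.0], [2.0, -1.0])
print(aware, round(confidence, 3))
```

Reporting the confidence as the probability of the predicted class keeps it in the range 0.5 to 1.0, so downstream code can treat it the same way for both labels.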

Step 5: Integration

Along with trajectory, location, and distance to the vehicle, the awareness value and confidence score will be used to predict the behavior of dynamic targets as they move across the vehicle’s field of view. This information will then be used to plan a course of action for the vehicle that avoids all potential subjects as it traverses urban spaces.
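As a rough illustration of how the awareness value and confidence score might feed a planner, the sketch below caps the vehicle's speed based on each pedestrian's distance and estimated probability of being unaware. This is a hypothetical policy, not the controller used in this project: speed_limit_mps, plan_speed_mps, and the distance thresholds are all assumed for the example.

```python
from typing import Iterable, Tuple


def speed_limit_mps(
    cruise_mps: float,
    distance_m: float,
    aware: bool,
    confidence: float,
    stop_distance_m: float = 2.0,      # assumed: always stop inside this range
    caution_distance_m: float = 20.0,  # assumed: ignore pedestrians beyond this range
) -> float:
    """Speed cap imposed by a single pedestrian."""
    if distance_m <= stop_distance_m:
        return 0.0
    if distance_m >= caution_distance_m:
        return cruise_mps
    # Probability that the pedestrian has NOT noticed the vehicle.
    p_unaware = (1.0 - confidence) if aware else confidence
    # 0 at the caution distance, approaching 1 near the stop distance.
    proximity = (caution_distance_m - distance_m) / (caution_distance_m - stop_distance_m)
    return cruise_mps * (1.0 - proximity * p_unaware)


def plan_speed_mps(cruise_mps: float, pedestrians: Iterable[Tuple[float, bool, float]]) -> float:
    """Take the most restrictive cap over (distance_m, aware, confidence) tuples."""
    caps = [speed_limit_mps(cruise_mps, d, a, c) for d, a, c in pedestrians]
    return min(caps, default=cruise_mps)


# Usage with made-up values: one nearby unaware pedestrian, one distant aware one.
print(plan_speed_mps(10.0, [(11.0, False, 0.8), (18.0, True, 0.9)]))
```

Under this policy an unaware pedestrian slows the car more than an aware one at the same distance, which mirrors the central thesis that awareness changes how pedestrians are likely to behave.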


The goal of this research is to increase situational awareness in self-driving cars operating in busy urban environments. If the software can predict with high accuracy whether subjects are aware of the car’s presence, the car will be able to traverse these dynamic environments much more safely. This will hopefully lead to greater acceptance and more widespread use of automated driving features.