Therbligs in Action: Video Understanding through Motion Primitives
Eadom Dessalene
Michael Maynord
Cornelia Fermüller
Yiannis Aloimonos
Perception and Robotics Group
University of Maryland, College Park

To be presented at CVPR 2023


In this paper we introduce a rule-based, compositional, and hierarchical model of action that uses Therbligs as its atoms. These atoms provide a consistent, expressive, contact-centered representation of action. Over the atoms we introduce a differentiable method of rule-based reasoning that regularizes for logical consistency. We release the first Therblig-centered annotations over two popular video datasets: EPIC Kitchens 100 and 50-Salads. We also demonstrate the benefits of adopting Therblig representations through evaluation on three tasks (action segmentation, action anticipation, and action recognition) over CNN and Transformer-based architectures. Code and data will be made publicly available.

We introduce the use of Therbligs in video understanding as a consistent, expressive, symbolic representation of sub-action. Points of Contact, indicated by the blue divider dashes, are necessarily associated with Therbligs and/or their boundaries. Because Points of Contact are unambiguous, Therblig boundaries gain precision and do not overlap. On top of the Therblig atoms we construct a framework for Rule Enforcement, which enforces greater logical consistency through commonsense rules. This rule-based framework makes it easy to introduce long-term constraints. Therblig atoms are then composable into actions, which are in turn composable into activities.
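To make the hierarchy and rule enforcement described above concrete, the following is a minimal sketch, not the implementation used in this paper. The label set `THERBLIGS`, the transition table `ALLOWED_NEXT`, and every class and function name here are hypothetical and chosen only for illustration. The soft penalty is one common relaxation of a hard rule (the expected number of forbidden transitions under predicted distributions) and is assumed here, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative label set only; the paper's Therblig vocabulary may differ.
THERBLIGS = ["reach", "grasp", "move", "release"]

# Hypothetical commonsense rule: which Therblig may directly follow which.
ALLOWED_NEXT = {
    "reach": {"grasp"},
    "grasp": {"move", "release"},
    "move": {"release"},
    "release": {"reach"},
}


@dataclass(frozen=True)
class Segment:
    """One Therblig atom over the frame interval [start, end)."""
    label: str
    start: int
    end: int


@dataclass
class Action:
    """An action composed of Therblig segments."""
    name: str
    segments: List[Segment]


@dataclass
class Activity:
    """An activity composed of actions."""
    name: str
    actions: List[Action]


def is_non_overlapping(segments: List[Segment]) -> bool:
    """True if no two segments share a frame."""
    ordered = sorted(segments, key=lambda s: s.start)
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))


def rule_violations(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Consecutive label pairs that break ALLOWED_NEXT (hard check)."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [
        (a.label, b.label)
        for a, b in zip(ordered, ordered[1:])
        if b.label not in ALLOWED_NEXT[a.label]
    ]


def soft_rule_penalty(probs: List[List[float]]) -> float:
    """Expected number of forbidden transitions, given one probability
    distribution over THERBLIGS per segment. The value is a polynomial
    in the probabilities, so it is differentiable with respect to them."""
    penalty = 0.0
    for p_t, p_next in zip(probs, probs[1:]):
        for i, a in enumerate(THERBLIGS):
            for j, b in enumerate(THERBLIGS):
                if b not in ALLOWED_NEXT[a]:
                    penalty += p_t[i] * p_next[j]
    return penalty


take_cup = Action(
    "take cup",
    [Segment("reach", 0, 12), Segment("grasp", 12, 15), Segment("move", 15, 40)],
)
make_tea = Activity("make tea", [take_cup])
```

In this sketch, `rule_violations(take_cup.segments)` returns an empty list, while a sequence that jumps from `reach` straight to `release` is flagged. In a learning setting, a term such as `soft_rule_penalty` would be added to the training loss so that predictions violating the rules are discouraged rather than rejected outright.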


