Abstract

In this paper we introduce a rule-based, compositional, and hierarchical modeling of action using Therbligs as our atoms. Introducing these atoms provides us with a consistent, expressive, contact-centered representation of action. Over the atoms we introduce a differentiable method of rule-based reasoning to regularize for logical consistency. We release the first Therblig-centered annotations over two popular video datasets, EPIC Kitchens 100 and 50-Salads. We also broadly demonstrate the benefits of adopting Therblig representations through evaluation on the following tasks: action segmentation, action anticipation, and action recognition over CNN and Transformer-based architectures. Code and data will be made publicly available.

We introduce the use of Therbligs in video understanding as a consistent, expressive, symbolic representation of sub-action. Points of Contact, indicated by the blue divider dashes, are necessarily associated with Therbligs and/or their boundaries. Because Points of Contact are unambiguous, Therblig boundaries gain precision and do not overlap. On top of Therblig atoms we construct a framework for Rule Enforcement, which enforces greater logical consistency through commonsense rules. This rule-based framework allows for the easy introduction of long-term constraints. Therblig atoms are then composable into actions, which are in turn composable into activities.
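The hierarchy described above (Therblig atoms composed into actions, actions composed into activities, with non-overlapping Therblig boundaries) can be pictured with a small data-structure sketch. This is only an illustration under our own assumptions, not the released code: the class names `Therblig`, `Action`, and `Activity`, the method names, the timestamps, and the example labels are all hypothetical and are not taken from the paper's annotation schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Therblig:
    """One atom: a label plus its temporal extent (hypothetical schema)."""
    label: str    # illustrative label, e.g. "grasp"; not the paper's label set
    start: float  # start time in seconds
    end: float    # end time in seconds


@dataclass
class Action:
    """An action composed of Therblig atoms."""
    name: str
    therbligs: List[Therblig]

    def boundaries_are_valid(self) -> bool:
        # Sort by start time, then check that every segment has positive
        # duration and that consecutive segments do not overlap.
        ordered = sorted(self.therbligs, key=lambda t: t.start)
        positive = all(t.start < t.end for t in ordered)
        disjoint = all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))
        return positive and disjoint


@dataclass
class Activity:
    """An activity composed of actions."""
    name: str
    actions: List[Action]

    def therblig_sequence(self) -> List[str]:
        # Flatten the hierarchy back into the underlying atom labels.
        return [t.label for a in self.actions for t in a.therbligs]


# Illustrative usage with made-up timestamps.
take_knife = Action("take knife", [
    Therblig("reach", 0.0, 0.8),
    Therblig("grasp", 0.8, 1.1),
    Therblig("move", 1.1, 2.0),
])
make_salad = Activity("make salad", [take_knife])
```

Here a shared boundary (one segment ending exactly where the next begins) counts as non-overlapping, which matches the idea that a single Point of Contact can separate two adjacent Therbligs; whether the actual annotations follow that convention is an assumption of this sketch.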

Paper

Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos.

[Paper]