University of Maryland, College Park. To be presented at CVPR 2023.
Abstract
In this paper, we introduce a rule-based, compositional, and hierarchical model of action using Therbligs as our atoms.
Introducing these atoms provides us with a consistent, expressive, contact-centered representation of action.
Over the atoms we introduce a differentiable method of rule-based reasoning to regularize for logical consistency.
We release the first Therblig-centered annotations over two popular video datasets: EPIC Kitchens 100 and 50-Salads.
We also demonstrate the benefits of adopting Therblig representations through evaluation on the following tasks: action segmentation,
action anticipation, and action recognition over CNN and Transformer-based architectures. Code and data will be made publicly available.
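To make the idea of regularizing for logical consistency concrete, the following is a minimal sketch of one way a commonsense rule could be turned into a differentiable penalty. It is not the method used in the paper: the label set `THERBLIGS`, the rule set `FORBIDDEN`, and the function `rule_penalty` are all hypothetical names chosen for illustration. The sketch scores each forbidden transition by the product of the two per-frame probabilities, which is smooth in the predictions and could therefore be added to a training loss.

```python
# Illustrative label set; the paper's actual Therblig vocabulary may differ.
THERBLIGS = ["reach", "grasp", "move", "release"]

# Hypothetical commonsense rules: (previous, next) transitions treated as
# implausible, e.g. an object cannot be released before it has been grasped.
FORBIDDEN = {("reach", "release"), ("reach", "move"), ("release", "move")}


def rule_penalty(probs):
    """Average soft violation of the forbidden transitions.

    probs: list of per-step probability vectors, one entry per label in
    THERBLIGS. The product of two probabilities acts as a soft logical AND,
    so the penalty is 0 when no probability mass sits on a forbidden pair.
    """
    total = 0.0
    for prev, nxt in zip(probs, probs[1:]):
        for a, b in FORBIDDEN:
            total += prev[THERBLIGS.index(a)] * nxt[THERBLIGS.index(b)]
    # Normalize by the number of transitions so sequence length does not matter.
    return total / max(len(probs) - 1, 1)


# A consistent sequence: reach, grasp, move, release.
consistent = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# An inconsistent sequence: reach followed directly by release.
inconsistent = [[1, 0, 0, 0], [0, 0, 0, 1]]

penalty_ok = rule_penalty(consistent)
penalty_bad = rule_penalty(inconsistent)
```

In a real model the same expression would be written with tensors from an automatic differentiation library so that gradients flow back to the predictions; plain Python lists are used here only to keep the sketch self-contained.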
We introduce the use of Therbligs in video understanding as a consistent, expressive, symbolic representation of sub-action.
Points of Contact, indicated by the blue divider dashes, are necessarily associated with Therbligs and/or their boundaries.
Because Points of Contact are unambiguous, Therblig boundaries are precise and non-overlapping.
On top of Therblig atoms we construct a framework for Rule Enforcement, enforcing greater logical consistency through commonsense rules.
This rule-based framework allows for the easy introduction of long-term constraints.
Therblig atoms are then composable into actions, which are in turn composable into activities.
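The hierarchy described above (Therblig atoms composed into actions, actions composed into activities, with non-overlapping Therblig boundaries) can be sketched as a small data structure. This is an illustrative assumption, not the annotation format released with the paper: the class names `TherbligSegment`, `Action`, and `Activity`, the frame-index fields, and the example labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TherbligSegment:
    label: str
    start: int  # first frame of the segment (inclusive)
    end: int    # frame at which the segment stops (exclusive)


@dataclass
class Action:
    name: str
    therbligs: List[TherbligSegment]

    def span(self) -> Tuple[int, int]:
        """Frame range covered by the action, derived from its atoms."""
        return (min(t.start for t in self.therbligs),
                max(t.end for t in self.therbligs))


@dataclass
class Activity:
    name: str
    actions: List[Action]

    def therbligs(self) -> List[TherbligSegment]:
        """Flatten the hierarchy back down to its Therblig atoms."""
        return [t for action in self.actions for t in action.therbligs]


def is_non_overlapping(segments: List[TherbligSegment]) -> bool:
    """True if no two segments share a frame once sorted by start."""
    ordered = sorted(segments, key=lambda s: s.start)
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))


# Hypothetical example: two actions forming one activity.
pick_up = Action("pick up knife", [
    TherbligSegment("reach", 0, 12),
    TherbligSegment("grasp", 12, 18),
    TherbligSegment("move", 18, 40),
])
put_down = Action("put down knife", [
    TherbligSegment("move", 40, 55),
    TherbligSegment("release", 55, 60),
])
activity = Activity("prepare salad", [pick_up, put_down])

valid = is_non_overlapping(activity.therbligs())
```

Deriving each action's span from its atoms, rather than storing it separately, keeps the action boundaries tied to the contact-based Therblig boundaries in this sketch.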
Paper
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos.