Fig. 1. Different motion representations acquired from the DAVIS sensor:
(a) Grayscale image from a frame-based camera (red bounding box denotes
a moving object). (b) Motion-compensated event cloud projected onto the
image plane; color encodes motion inconsistency. (c) 3D representation of
the event cloud in (x, y, t) coordinate space; color encodes the timestamp,
from red (0.0 s) to blue (0.5 s). The independently moving object (a
quadrotor) is clearly visible as a trail of events passing through the
entire 3D event cloud.
In this work we present unsupervised learning of depth and motion from
sparse event data generated by a Dynamic Vision Sensor (DVS). To tackle
this low-level vision task, we use a novel encoder-decoder neural network
architecture that aggregates multi-level features and addresses the problem
at multiple resolutions. A feature decorrelation technique is introduced
to improve network training, and a non-local sparse smoothness constraint
is used to alleviate the challenge of data sparsity. Our work is the first
to generate dense depth and optical flow from sparse event data. Our results
show significant improvements over previous deep learning approaches to
flow estimation from both images and events.
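The network itself is not detailed here. As a rough illustration of an
encoder-decoder that aggregates multi-level features and predicts at
multiple resolutions, a toy PyTorch sketch follows; the class name, channel
counts, two-level depth, and the two-channel event-count input are all
illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MiniEventNet(nn.Module):
        # Toy two-level encoder-decoder; every size here is an assumption.
        def __init__(self, in_ch=2, base=16, out_ch=2):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, 2, 1), nn.ReLU())
            self.enc2 = nn.Sequential(nn.Conv2d(base, 2 * base, 3, 2, 1), nn.ReLU())
            self.dec2 = nn.Sequential(nn.Conv2d(2 * base, base, 3, 1, 1), nn.ReLU())
            # dec1 consumes upsampled decoder features concatenated with the
            # matching encoder features: multi-level feature aggregation.
            self.dec1 = nn.Sequential(nn.Conv2d(2 * base, base, 3, 1, 1), nn.ReLU())
            self.head_coarse = nn.Conv2d(base, out_ch, 3, 1, 1)
            self.head_fine = nn.Conv2d(base, out_ch, 3, 1, 1)

        def forward(self, x):
            f1 = self.enc1(x)                 # features at 1/2 resolution
            f2 = self.enc2(f1)                # features at 1/4 resolution
            d2 = self.dec2(f2)
            coarse = self.head_coarse(d2)     # prediction at 1/4 resolution
            up = F.interpolate(d2, scale_factor=2, mode='bilinear',
                               align_corners=False)
            d1 = self.dec1(torch.cat([up, f1], dim=1))
            fine = self.head_fine(d1)         # prediction at 1/2 resolution
            return fine, coarse

Predicting at every decoder level lets the coarse output guide and stabilize
the finer one, which is the usual motivation for multi-resolution heads.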
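Feature decorrelation can likewise take several forms. A common generic
choice, sketched below as an assumption rather than the paper's exact
technique, penalizes the off-diagonal entries of the channel covariance
matrix so that feature channels carry less redundant information.

    import torch

    def decorrelation_penalty(feat):
        # feat: (B, C, H, W) intermediate feature maps (shapes assumed).
        B, C, H, W = feat.shape
        x = feat.reshape(B, C, H * W)
        x = x - x.mean(dim=2, keepdim=True)               # center each channel
        cov = torch.bmm(x, x.transpose(1, 2)) / (H * W)   # (B, C, C) covariance
        diag = torch.diag_embed(torch.diagonal(cov, dim1=1, dim2=2))
        return ((cov - diag) ** 2).mean()                 # cross-channel terms only

A penalty of this form would typically be added to the training loss with a
small weight.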
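Finally, a non-local sparse smoothness constraint can be read as:
regularize predictions only at pixels that carry events, comparing each
such pixel against event-carrying pixels in a wider window rather than only
its immediate neighbors. The sketch below is one plausible formulation
under that reading; the function name, the window-based neighborhood, and
the pairwise masking scheme are assumptions, not the paper's formulation.

    import torch
    import torch.nn.functional as F

    def sparse_nonlocal_smoothness(pred, event_mask, window=5):
        # pred:       (B, C, H, W) predicted flow or depth (shapes assumed)
        # event_mask: (B, 1, H, W) 1.0 where a pixel contains events, else 0.0
        # window:     odd side length of the non-local comparison window
        B, C, H, W = pred.shape
        pad = window // 2
        # Gather every window x window neighborhood around each pixel.
        nbr_pred = F.unfold(pred, window, padding=pad)
        nbr_pred = nbr_pred.view(B, C, window * window, H, W)
        nbr_mask = F.unfold(event_mask, window, padding=pad)
        nbr_mask = nbr_mask.view(B, 1, window * window, H, W)
        diff = (pred.unsqueeze(2) - nbr_pred).abs()
        # Keep only pairs where both the center and the neighbor carry events.
        pair_mask = event_mask.unsqueeze(2) * nbr_mask
        return (diff * pair_mask).sum() / pair_mask.sum().clamp(min=1.0)

Because the penalty is active only where both pixels in a pair contain
events, empty regions of the sparse input contribute nothing to this term.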
Chengxi Ye*, Anton Mitrokhin*, Chethan Parameshwara, Cornelia Fermüller,
James A. Yorke, Yiannis Aloimonos.