He and Aggarwal developed software that analyses each frame of
footage and identifies clusters of pixels matching a primitive model of
the human body. It then examines the interplay of different clusters,
in order to classify interactions between individuals. Two videos show
the system classifying a hug and then a push.

Many interactions can be visually ambiguous, however. A person offering
someone a stick of gum or a cigarette can look similar to someone being
threatened with a knife, for example. To cope with this, Park and
Aggarwal chose to build up a profile for each type of behaviour.

Park calls this a "semantic analysis" of the interaction, meaning that
several different factors are weighed together. For example, when
identifying two people shaking hands, their hands must not only be close
but must also move in synchrony.
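
The handshake rule combines two conditions: the hands are close, and they move together. The sketch below is a minimal illustration, assuming each person's hand has already been tracked as a list of (x, y) pixel positions, one per frame. The function names and the thresholds `max_gap` and `max_motion_diff` are hypothetical, not taken from Park and Aggarwal's software.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def velocities(track):
    """Frame-to-frame displacement of a tracked point."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(track, track[1:])]


def looks_like_handshake(hand_a, hand_b, max_gap=10.0, max_motion_diff=2.0):
    """Hypothetical handshake test: hands close AND moving together.

    hand_a, hand_b: per-frame (x, y) pixel positions of each person's hand.
    The thresholds are illustrative guesses, not values from the research.
    """
    if len(hand_a) != len(hand_b) or len(hand_a) < 2:
        return False
    # Condition 1: the hands stay close in every frame.
    close = all(distance(a, b) <= max_gap for a, b in zip(hand_a, hand_b))
    # Condition 2: the hands move in synchrony (similar displacement each frame).
    in_sync = all(
        distance(va, vb) <= max_motion_diff
        for va, vb in zip(velocities(hand_a), velocities(hand_b))
    )
    return close and in_sync


a = [(100, 50), (100, 55), (100, 50), (100, 55)]  # pumping up and down
b = [(105, 50), (105, 55), (105, 50), (105, 55)]  # same motion, 5 px away
still = [(105, 50)] * 4                            # close but not moving
print(looks_like_handshake(a, b))      # True
print(looks_like_handshake(a, still))  # False
```

Requiring both conditions is what separates a handshake from two hands that merely happen to be near each other. A hand that stays close but does not move along with the other fails the synchrony test.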

They meticulously coded a description of these key characteristics,
which the software searches for when analysing a scene. This allows it
to assign a probability that a particular activity is being observed. At
the moment, the system can only make its evaluation if it captures the
interaction side-on.
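
The article does not say how the coded characteristics are turned into a probability. Here is a minimal sketch, assuming each behaviour profile is a hypothetical set of weighted characteristics. The characteristic names, the weights and `activity_probabilities` are all illustrative. A softmax converts the scores into values that sum to one. That step is a swapped-in, commonly used technique, not necessarily the researchers' method.

```python
import math

# Hypothetical behaviour profiles: each characteristic carries a weight
# reflecting how strongly it indicates that activity. Illustrative only.
PROFILES = {
    "handshake": {"hands_close": 2.0, "hands_in_sync": 2.0, "facing_each_other": 1.0},
    "push": {"hand_on_torso": 2.0, "torso_moves_away": 2.0, "facing_each_other": 1.0},
}


def activity_probabilities(observed, profiles=PROFILES):
    """Score each profile by the characteristics seen, then normalise.

    observed: set of characteristic names detected in the scene.
    Returns a dict of activity -> value in [0, 1]; the values sum to 1.
    """
    scores = {
        name: sum(w for feature, w in features.items() if feature in observed)
        for name, features in profiles.items()
    }
    # Softmax: subtract the maximum score for numerical stability.
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


seen = {"hands_close", "hands_in_sync", "facing_each_other"}
probs = activity_probabilities(seen)
print(max(probs, key=probs.get))  # handshake
```

Because every profile is scored against the same observations, a shared characteristic such as "facing_each_other" does not favour one activity over another. Only the distinguishing characteristics shift the balance. This mirrors the idea that an ambiguous scene is resolved by looking at several factors together.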