Feature-Centric Multi-modal Information Retrieval in an Open-World
Environment (FemmIR)
Abstract
Multi-modal information retrieval has significant implications for search
engines, situational knowledge delivery, and complex data management
systems. Existing cross-modal learning models use a separate information
model for each data modality and cannot leverage pre-existing features in
an application domain. Moreover, supervised learning methods cannot
incorporate user preferences to define data relevance without training
samples, and they require modality-specific translation methods. To
address these problems, we propose a novel multi-modal information
retrieval framework (FemmIR) with two retrieval models: one based on graph
similarity search (RelGSim) and one based on relational database querying
(EARS). FemmIR extracts features from different modalities and translates
them into a common information model. For RelGSim, we build a localized
graph for each data object from its features and define a novel distance
metric to measure the similarity between two data objects. A
neural-network-based graph similarity approximation model is trained to
map pairs of data objects to a similarity score. Furthermore, to handle
feature extraction in an open-world environment, we discuss appropriate
extraction models for different application domains. To support
finer-grained attribute analysis in text, we propose a novel
human-attribute extraction model for unstructured text. Unlike existing
methods, FemmIR can integrate application domains with existing features
and can incorporate user preferences when determining relevance for
situational knowledge discovery. A single information model (a common
schema or graph) reduces data representation overhead. Comprehensive
experimental results on a novel open-world cross-media dataset demonstrate
the efficacy of our models.
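As a concrete illustration of the graph-similarity component, the sketch
below shows how a neural network can map a pair of localized feature
graphs to a similarity score. This is a minimal sketch assuming PyTorch;
the names (GraphSimilarityNet, the two-layer message-passing encoder, the
hidden sizes) are hypothetical illustrations of the general technique, not
RelGSim's actual architecture.

    # Minimal siamese graph-similarity scorer in plain PyTorch.
    # Hypothetical illustration; not the paper's actual RelGSim model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphSimilarityNet(nn.Module):
        """Encodes two feature graphs and maps the pair to a score in (0, 1)."""

        def __init__(self, feat_dim: int, hidden_dim: int = 64):
            super().__init__()
            self.w1 = nn.Linear(feat_dim, hidden_dim)    # first message-passing layer
            self.w2 = nn.Linear(hidden_dim, hidden_dim)  # second message-passing layer
            self.score = nn.Linear(hidden_dim, 1)        # maps |h_a - h_b| to a score

        def encode(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
            # x: (num_nodes, feat_dim) node features; adj: (normalized) adjacency.
            h = F.relu(self.w1(adj @ x))   # aggregate neighbor features, transform
            h = F.relu(self.w2(adj @ h))   # second round of aggregation
            return h.mean(dim=0)           # mean-pool nodes into one graph embedding

        def forward(self, xa, adja, xb, adjb) -> torch.Tensor:
            ha, hb = self.encode(xa, adja), self.encode(xb, adjb)
            # Similarity from the element-wise distance between graph embeddings.
            return torch.sigmoid(self.score(torch.abs(ha - hb)))

    # Toy usage: two small localized graphs with 8-dimensional node features.
    model = GraphSimilarityNet(feat_dim=8)
    xa, adja = torch.randn(5, 8), torch.eye(5)  # identity adjacency for brevity
    xb, adjb = torch.randn(7, 8), torch.eye(7)
    print(model(xa, adja, xb, adjb).item())     # similarity score in (0, 1)

In a framework like FemmIR, such a scorer would be trained on labeled
object pairs so that the learned score approximates the proposed distance
metric over localized feature graphs.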