Multiple User Behavior Learning for Enhancing Interactive Image Retrieval

Existing image retrieval approaches focus on the behavior of a single user in each query, without considering the correlation among the behaviors of multiple users performing similar queries. In fact, users tend to behave similarly when they have similar expectations during queries. Accordingly, this paper proposes an interactive image retrieval framework with a Similar Behavior Learning (SBL) model. The framework consists of two stages. In the first stage, the framework retrieves images using content-based feature vectors and presents them as a preliminary query result for user selection. In the second stage, the SBL model determines the similarity of the user behavior and instantly annotates the selected images with a label code. Images indexed by label code can be retrieved more efficiently. Meanwhile, the images selected in the preliminary result are used as additional information for retrieving better results at the end of the current query. Experiments show promising results.


Introduction
Image retrieval (IR) has been widely studied in the past decades. In general, images with similar low-level features may not have similar high-level features, which is known as the semantic gap [1]. The semantic gap is one of the main challenges in image retrieval. To alleviate this gap, the relevance feedback (RF) technique [2] has been proposed for collecting user behavior information, such as the selection of preferred images at the end of each query session. User behavior information is important for IR because it helps bring the retrieval result closer to human perception.
In the literature, IR methods fall into two major lines: supervised learning and reinforcement learning. Both supervised learning [3] and reinforcement learning [4] aim to extract useful knowledge from user behavior to improve the accuracy of the retrieval result. Specifically, under the supervised learning setting, the user behavior logs of each session are treated as labeled data for optimizing the model parameters before the system goes into operation. By contrast, under the reinforcement learning setting, the model keeps learning from user behavior in each session, and the outcome of each session contributes to improving retrieval performance in subsequent query sessions.
However, existing IR methods that consider the behavior of a single user only still suffer from the following issues: (1) They do not consider the correlation among the behaviors of multiple users performing similar queries. In fact, users are highly likely to behave similarly when they have similar expectations for similar queries. (2) From the perspective of human perception, an image may be understood or described differently by different users. Therefore, identifying a query image with the help of a group of users rather than a single user allows the label to be assigned more accurately in each session.
To this end, we propose a novel IR approach with a Similar Behavior Learning (SBL) model, which consists of two stages: a preliminary query and a final query. Given a query image, a query session is complete once both stages are finished. Specifically, in the first stage, the approach generates a preliminary query result based on the query image. The images are sourced from two different retrieval aspects: the unlabeled images with the highest similarity to the query image in low-level features, and the labeled images learned by the SBL model with the highest similarity to the query image. The unlabeled and labeled images are then listed in random order in the preliminary query result, so the source of each image is unknown to the user. In the second stage, the user is required to select the preferred images from the preliminary query result. This image selection is regarded as user behavior. Subsequently, the selected images are used as new query images for a second retrieval, which generates the final query result. Meanwhile, the SBL model determines whether an existing or a new label is assigned to the selected images. If only unlabeled images are selected, the labeled images are not the user's preferred images. In other words, no previous user behavior matches the current query image, so the selected unlabeled images are assigned a new label. By contrast, when both unlabeled and labeled images are selected, the current user behavior is similar to previous user behaviors. Therefore, the newly selected unlabeled images are assigned the same label as the selected labeled images. The labeling is considered a learning outcome and is saved in the SBL model database for subsequent query sessions. Through multiple query sessions, the SBL model learns user behavior from multiple users for labeling the selected images.
Because user behaviors from multiple users are considered, the feature representation of each label in the SBL model database is updated in every query session based on how frequently its images are selected. The basic procedure of the proposed approach is shown in Figure 1.
With the SBL model, the proposed approach not only learns the current user behavior and its correlation with the behaviors of other users, but also brings the retrieval result closer to human perception by assigning more accurate labels.

Example-based Image Retrieval
The proposed IR system collects the information required by both objective and subjective approaches to retrieve user preferred images. Content-based information [5], regarded as objective information, is usually visual information that can be extracted from the images themselves, e.g., shape, color, or texture [6,7]. However, there is a semantic gap issue [8,9]: the content-based information may not match human perception. In other words, images that the retrieval system considers visually similar may not correspond to the user's preference. For example, a "red apple" and a "red cup" may be considered similar images because they share the same color (and shape) information, yet it is unlikely that both are what the user expects. To bridge this gap, previous works [10] have investigated hybrid methods that combine content-based and context-based [11][12][13] information to retrieve user preferred images.

Feedback-based Image Retrieval
In contrast to content-based information, user behavior can also be considered as objective information but is closer to human perception. To fully leverage it, some researchers have combined user behavior with supervised learning methods. In the study [14], the correlation between interaction signals and user examination is analyzed in Web image search. The authors proposed a Grid-based User Browsing Model (GUBM) inspired by the analysis of commercial search logs. The model can capture user behaviors such as cursor hovering and alleviate position bias during the interaction. In addition, the model can estimate the topical relevance and quality of images. In the study [15], user behavior data is utilized, such as click-through features, browsing features, query-text features, and human judgments on a six-point rating scale, taken from the log files saved by search engines. The log files require preprocessing before they can be analyzed to optimize the ranking function. In the study [16], the top images retrieved by measuring the similarity of color moment features and Gray Level Co-occurrence Matrix (GLCM) texture features are used as voters to select the most effective similarity coefficient for the final query result.
The relevance feedback (RF) technique [2] has been proposed to collect user behavior information, such as the selection of preferred images at the end of each query session. User behavior information can be considered as a high-level feature, i.e., semantic information, that improves retrieval accuracy. RF methods can be classified into intra-query and inter-query learning methods [17][18][19]. The intra-query learning method (also known as short-term learning) utilizes user behavior information within the current query session only to improve retrieval accuracy. Thus, the user behavior information is not reused after the query session ends. On the other hand, in the inter-query learning method (also termed long-term learning), the history of user behavior information is saved and used as labeled data during learning.
The RF technique has attracted considerable attention in IR as a way to mitigate the semantic gap issue mentioned above. In typical content-based IR, low-level features such as shape, color, or texture are used to retrieve user preferred images. To further improve the retrieval results, the user inspects the images in the current retrieval result and provides feedback on whether each image is relevant or irrelevant. This feedback can be regarded as semantic information that complements the low-level information. By combining the high-level and low-level features, retrieval performance can be significantly enhanced.
In the study [20], the feedback information of each query is saved in a "concept database" as additional semantic information about the images in the database. By accumulating information learned from the feedback in previous queries, retrieval performance can be further improved.

Reinforcement Learning
Reinforcement learning (RL) [21,22] is one of the main research areas in machine learning. Unlike supervised learning, RL does not require labeled data for model training, and the performance of an RL framework improves progressively through continuous learning. A typical example of RL is Q-learning [23], where the system maintains a Q-table that records the states visited in each session, rewarding positive outcomes and penalizing negative ones. After a sufficient number of sessions, the system can learn a Q-table that maximizes the chance of achieving the expected result. In the study [24], a weighted trace transform is applied to address problems related to IR. Through reinforcement learning, the complex parameters of the model can be continually fine-tuned to improve performance. In the work [25], an interactive interface is provided that allows users to select relevant images during the query sessions. Users can explore the images currently on display by scoring them. The study [26] focuses on integrating relevance feedback (RF) techniques by using reinforcement learning. The retrieval system performs relevance learning based on the previous query, and the learning scheme is continually carried over to the next session, as in long-term learning.
In this work, we propose a novel IR approach that learns effective information from the behaviors of multiple users, which are closer to human perception, in order to alleviate the semantic gap issue.
In this section, a novel two-stage IR approach with the SBL model is proposed. The approach consists of the preliminary and final query stages in each query session. The SBL model can learn user behavior from the selected images in the preliminary query results. By learning from multiple users' query sessions, the accuracy of image retrieval can be significantly improved. In the following sections, we introduce the two-stage IR approach and the SBL model in detail.

Two-Stage Image Retrieval Approach
The proposed IR approach consists of two stages: the preliminary query stage and the final query stage. First, the user provides a sample image as the query to retrieve relevant images. These retrieved images form the preliminary query result. Then, the user is required to select preferred images from the preliminary query result. Not all preferred images need to be selected. The number of images selected in the preliminary query result only affects the learning rate and the speed of image labeling in the SBL model database. The selection in the preliminary query result is termed user behavior in this paper. The approach retrieves images for the final query result based on the user selection in the preliminary query result. Meanwhile, the SBL model labels the selected images by comparing them with the previous multiple user behavior records in the SBL model database. The basic procedure of the proposed approach is shown in Figure 1.
Let q be the query image given by a user, and let x_j denote the j-th image in the image database. Following the work [27], we describe each image by a 109-dimensional feature vector, viewed as the low-level feature information. Formally, the N-dimensional (N = 109) feature vectors are denoted as f(q) and f(x_j), where f_n(q) and f_n(x_j) denote the value of the n-th dimension of the respective feature vector. The similarity between the query image q and the image x_j in the image database is calculated by Eq. 1. Similar images are shown in the preliminary query result at the first stage. Then, the user selects his or her preferred images from the preliminary query result. These selected images are used as new query images for the second retrieval. Subsequently, the approach forms the final query result by measuring the distance between the selected images and the images in the image database. Meanwhile, the selection of images from the preliminary query result is regarded as user behavior. We assume that users exhibit similar behavior, i.e., similar selections, if they have similar expectations during queries. Therefore, the SBL model can match the current user behavior with the previous records in the SBL model database. The selected images are then instantly assigned a new or existing label according to this similar behavior matching, which constitutes the learning result. The learning result is saved in the SBL model database and is used in subsequent query sessions to improve retrieval accuracy.
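As an illustration of the first-stage retrieval, the following minimal sketch ranks database images by their distance to the query in the low-level feature space. It is not the authors' implementation: the use of Euclidean distance is an assumption (Eq. 1 may define the similarity differently), and the function name `rank_by_feature_distance`, the toy database, and the default cut-off of 24 images are hypothetical choices made for this example.

```python
import math
import random


def rank_by_feature_distance(query_vec, db_vecs, top_k=24):
    """Return indices of the top_k database images closest to the query.

    Euclidean distance is an assumption made for this sketch; Eq. 1 in the
    paper may define the similarity differently.
    """
    scored = [(math.dist(query_vec, vec), j) for j, vec in enumerate(db_vecs)]
    scored.sort()  # smallest distance (most similar) first
    return [j for _, j in scored[:top_k]]


# Toy database: 200 random 109-dimensional feature vectors (hypothetical data).
rng = random.Random(0)
db = [[rng.random() for _ in range(109)] for _ in range(200)]
# A query that is a slightly perturbed copy of image 42.
query = [v + 0.001 * rng.random() for v in db[42]]

top = rank_by_feature_distance(query, db, top_k=24)
print(top[0])  # index of the closest image
```

The same ranking function could be reused in the second stage by issuing one query per selected image and merging the ranked lists; how the lists are merged is not specified in the text and is left open here.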

Similar Behavior Learning (SBL) Model
To determine the similarity between the current user behavior and the historical behavior of multiple users, the SBL model is integrated into the query processing, as shown in Figure 1. The learning procedure of the SBL model is as follows. Initially, no learned information is stored in the SBL model database. The approach starts by retrieving images based on the similarity of the low-level features between the query image q and the images x_j in the image database, using Eq. 1, and presents them as the preliminary query result. The SBL model starts to learn user behavior information after users select their preferred images from the preliminary query result. Let P = {p_1, p_2, ..., p_|P|} be the set of images selected from the preliminary query result. Because there is no previously recorded user behavior in the SBL model database to match against, all selected images are assigned a new label l_p. The feature representation of the label l_p is calculated by Eq. 2 and saved as f(l_p) in the SBL model database.
When the number of query sessions exceeds 1, learned user behavior information is already saved in the SBL model database. From then on, the images in the preliminary query result are sourced from two subsets obtained with different retrieval strategies. We denote these two image subsets as I_A and I_B, respectively. I_A consists of the unlabeled images retrieved according to the low-level feature vector distance between the query image q and the unlabeled images in the image database, using Eq. 1. The images in I_B are retrieved according to the similarity between the feature vector of the query image q and the feature representation f(l) of each label learned by the SBL model, as shown in Eq. 3.
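The two retrieval sources can be sketched as follows. This is a hedged illustration rather than the paper's procedure: Euclidean distance stands in for Eq. 1 and Eq. 3, I_B is assumed to contain the members of the single label whose representation f(l) is closest to the query, and the names `build_preliminary_result`, `n_unlabeled`, and `n_labeled` are hypothetical. The final shuffle reflects the statement that the source of each image is hidden from the user.

```python
import math
import random


def build_preliminary_result(query_vec, features, image_labels, label_features,
                             n_unlabeled=12, n_labeled=12, rng=None):
    """Assemble a shuffled preliminary result from I_A and I_B.

    features:       image id -> low-level feature vector
    image_labels:   image id -> label, for images already labeled by the model
    label_features: label -> feature representation f(l)
    Returns a list of (image id, source) pairs, where source is "A" or "B".
    """
    rng = rng or random.Random()

    # I_A: unlabeled images closest to the query in low-level feature space.
    unlabeled = [i for i in features if i not in image_labels]
    unlabeled.sort(key=lambda i: math.dist(query_vec, features[i]))
    i_a = unlabeled[:n_unlabeled]

    # I_B (assumption): members of the label whose f(l) is closest to the query.
    i_b = []
    if label_features:
        best = min(label_features,
                   key=lambda l: math.dist(query_vec, label_features[l]))
        members = [i for i, l in image_labels.items() if l == best]
        members.sort(key=lambda i: math.dist(query_vec, features[i]))
        i_b = members[:n_labeled]

    shown = [(i, "A") for i in i_a] + [(i, "B") for i in i_b]
    rng.shuffle(shown)  # hide the source of each image from the user
    return shown


# Hypothetical toy data with 2-dimensional features.
features = {0: [0.0, 0.0], 1: [0.1, 0.0], 2: [1.0, 1.0],
            3: [0.9, 1.0], 4: [0.2, 0.1], 5: [5.0, 5.0]}
image_labels = {2: "l1", 3: "l1"}
label_features = {"l1": [0.95, 1.0]}

shown = build_preliminary_result([0.8, 0.9], features, image_labels,
                                 label_features, n_unlabeled=2, n_labeled=2,
                                 rng=random.Random(0))
print(sorted(shown))
```

In the first session the label dictionaries are empty, so the function returns only images from I_A, which matches the initial condition described above.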
The similarity of user behavior depends on the set P of images the user selects in the preliminary query result. Let G_A and G_B be the number of user selected images derived from I_A and I_B, respectively. From a practical point of view, three cases may occur, as shown in Figure 2. In the figure, the images marked with "A" come from I_A, while those marked with "B" come from I_B. The images with a red rectangle are the user selected images in the preliminary query result. The three cases are described in detail as follows.
Case 1 - (G_A > 0 and G_B = 0). Only the unlabeled images in I_A (marked as A) are selected (see the example for case 1 in Figure 2(a)), i.e., P ⊆ I_A. We can assume that the images with label l in I_B have the highest similarity to the query image q according to the SBL model but are not preferred by the user. As a result, the SBL model determines that the current user behavior is not similar to any of the previous user behaviors. Thus, the selected images in the preliminary query result are assigned a new label r and saved in the SBL model database. In addition, the confidence of each selected image is set to 1, where confidence represents the selection frequency. The purpose of the confidence is to increase the importance of user preferred images.
Case 2 - (G_A > 0 and G_B > 0). Both unlabeled images in I_A (marked as A) and labeled images in I_B (marked as B) are selected (see the example for case 2 in Figure 2(b)), i.e., P = P_1 ∪ P_2, where P_1 ⊂ I_A and P_2 ⊂ I_B. We can consider the selected unlabeled images in I_A to be similar to the selected labeled images in I_B. Consequently, the SBL model determines that the current user behavior is similar to a user behavior from previous sessions. The selected unlabeled images are assigned the same label l as those in I_B. In addition, the confidence of the selected labeled images is incremented by 1.
Case 3 - (G_A = 0 and G_B > 0). Contrary to case 1, only the labeled images in I_B (marked as B) are selected (see the example for case 3 in Figure 2(c)), i.e., P ⊆ I_B. In this case, the selected images in the preliminary query result keep their label l, but the confidence of the selected labeled images is still incremented by 1.
After the user image selection in the preliminary query result, the SBL model assigns a new or existing label to the selected images according to the similarity of multiple user behavior. The algorithm of label annotation with SBL model is summarized in Algorithm 1.
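The three cases can be expressed compactly in code. The sketch below is an illustration only and does not reproduce Algorithm 1: it assumes that all images shown from I_B share a single label, that a newly labeled image starts with confidence 1 in case 2 as well as in case 1, and that new labels are named by a simple counter. The function name `annotate_selection` and the dictionary layout are hypothetical.

```python
def annotate_selection(selected, labels, confidence):
    """Assign a new or existing label to the selected images.

    selected:   list of (image id, source) pairs, source is "A" or "B"
    labels:     image id -> label (updated in place)
    confidence: image id -> selection count (updated in place)
    Returns the label used, or None if nothing was selected.
    """
    from_a = [i for i, src in selected if src == "A"]
    from_b = [i for i, src in selected if src == "B"]
    if not from_a and not from_b:
        return None

    if from_b:
        # Cases 2 and 3: reuse the label of the selected labeled images
        # (assumption: all images from I_B carry the same label).
        label = labels[from_b[0]]
        for i in from_b:
            confidence[i] = confidence.get(i, 0) + 1
    else:
        # Case 1: no previous behavior matches, so create a new label.
        label = f"label_{len(set(labels.values()))}"

    # Cases 1 and 2: newly selected unlabeled images receive the label.
    for i in from_a:
        labels[i] = label
        confidence[i] = 1  # assumption: initial confidence is 1 in both cases
    return label


labels, confidence = {}, {}
r1 = annotate_selection([(10, "A"), (11, "A")], labels, confidence)  # case 1
r2 = annotate_selection([(10, "B"), (12, "A")], labels, confidence)  # case 2
r3 = annotate_selection([(12, "B")], labels, confidence)             # case 3
r4 = annotate_selection([(20, "A")], labels, confidence)             # case 1
print(labels, confidence)
```

Keeping the label assignment separate from the retrieval step makes it straightforward to log the per-session changes, similar in spirit to the learning log reported in the experiments.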
It is worth noting that the retrieved images (I_A ∪ I_B) in the preliminary query result are randomly shuffled before the user starts selecting preferred images. The reasons are as follows. First, shuffling gives each case an equal chance of occurring even if the user selects very few images or only the first few images. Second, if the labeled images in I_B are not selected, it becomes easier to interpret why the label of I_B does not match the user preference, since the position of the images can be ruled out as the cause.
The confidence of the images in the SBL model database changes after each query session, so the feature representation of the corresponding label is updated after each query session as well. The confidence c is the frequency with which an image has been selected in the user behavior saved in the SBL model database. In other words, the number of times the same label is assigned to an image can be regarded as a measure of the correctness of the label annotation. The feature vector of an image with higher c represents the corresponding label l_p more accurately because the image has been selected by users repeatedly. Hence, the feature representation of the label is updated by Eq. 4. Based on our experimental observations, the change caused by user behavior learning in each session is relatively small, which avoids overwriting the effect of earlier learning. Besides, from Eq. 4, f(l_p) is determined by c(s_p) and f(s_p), so individual erroneous selections do not have a fatal impact on the whole learning process. The average precision (AP) is used as the evaluation metric in the following experiments. The formulation of average precision is shown in Eqs. 5 and 6. Here, P_img denotes the user selected images, i.e., the preferred images, R_img denotes the final images retrieved by the query, and D_img denotes the images in the image database.
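Returning to the label representation, one plausible reading of the dependence of f(l_p) on c(s_p) and f(s_p) is a confidence-weighted mean of the member feature vectors. The sketch below illustrates that reading only; the exact form of Eq. 4 is not reproduced here, so the weighting scheme, the function name `label_representation`, and the toy values are assumptions.

```python
def label_representation(member_features, member_confidence):
    """Confidence-weighted mean of the feature vectors of a label's images.

    member_features:   image id -> feature vector f(s_p)
    member_confidence: image id -> confidence c(s_p)
    The weighted mean is an assumed form of Eq. 4, not the paper's formula.
    """
    if not member_features:
        raise ValueError("a label needs at least one member image")
    total = sum(member_confidence[i] for i in member_features)
    if total <= 0:
        raise ValueError("total confidence must be positive")

    dim = len(next(iter(member_features.values())))
    rep = [0.0] * dim
    for i, vec in member_features.items():
        weight = member_confidence[i] / total
        for n, value in enumerate(vec):
            rep[n] += weight * value
    return rep


# An image selected three times pulls the representation toward itself.
rep = label_representation({1: [0.0, 2.0], 2: [4.0, 2.0]}, {1: 3, 2: 1})

# A single erroneous selection (confidence 1) against a well-established
# image (confidence 9) shifts the representation only slightly.
robust = label_representation({1: [0.0], 2: [10.0]}, {1: 9, 2: 1})
print(rep, robust)
```

Under this assumed form, a single new selection changes each weight by a small amount once the total confidence is large, which is consistent with the observation that per-session changes are relatively small.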
Two experiments are conducted to demonstrate the effectiveness of the proposed approach with the SBL model. In the first experiment, the accuracy of retrieving user preferred images with the SBL model is studied. We repeatedly use images of the same concept as the query image and observe the improvement in retrieval accuracy compared with the baseline (without the SBL model). Specifically, the user submits a query image of the same concept 10 times and selects preferred images from the preliminary query result in each query session.
Based on the user selection behavior in the preliminary query result of each session, the selected images are assigned a new or existing label. This helps improve the accuracy of preferred image retrieval because our approach considers the behavior of multiple users. The experimental result is shown in Figure 3.
From Figure 3, we can see that the average precision (AP) of the query result increases with the number of query sessions. In particular, the AP of the 2nd query session improves by 27.2% compared with the 1st query session, in which no learned information is stored in the SBL model (baseline). This indicates that considering multiple user behavior can effectively bridge the semantic gap and retrieve more user preferred images. Besides, the average precision becomes stable after the 7th session, since case 3 mentioned above has occurred.
Furthermore, we show the Top@24 retrieval results of the 1st session and the 10th session in Figure 4. We observe that almost 58% of the images are correctly retrieved in [...] This supports the claim that the proposed two-stage image retrieval approach can significantly improve retrieval accuracy and alleviate the semantic gap issue.
In the second experiment, we simulate different users performing queries with 10 query images representing five concepts. The order in which the concepts and user query images are input is randomly assigned.
Table 1 lists the session number, the images representing each concept, the IDs of the images selected in the preliminary query result, and the SBL log. The column "Similar Behavior Learning Log" records the changes in the image IDs with their assigned label l_p and the corresponding confidence value c(s_p) in each session. The image IDs shown in bold indicate the data updated in each session. From the table, we can see that selected images representing different concepts are annotated with different labels, i.e., case 1. In contrast, images representing the same concept are correctly annotated with the same label, i.e., case 2 and case 3.

Conclusion
To enhance image retrieval performance and bridge the semantic gap, we have explored using the behavior of multiple users as additional information for long-term learning. To this end, we have proposed a novel two-stage image retrieval approach with the SBL model, which combines low-level and high-level features for image retrieval. The preliminary query stage is added to collect valuable information, i.e., user behaviors. The proposed SBL model then instantly determines the similarity of the current user behavior to previous cases. The selected images are annotated with a new or existing label and saved in the SBL model database, which is updated in every session. This benefits subsequent query sessions by speeding up retrieval and generating more accurate preliminary query results for user selection. The experiments demonstrate a significant improvement when the annotated images learned by the SBL model are used.