An Evaluation Framework of Trust Aware Recommender System

A variety of prediction approaches have been used in recommender systems. Among the most widely known are Content-Based Filtering (CBF) and Collaborative Filtering (CF). According to the literature, CF based on user ratings has been widely used, but the approach faces two common problems, namely cold start and sparsity. As an alternative, Trust Aware Recommender Systems (TARS) have been introduced for CF based on user ratings. Research on improving TARS is progressing rapidly, but weaknesses in how the algorithms are evaluated have started to appear. Many researchers who introduce a new TARS approach evaluate its performance on different users' views. As a result, the performances of different TARS reported in different publications are not comparable and are difficult to analyze. Therefore, the objective of this paper is to provide a common grouping of users' views based on trusted users in TARS. The paper then presents a comparison study of different TARS techniques on the identified common groups in terms of accuracy error, rating coverage and user coverage. The results provide a relative comparison between the different TARS.


Introduction
Over the last decades, the cumulative growth of knowledge and information driven by Internet technology has been tremendous. With the wide application of this technology, people's ability to process useful information has become crucial. To deliver useful information quickly from the huge repository of web and mobile applications, recommender systems have emerged and gained wide attention from the community. Today, recommender systems have a significant impact on the way people find the best products, information and even other people and contacts. A recommender system filters a large information search space and selects the items that are most likely to be interesting and attractive to a user. Recommender systems have proved beneficial in many application domains such as online job directories, online libraries, e-commerce and social networks, including Facebook and LinkedIn. In addition, with the rapid development of e-commerce, recommender systems have become an important tool for sellers and customers. Since their introduction, many approaches have been used to implement a recommender system. One popular approach is Collaborative Filtering (CF), which utilizes user ratings of items [1]. Another approach is Content-Based Filtering (CBF), which uses content information of items to measure the match between items and users [2]. Additionally, demographic information in the user profile, such as age, gender and occupation, has also been used to recommend items to users [3][4]. Although recommender systems have been widely used, some crucial problems remain in their implementation, for example the cold start and sparsity problems. The cold start problem arises from new users or items that have no ratings yet [5].
Furthermore, if the number of ratings on the existing items is very small, the sparsity problem occurs. As the number of items increases rapidly while user ratings accumulate slowly, the cold start and sparsity problems lead to lower rating coverage and inaccurate recommendations [3]. To solve these problems, recommender systems with trust-aware elements have been introduced [6][7]. Such a system is called a Trust Aware Recommender System (TARS). Many researchers have reported that the accuracy of TARS is better than that of the traditional CF approach [7][8]. Since the introduction in early 2012, different TARS techniques have been proposed to address the cold start and sparsity problems. One of the current techniques is TARS with a distrust element [3]. This research anticipates that the performance of TARS with distrust needs further observation, because the technique is still new and current evaluations are difficult to compare owing to the lack of a structured performance evaluation. This research therefore observes the performance of TARS with distrust under different types of trusted users' views. The types of trusted users' views are identified a priori based on previous studies. This paper is organized as follows. The next part provides the research background of recommender systems and TARS, followed by the research methodology in part 3. Part 4 reports the results and discussion, and part 5 gives the concluding remarks.

Research Background
This part describes different approaches to recommender systems, including trust aware recommender systems.

Recommender System (RS)
Two common approaches for RS are Collaborative Filtering (CF) and Content-Based Filtering (CBF). A CF recommender system utilizes information from a group of users together with the attribute similarity of the related items [9]. In other words, it provides recommendations to a user based on the recommendations given by other users with similar interests or profiles; that is, it considers the ratings provided by the related users. CF has been widely used by the majority of e-commerce systems such as Amazon, Lazada and Facebook. CF can be classified into two sub-categories, namely memory-based and model-based [8]. Memory-based approaches make predictions by utilizing the ratings from the active users, which are stored in the memory caches of the users' devices [9][10]. Conversely, model-based approaches construct a training model based on classification and clustering paradigms and then use the trained model to make predictions on another set of real data [11]. Typical examples of this approach are clustering models [12][13] and machine learning [14]. Computational intelligence approaches such as fuzzy [15] and meta-heuristic [16][17] algorithms are also popular for these applications. In contrast with CF, a CBF recommender system utilizes information received from the active users and data about the associated items. It makes recommendations by comparing the users' profiles against the content of the document collections. The technique focuses on the characteristics of the users and items rather than on other data such as user ratings [3,8]. Because it does not rely on user ratings, the technique has an advantage in recommending more accurate content to users [2]. The extensive use of CF in recommender systems motivates this research to focus on the Trust Aware Recommender System (TARS).
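To make the memory-based CF idea above concrete, the following is a minimal sketch of a user-based prediction. It assumes Pearson correlation as the similarity measure and a Resnick-style weighted average, neither of which is specified in this section; the names `pearson`, `predict_cf` and `toy_ratings` are hypothetical, and the ratings are made-up illustration data.

```python
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings_a, ratings_u):
    """Pearson correlation over co-rated items (0.0 when it is undefined)."""
    common = set(ratings_a) & set(ratings_u)
    if len(common) < 2:
        return 0.0
    mean_a = mean(ratings_a[i] for i in common)
    mean_u = mean(ratings_u[i] for i in common)
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    if den_a == 0 or den_u == 0:
        return 0.0
    return num / (den_a * den_u)


def predict_cf(ratings, active, item):
    """Similarity-weighted prediction; None when no neighbour can contribute."""
    mean_active = mean(ratings[active].values())
    num = den = 0.0
    for user, user_ratings in ratings.items():
        if user == active or item not in user_ratings:
            continue
        sim = pearson(ratings[active], user_ratings)
        if sim <= 0:  # assumption: keep only positively correlated neighbours
            continue
        # Deviation of the neighbour's rating from the neighbour's own mean.
        num += sim * (user_ratings[item] - mean(user_ratings.values()))
        den += sim
    return None if den == 0 else mean_active + num / den


# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
toy_ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 2, "i3": 3, "i4": 4},
    "carol": {"i4": 2},
}
```

In this toy data, `predict_cf(toy_ratings, "alice", "i4")` can be computed from bob's rating, whereas `predict_cf(toy_ratings, "carol", "i1")` returns `None` because carol has too few co-rated items with anyone, which is the cold start situation described in the introduction.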

Trust Aware Recommender System (TARS)
A Trust-Aware Recommender System (TARS) is essentially an extension of the traditional CF approach. TARS relies on trust links between users to generate recommendations [6]. Research has shown that TARS can efficiently overcome the data sparsity and cold start problems that appear in traditional CF approaches. The TARS technique shares its structure with traditional CF. Whereas traditional CF weights each recommendation by the similarity between the active user and the recommender, the TARS in [7] weights it by the active user's trust in the recommender, and consists of two steps. In the first step, a trust metric calculates the trust value, which replaces the user similarity weight in the second step. The second step is defined in (1).
where $d_{\max}$ is the maximum allowable propagation distance (MAPD) between users of the recommender system. The value of MAPD can be preset. Then, $d_{a,u}$ is the trust propagation distance from the active user $a$ to the recommender $u$. In TARS, the trust propagation distance refers to the number of hops in the shortest trust propagation path from the truster to the trustee. Because (1) and (2) use a measure from the active user $a$ to the recommender $u$, the trust network property is local. The trust values are directly provided by user $u$ to user $a$, therefore the trust establishment is explicit. Furthermore, the researchers in [17] proposed a new formulation of TARS that uses the basic Epinions dataset extended with distrust statements. The formula for calculating $w_{a,u}$ in (2) is changed as denoted in (3).
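The trust propagation distance described above can be sketched as a breadth-first search over the trust network. The weight function below assumes a linear decay of the form $(d_{\max} - d_{a,u} + 1) / d_{\max}$, which is a common choice in the trust-aware CF literature; it is an assumption here, not a transcription of (2). The names `propagation_distance`, `trust_weight` and `toy_trust` are hypothetical.

```python
from collections import deque


def propagation_distance(trust_graph, source, target, max_distance):
    """Number of hops on the shortest trust path from source to target.

    Returns None when the target cannot be reached within max_distance hops.
    trust_graph maps each user to the users they explicitly trust.
    """
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        user, dist = queue.popleft()
        if dist == max_distance:
            continue  # do not propagate beyond the preset MAPD
        for neighbour in trust_graph.get(user, ()):
            if neighbour in visited:
                continue
            if neighbour == target:
                return dist + 1
            visited.add(neighbour)
            queue.append((neighbour, dist + 1))
    return None


def trust_weight(distance, max_distance):
    """Assumed linear decay: direct trust gives 1.0, the farthest hop gives 1/d_max."""
    if distance is None or distance < 1 or distance > max_distance:
        return 0.0
    return (max_distance - distance + 1) / max_distance


# Hypothetical trust chain: a trusts b, b trusts c, c trusts d.
toy_trust = {"a": ["b"], "b": ["c"], "c": ["d"]}
```

With `max_distance=3`, user `d` is reachable from `a` in three hops and receives the smallest non-zero weight; with `max_distance=2`, `d` falls outside the propagation horizon and contributes nothing, which illustrates how the preset MAPD trades coverage against reliance on distant users.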
In (5), $L_R$ is the average path length of the random network corresponding to the trust network, $n$ is the size of the trust network, and $k$ is the average degree of the trust network. The researchers in [7] conducted empirical experiments that observed the performance of the TARS in (1) on different sets of views from the Epinions dataset. The results showed that the different views have a significant impact on the tested algorithms. Interesting research in [18] computes the predicted rating using a simple version of Resnick's prediction formula based on a single user, as denoted in (6).
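$$p_{a,i} = \bar{r}_a + (r_{u,i} - \bar{r}_u) \qquad (6)$$
where $p_{a,i}$ is the predicted rating of user $a$ on item $i$, $\bar{r}_a$ and $\bar{r}_u$ refer to the mean ratings of users $a$ and $u$ respectively, and $r_{u,i}$ is the rating of item $i$ given by user $u$. The trust score is then derived by averaging the prediction error on co-rated items, as follows in (7).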
$$t_{a,u} = \frac{1}{|I_{a,u}|} \sum_{i \in I_{a,u}} \left(1 - \frac{|p_{a,i} - r_{a,i}|}{r_{\max}}\right) \qquad (7)$$
where $p_{a,i}$ is the prediction from (6), $r_{a,i}$ is the rating of item $i$ given by user $a$, $I_{a,u}$ refers to the set of items rated by both $a$ and $u$, and $r_{\max}$ is the size of the rating range. The results show that incorporating trust into CF can improve the prediction accuracy while maintaining fair prediction coverage. In addition, the research in [5] also adopts Resnick's prediction formula but computes trust based on the mean squared distance (MSD), as shown in (8). Users whose trust value is greater than a threshold $\theta$, i.e., $t_{a,u} > \theta$, are regarded as trusted neighbours. The approach has been shown to resolve the cold start and sparsity problems significantly, but its computational cost is very high. The following part describes the methodology used in this research to compare these TARS techniques. Owing to its computational cost, the technique introduced by [5] is not included in this study.
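The single-user prediction and the error-averaged trust score described above can be sketched as follows. This is an illustration under stated assumptions: the size of the rating range is passed in as `r_max` (4 is used in the example for a 1-5 scale, which is an assumption), the threshold value is arbitrary, and the names `single_user_prediction`, `trust_score`, `trusted_neighbours` and `toy` are hypothetical.

```python
def mean_rating(user_ratings):
    return sum(user_ratings.values()) / len(user_ratings)


def single_user_prediction(ratings, a, u, item):
    """Predict a's rating of item from the single recommender u."""
    return mean_rating(ratings[a]) + (ratings[u][item] - mean_rating(ratings[u]))


def trust_score(ratings, a, u, r_max):
    """Average of (1 - normalised prediction error) over co-rated items.

    Returns None when a and u share no rated item.
    """
    co_rated = set(ratings[a]) & set(ratings[u])
    if not co_rated:
        return None
    total = 0.0
    for item in co_rated:
        error = abs(single_user_prediction(ratings, a, u, item) - ratings[a][item])
        total += 1 - error / r_max
    return total / len(co_rated)


def trusted_neighbours(ratings, a, r_max, threshold):
    """Users whose trust score with respect to a is strictly above the threshold."""
    result = []
    for u in ratings:
        if u == a:
            continue
        score = trust_score(ratings, a, u, r_max)
        if score is not None and score > threshold:
            result.append(u)
    return sorted(result)


# Hypothetical toy data: u rates in step with a, v rates against a.
toy = {
    "a": {"i1": 5, "i2": 3},
    "u": {"i1": 4, "i2": 2, "i3": 3},
    "v": {"i1": 1, "i2": 5},
}
```

Here `u` reproduces `a`'s ratings exactly after mean-centring and so earns the maximum trust score, while `v` disagrees on both items and falls below a threshold of 0.5. The pairwise loop in `trusted_neighbours` also hints at why trust derived from ratings becomes expensive on large datasets.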

Methodology
This part describes the methodology in terms of the performance metrics, parameters, TARS techniques and dataset.

Performance Metrics
Different performance metrics have been used to evaluate the quality of recommendations in recommender systems. The most common are the standard Mean Absolute Error (MAE) and the coverage metrics. MAE is the most widely used metric in recommendation research for measuring the accuracy of recommendations [19]. It computes the average absolute deviation between the predicted rating and the actual rating assigned by the user. The lower the MAE, the more accurate the predictions, which allows better recommendations to be formulated. The formula for MAE is given in (9).
$$\mathrm{MAE} = \frac{1}{n} \sum_{i,j} \left| ar_{ij} - r_{ij} \right| \qquad (9)$$
where $ar_{ij}$ is the real rating given by user $i$ to item $j$, $r_{ij}$ is the corresponding predicted rating of user $i$ for item $j$, and $n$ is the number of predicted ratings. The rating coverage of TARS is measured using the formula in (10).
$$\text{Rating coverage} = \frac{n_r}{n_c} \qquad (10)$$
where $n_r$ is the total number of items that the recommender system could predict and $n_c$ is the total number of items. Furthermore, the Mean Absolute User Error (MAUE), defined by [6], first computes the MAE of each user and then averages these values over all users. Lastly, user coverage is the percentage of users for whom a recommender system can provide predictions. To summarize, the performance metrics used in this research are MAE, MAUE, rating coverage and user coverage. The MAE and rating coverage of these TARS are compared according to the different types of trusted users' views described in the following.
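The four metrics summarized above can be computed together in one pass. The sketch below assumes that held-out ratings and predictions are stored as dictionaries keyed by (user, item), that an unpredictable rating is marked with `None`, and that both coverage values are reported as fractions rather than percentages; the function name `evaluate` and the sample values are hypothetical.

```python
def evaluate(actual, predicted):
    """Return MAE, MAUE, rating coverage and user coverage.

    actual:    {(user, item): real rating}
    predicted: {(user, item): predicted rating, or None if no prediction}
    """
    errors_by_user = {}
    for (user, item), real in actual.items():
        guess = predicted.get((user, item))
        if guess is None:
            continue  # this rating is not covered by the recommender
        errors_by_user.setdefault(user, []).append(abs(real - guess))

    all_errors = [e for errs in errors_by_user.values() for e in errs]
    all_users = {user for user, _ in actual}

    mae = sum(all_errors) / len(all_errors) if all_errors else None
    # MAUE: average the per-user MAE so heavy raters do not dominate.
    user_maes = [sum(errs) / len(errs) for errs in errors_by_user.values()]
    maue = sum(user_maes) / len(user_maes) if user_maes else None
    rating_coverage = len(all_errors) / len(actual) if actual else 0.0
    user_coverage = len(errors_by_user) / len(all_users) if all_users else 0.0
    return {
        "mae": mae,
        "maue": maue,
        "rating_coverage": rating_coverage,
        "user_coverage": user_coverage,
    }


# Hypothetical held-out ratings and predictions.
sample_actual = {("u1", "i1"): 5, ("u1", "i2"): 3, ("u2", "i1"): 4, ("u3", "i1"): 2}
sample_predicted = {("u1", "i1"): 4, ("u1", "i2"): 3, ("u2", "i1"): 2, ("u3", "i1"): None}
sample_metrics = evaluate(sample_actual, sample_predicted)
```

In this sample the MAE is 1.0 while the MAUE is 1.25, because `u2` contributes a single large error that carries the same weight as `u1`'s two smaller ones once errors are averaged per user. Restricting `actual` to the users of one view gives the per-view figures.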

Different Types of Trust Users View
The experiments focused on five views of trusted users, as listed in Table 1.

View               Description
Heavy user         Users who give more than 10 ratings
Opinionated user   Users who give 1-4 ratings and whose rating values have a standard deviation of more than 1.5
Flexible user      Users who give more than 10 ratings and whose rating values have a standard deviation of more than 1.5
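The view definitions that depend on rating count and rating spread can be expressed as a small classifier. The sketch below uses the thresholds stated for the heavy, opinionated and flexible views; it assumes the population standard deviation (the text does not say which variant is used), treats every user as part of the all users view, and leaves out the cold start view because its definition is not given here. The function name `user_views` is hypothetical.

```python
from statistics import pstdev


def user_views(rating_values):
    """Return the set of views a user belongs to, given that user's rating values."""
    count = len(rating_values)
    # Assumption: population standard deviation; zero when there are no ratings.
    spread = pstdev(rating_values) if count else 0.0
    views = {"all"}
    if count > 10:
        views.add("heavy")
    if 1 <= count <= 4 and spread > 1.5:
        views.add("opinionated")
    if count > 10 and spread > 1.5:
        views.add("flexible")
    return views
```

For example, a user with ratings `[1, 5, 1, 5]` falls into the opinionated view, and a user with twelve ratings alternating between 1 and 5 falls into both the heavy and flexible views. A user can therefore belong to several views at once, so the views are overlapping subsets rather than a partition of the users.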

Epinions Dataset
This research used the basic and extended Epinions datasets.
Epinions.com is a web site that allows users to review various items (cars, books, music, etc.). Figure 1 shows the data representation for the Epinions dataset by [7]. The Epinions dataset is divided into two datasets, the basic Epinions dataset and the extended Epinions dataset. Basic Epinions Dataset: The basic Epinions dataset contains 49,290 users who rated a total of 139,738 different items at least once, writing 664,824 reviews and issuing 487,181 trust statements. The dataset consists of 2 files: ratings_data and trust_data.

Fig. 1: Epinions Dataset representations
The ratings_data file contains the ratings given by users to items, for example 1, 2, 3 and 5. Every line in ratings_data holds user_id, item_id and rating_value. The user_id ranges from 1 to 49290, the item_id from 1 to 139738 and the rating_value from 1 to 5. Figure 2 illustrates how a user_id gives a rating_value to an item_id in the ratings_data file. For example, the dataset is saved as shown in Table 2. The second file, trust_data, contains the trust statements issued by users. Every line in the file holds source_user_id, target_user_id and trust_statement_value. The source_user_id and target_user_id range from 1 to 49290. The trust_statement_value is always 1, since the dataset contains only positive trust statements and no negative ones (distrust). Figure 3 illustrates how a source_user_id gives a trust_statement_value to a target_user_id in the trust_data file. For example, the dataset is saved in the file as shown in Table 3.

Extended Epinions Dataset: The user_rating file contains the trust and distrust values of the users. It stores the source_user_id of the member who makes the trust or distrust statement, the target_user_id of the member being trusted or distrusted, and the trust_statement_value, which is 1 for trust and -1 for distrust. The dataset is saved as shown in Table 4. The review file contains information about each review written by a user. This file consists of three columns: object_review_id, user_review_id and review_id. The dataset is saved as shown in Table 5. The last file is the rating file. This file contains the item_id of the object being rated, the user_id of the member who rates the object, and a rating_value from 1 to 5. Value 1 means not helpful, value 2 means somewhat helpful, value 3 means helpful, value 4 means very helpful and value 5 means most helpful. The last column is the rate_status, where 1 means the member has chosen not to show his rating of the object and 0 means the member does not mind showing his name beside the rating. The dataset is saved as shown in Table 6.

item_id    user_id  rating_value  rate_status
139431556  237911   5             0
139431556  409066   2             0
41332100   241261   5             0
143101572  264696   4             1

Many studies on recommender systems use the Epinions dataset because it is the largest and the most significant dataset. In addition, the data has been collected from real-world use.
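The three-column file layouts described above can be loaded with a few lines of code. The sketch below assumes whitespace-separated integer fields, which is an assumption about the on-disk format; the sample lines are made up, and the names `parse_ratings`, `parse_trust` and `rating_density` are hypothetical. The density helper uses the user, item and review counts reported for the basic dataset to show how sparse the rating matrix is.

```python
def parse_ratings(lines):
    """Parse 'user_id item_id rating_value' lines into {user: {item: rating}}."""
    ratings = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        user_id, item_id, value = (int(p) for p in parts)
        ratings.setdefault(user_id, {})[item_id] = value
    return ratings


def parse_trust(lines):
    """Parse 'source_user_id target_user_id trust_statement_value' lines.

    The value is 1 in the basic dataset; the extended dataset also uses -1.
    """
    trust = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        source, target, value = (int(p) for p in parts)
        trust.setdefault(source, {})[target] = value
    return trust


def rating_density(num_ratings, num_users, num_items):
    """Fraction of the user-item matrix that actually holds a rating."""
    return num_ratings / (num_users * num_items)


# Hypothetical sample lines in the assumed layout.
sample_ratings = parse_ratings(["1 100 5", "1 101 3", "2 100 4", "", "bad line"])
sample_trust = parse_trust(["1 2 1", "2 3 -1"])
epinions_density = rating_density(664824, 49290, 139738)
```

With the reported counts, fewer than one in ten thousand user-item pairs carries a rating, which gives a sense of the scale of the sparsity problem that TARS is meant to address.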

Results and Discussion
The results are presented in Tables 7 and 8. The MAE and rating coverage of the three TARS were compared across five different views, namely all users, cold start users, heavy users, opinionated users and flexible users.

Conclusion
Trust is a measure of the willingness to believe in a user based on behavior within a specific context over a period of time. In this research, the fundamental aspects and parameters of trust aware recommender systems have been defined. Empirical experiments were then conducted to compare the performance of three existing algorithms under different types of trusted users' views. Previous evaluations of TARS were conducted with different parameter settings, which makes it difficult for researchers to compare the performance of the different techniques. The experiments in this research promote a common way of evaluation that yields comparable performance results across different TARS techniques. The research found that the accuracy error of TARS with the distrust element is smaller than that of the other TARS, mainly in the all users view. Similarly, the rating coverage achieved by Distrust TARS outperforms that of the other two TARS in all the users' views. In terms of user coverage, Distrust TARS achieves the highest percentages in the all users and cold start users views, while Basic TARS achieves the highest percentages in the remaining views.
In future work, different types of parameters should be defined for evaluating the performance of TARS. How different categories of rating values affect the performance metrics of TARS is one important question for future research. Additionally, this research has not yet considered the type of propagation in TARS, that is, local versus global. An important issue for future study is how local or global propagation relates to the different groups of users' views and ratings. In conclusion, this research has opened up many performance issues of the existing TARS. Therefore, further research should be conducted to give the reader a visual grasp of the relative benefits of the different techniques.