Effect of Number of Versions on Receiving High Citations

The h-index attempts to measure both the productivity and the impact of the published work of a scientist or scholar. It is based on the set of the scientist's most cited papers and the number of citations they have received in other publications. The most commonly used measure of journal quality is the Impact Factor, which attempts to measure the impact of a journal in terms of the average number of citations to recent articles published in it. Receiving more citations is therefore very important for authors and journals seeking a high h-index and impact factor. In this paper, we analyze the effect of the number of versions of a paper available on the web on the number of citations it receives. We analyzed 10,162 papers published in 2010 and indexed in the Scopus database. We then developed software to collect the number of citations and versions of each paper from Google Scholar automatically. The results show that …


Introduction
Jorge E. Hirsch [1] proposed the Hirsch index, commonly abbreviated as the h-index. The h-index attempts to measure both the productivity and the impact of the published work of a scientist or scholar [2,3]. It is based on the set of the scientist's most cited papers and the number of citations they have received in other publications [1]. In addition, one measure of the reputation and academic standard of a journal is the so-called 'Impact Factor', which, with some qualifications, is the average number of citations for papers published in a particular journal [4]. It is obtained as the ratio of the total number of citations received by the papers published in the journal to the number of papers published in the journal [5,6]. Receiving more citations is therefore very important for authors and journals seeking a high h-index and impact factor [7,8]. In this research, we analyze the effect of the number of versions of a paper available on the web on the number of citations it receives. We aimed to analyze all papers published in 2010 by the five top universities of Malaysia that appear in the Scopus database. For this purpose, 10,162 papers published in 2010 and indexed in Scopus were selected. We then developed software to collect the number of citations and versions of each paper from Google Scholar automatically.
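The two measures described above can be illustrated with a minimal Python sketch. The function names and the sample citation counts are hypothetical and chosen only for illustration; the ratio function reflects the simplified definition given in the text, not the full rules used by any indexing service.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def citations_per_paper(total_citations, papers_published):
    """Simplified impact ratio: total citations divided by number of papers."""
    if papers_published <= 0:
        raise ValueError("papers_published must be positive")
    return total_citations / papers_published


# Hypothetical citation counts for five papers by one author.
sample = [10, 8, 5, 4, 3]
print(h_index(sample))                       # 4
print(citations_per_paper(sum(sample), 5))   # 6.0
```

In this sample, four papers have at least four citations each, so the h-index is 4, while the fifth paper (3 citations) does not raise it further.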

Definition of Citation
A bibliographic citation is a reference to a book, article, web page, or other published or unpublished source. Citations should supply enough detail to identify the item uniquely [9,10]. Citing sources points the way for other scholars [11].

Important Purposes of Citation
• To avoid plagiarism and support academic honesty [12].
• To attribute prior or unoriginal work and ideas to the correct sources [13].
• To allow the reader to determine independently whether the referenced material supports the author's argument in the claimed way.
• To help the reader gauge the strength and validity of the material the author has used.

Number of Versions of a Paper
Publishing a research paper in a scholarly journal is necessary but not sufficient for receiving citations in the future. We need to ensure that the paper is visible to the relevant users and authors. When authors publish a paper, the publisher places the published version on its own website and repository. This is like having a product that is sold in a single shop: anyone who wants the product must come to that shop to buy it. If the product is available in many places, it can reach more customers. For example, suppose one person makes a pen and sells it in one shop, while another makes a pen and sells it in 20 shops. The pen sold in 20 shops is more visible to customers and is therefore likely to sell more.
The question is how, given copyright rules, we can publish our paper in more than one journal so that more websites advertise it. In fact, there is no need to publish in more than one journal; authors can instead use tools that help enhance the visibility and readership of research papers. Effective use of these tools can result in increased citations and thus improve the author's h-index and the journal's impact factor. Here is a sample of tools to increase the visibility of one's published papers:

Strategies for Enhancing the Impact of Research Dissemination
• Submit the manuscript to a digital subject repository.
• Submit the manuscript to an institutional repository.
• Set up a web site devoted to the research project and post manuscripts of publications and conference abstracts [14].
• Take advantage of SEO (search engine optimization).
• Present preliminary research findings at a meeting or conference.
• Follow up preliminary research findings presented at a meeting or conference with a published manuscript [15].
• Consider submitting the same article to a journal in a different language as a "secondary publication."
• Start a blog devoted to the research project [16].
• Contribute to Wikipedia.
• Contribute to a social network [17].

Data identification and collection
In this research, five research universities of Malaysia, namely Universiti Kebangsaan Malaysia (UKM), University of Malaya (UM), Universiti Putra Malaysia (UPM), Universiti Sains Malaysia (USM) and Universiti Teknologi Malaysia (UTM), were selected for the analysis. We collected 10,162 papers from the year 2010 from the Scopus database. The extraction of these papers was carried out on 13 July 2013, starting at 11:00 AM (UTC +8:00), and took 2 hours. The process of data collection is shown in the accompanying figure. To collect the number of citations and versions of these articles, the Google Scholar search engine was used. We decided to focus on this tool because of its popularity and its ability to provide a simple way to find the citations of articles. In addition, the Google Scholar database covers more resources and reflects more versions and citations than other databases such as ISI Thomson Reuters or Scopus. Therefore, we developed software to collect the number of citations and versions of each paper from Google Scholar automatically.

Software Development
In the previous section, more than 10,000 paper titles were extracted from Scopus. All of these records had to be processed for the number of citations and versions within a single day, because new citations and versions may appear each day, which would introduce inconsistency into the data analysis. To overcome this issue, a server-based software application was developed to retrieve citations and versions.
The ASP.NET platform was selected for software development, and the application was deployed on a high-speed, high-bandwidth server so that all 10,000+ records could be processed in a few hours.

Software Algorithm
The software searched for every title in Google Scholar twice: the first time with quotation marks (") and the second time without them. In the Google Scholar results page, the titles and descriptions might include some HTML tags, as listed below:
<b> </b>: Keywords that match the search query are shown in bold to highlight where the title matches the query.
<i> </i>: This tag was also found in a few titles in Google Scholar search results.
<sup> </sup> <sub> </sub>: Titles with superscripts and subscripts (e.g. chemical formulas) contain these tags so that they display properly.
To find the correct matching title in Google Scholar, all of these HTML tags were removed from the titles. A further challenge was inconsistent spacing: some of the titles extracted from Scopus differed by one or two spaces from those indexed in Google Scholar. So, after all tags were removed from the titles, all spaces were removed as well to find the correct match for the paper in the Google Scholar results. In some cases, several items matched the full title; in these cases, the publication year and the authors' names were compared to find the relevant record.
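The normalization and matching steps described above can be sketched in Python as follows. This is an illustration only and not the authors' ASP.NET code: the record fields (title, year, authors), the function names, the case-insensitive comparison, and the rule "same year and at least one shared author surname" are all assumptions made for the example.

```python
import re

# Tags reported in the text: <b>, <i>, <sup>, <sub> (opening and closing forms).
TAG_PATTERN = re.compile(r"</?(?:b|i|sup|sub)\b[^>]*>", re.IGNORECASE)


def normalize_title(title):
    """Strip the listed HTML tags, then remove all whitespace."""
    without_tags = TAG_PATTERN.sub("", title)
    without_spaces = re.sub(r"\s+", "", without_tags)
    # Assumption: compare case-insensitively (the source does not say this).
    return without_spaces.casefold()


def find_match(scopus_record, scholar_results):
    """Return the matching search result, or None if no unique match exists."""
    target = normalize_title(scopus_record["title"])
    candidates = [r for r in scholar_results
                  if normalize_title(r["title"]) == target]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # Several results share the title: fall back to year and author names.
    wanted_authors = {a.casefold() for a in scopus_record["authors"]}
    for result in candidates:
        same_year = result["year"] == scopus_record["year"]
        shared = wanted_authors & {a.casefold() for a in result["authors"]}
        if same_year and shared:
            return result
    return None


# Hypothetical records used only to exercise the functions.
scopus = {"title": "Synthesis of  H2O nanofilms", "year": 2010,
          "authors": ["Lee", "Tan"]}
results = [
    {"title": "<b>Synthesis</b> of H<sub>2</sub>O <b>nanofilms</b>",
     "year": 2009, "authors": ["Wong"]},
    {"title": "Synthesis of H2O nanofilms",
     "year": 2010, "authors": ["Tan", "Lee"]},
]
print(find_match(scopus, results)["year"])  # 2010
```

Removing every space makes the comparison insensitive to the one- or two-space differences mentioned above, at the cost of treating titles that differ only in word boundaries as identical.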
If the title was found, the number of citations and versions was extracted from the page and updated in the database; if it was not found, the record was marked as "not found" in the database.
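A minimal Python sketch of this extraction step is given below. It assumes that the text of a search result contains link labels of the form "Cited by N" and "All N versions"; the function names, the row layout, and the defaults used when a label is absent (0 citations, 1 version) are assumptions for illustration and are not taken from the original software.

```python
import re

CITED_BY = re.compile(r"Cited by (\d+)")
ALL_VERSIONS = re.compile(r"All (\d+) versions")


def extract_counts(result_text):
    """Read citation and version counts from the text of one search result."""
    cited = CITED_BY.search(result_text)
    versions = ALL_VERSIONS.search(result_text)
    return {
        # Assumption: a missing "Cited by" label means no citations yet.
        "citations": int(cited.group(1)) if cited else 0,
        # Assumption: a missing versions label means a single version.
        "versions": int(versions.group(1)) if versions else 1,
    }


def build_row(paper_id, result_text):
    """Build the database row for a paper; None means no match was found."""
    if result_text is None:
        return {"id": paper_id, "status": "not found"}
    row = {"id": paper_id, "status": "found"}
    row.update(extract_counts(result_text))
    return row


print(build_row(1, "Cited by 12 Related articles All 5 versions"))
print(build_row(2, None))
```

Rows marked "not found" correspond to the records that were later checked manually.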
The whole extraction process was carried out on 15 July 2013, starting at 12:00 AM (UTC +8:00), and took 4 hours. After the data extraction was complete, the records that were not found were also checked manually to make sure that the system and the data analysis had as little incomplete data as possible and that no record had been missed on Google Scholar. The structural procedure is visualized in Figure 8.
Figure 8: The structure of the developed software

Data Analysis
To analyze the data statistically, the five top universities of Malaysia were selected. Table 1 shows the data collected for these universities for the year 2010 from the Scopus database. The Spearman correlation coefficient revealed a significant positive association between the number of citations and the number of versions for the publications of the different universities. The overall correlation was moderate and positive (r = 0.431, p < 0.01). The relationship between the number of citations and the number of versions is shown in Table 2.
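For readers unfamiliar with the Spearman coefficient, the following Python sketch shows how it can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The data are hypothetical and unrelated to the study's results, and the function names are illustrative; a statistical package would normally be used in practice, and the sketch does not compute the p-value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    # Note: undefined (division by zero) if either variable is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical citation and version counts for five papers.
citations = [0, 2, 5, 9, 14]
versions = [1, 3, 2, 6, 8]
print(round(spearman_rho(citations, versions), 3))  # 0.9
```

Because the coefficient depends only on ranks, it suits count data such as citations and versions, which are typically skewed rather than normally distributed.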

Comparison among the five top Malaysian universities for number of citations and number of versions
As neither the number of citations nor the number of versions was normally distributed, the Kruskal-Wallis test, a non-parametric method, was applied to test the differences among these universities. The results revealed significant differences among the five universities for both the number of citations and the number of versions (Table 3).
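The Kruskal-Wallis H statistic can be sketched in Python as shown below, including the usual correction for ties. The groups are hypothetical and do not reproduce the study's data, and the function name is illustrative. The statistic is compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Average 1-based rank of each distinct value in the pooled sample.
    rank_of = {}
    smaller = 0
    for value, count in sorted(counts.items()):
        rank_of[value] = smaller + (count + 1) / 2
        smaller += count

    # Sum over groups of (rank sum squared) / (group size).
    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction; equals 1 when all values are distinct.
    ties = sum(c ** 3 - c for c in counts.values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    return h / correction


# Hypothetical citation counts for papers from three universities.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(kruskal_wallis_h(groups), 3))  # 7.2
```

With three groups there are 2 degrees of freedom, and the chi-square critical value at the 0.05 level is about 5.991, so an H of 7.2 in this toy example would indicate a significant difference among the groups.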

Relationship between type of document and university for number of publications
The frequency of each document type in each university was calculated. Table 4 shows the pattern of publication in each university by document type.

Comparison among different types of publication for number of citations and number of versions
The Kruskal-Wallis test was applied to test the differences in the number of citations and the number of versions among the different types of publication. The results revealed significant differences for both the number of citations and the number of versions (Table 6).