Plagiarism in research

  • January 2015
  • Medicine Health Care and Philosophy 18(1):91-101

Gert Helgesson, Karolinska Institutet
Stefan Eriksson, Uppsala University

Plagiarism in Research — The Complete Guide [eBook]

Deeptanshu D

Plagiarism in research

Plagiarism can be described as the not-so-subtle art of stealing already existing work, violating the principles of academic integrity and fairness. There's no denying that we see further by standing on the shoulders of giants, and when constructing research prose, we often need to look at the world through their lens. In this process, however, many students and researchers, knowingly or otherwise, resort to plagiarism.

In many instances, plagiarism is intentional, whether through direct copying or paraphrasing. Unfortunately, there are also times when it happens unintentionally. Regardless of the intent, plagiarism goes against the ethos of the scientific world and is considered a severe moral and disciplinary offense.

The good news is that you can avoid plagiarism and even work around it. So, if you're keen on publishing unplagiarized papers and maintaining academic integrity, you've come to the right place.

With this comprehensive ebook on plagiarism, we intend to help you understand what constitutes plagiarism in research, why it happens, plagiarism concepts and types, how you can prevent it, and much more.

What is plagiarism?

Plagiarism is defined as representing part or all of someone else's work as your own. Whether published or unpublished, this could be ideas, verbatim text, infographics, and so on. Academic writing is no different. However, it is not considered plagiarism if most of your work is original and the referenced part is diligently cited.

The degree of plagiarism can vary from discipline to discipline. In mathematics or engineering, for instance, you sometimes have to reproduce entire equations or proofs, which can take up a significant chunk of your paper. That does not constitute plagiarism, provided there is an analysis or rebuttal accompanying it.

That said, there are some objective parameters defining plagiarism. Get to know them, and your life as a researcher will be much smoother.

Common types of plagiarism

Plagiarism often creeps into academic works in various forms, from complete plagiarism to accidental plagiarism.

The types of plagiarism vary depending on two critical aspects — the writer's intention and the degree to which the prose is plagiarized. These aspects help institutions and publishers define plagiarism types more accurately.

Common forms of Plagiarism

The agreed-upon forms of plagiarism that occur in research writing include:

1. Global or Complete Plagiarism

Global or complete plagiarism is inarguably the most severe form of plagiarism — it is as good as stealing. It happens when an author blatantly copies somebody else's work in its entirety and passes it off as their own.

Since complete plagiarism is always committed deliberately and disguises the ownership of the work, it is directly recognized as a copyright violation and can lead to intellectual property disputes and legal battles, along with irredeemable repercussions like a damaged reputation, expulsion, or losing your job.

2. Verbatim or Direct Plagiarism

Verbatim or direct plagiarism happens when you copy a part of someone else's work, word for word, without providing adequate credit or attribution. The ideas, structure, and diction in your work match the original author's. Even if you change a few words or shuffle the position of sentences here and there, the result remains the same.

The best way to avoid this is to minimize copy-pasting entire paragraphs and do it only when the situation calls for it. And when you do, use quotation marks and in-text citations, crediting the original source.

3. Source-based Plagiarism

Source-based plagiarism results from an author trying to mislead readers about or disguise the actual source of their work. Say you write a paper with plenty of citations, but when the editor or peer reviewers try to cross-check your references, they hit a dead end or find incorrect information. Another instance is when you use both primary and secondary data to support your argument but cite only the former, with no reference for the latter.

In both cases, the information provided is either irrelevant or misleading. You may have cited it, but it does not fully support the text.

Similarly, two related forms of misconduct are data fabrication and data falsification. Data fabrication is inventing your own data and results, while data falsification is omitting or adulterating key findings to suit your expected outcomes.

Using such misleading sources in a research study is a grave violation and offense. In the medical field particularly, it can lead to legal issues: wrongly presented data and its interpretation can corrupt clinical trials, with grave consequences.

4. Paraphrasing Plagiarism

Paraphrasing plagiarism is one of the most common types of plagiarism. It occurs when an author copies ideas, thoughts, and inferences, rephrases the sentences, and then claims ownership.

Compared to verbatim plagiarism, paraphrasing plagiarism involves changing words, sentence structure, or semantics, or translating texts. The general idea or thesis topic, however, remains the same, and as clever as it may seem, it is straightforward to detect.

Most often, authors commit paraphrasing plagiarism by reading a few sources and rewriting them in their own words without due citation. This can lead the reader to believe that the idea was the author's own when it wasn’t.

5. Mosaic or Patchwork Plagiarism

One of the more mischievous ways to avoid writing original work is mosaic plagiarism. Patchwork or mosaic plagiarism occurs when an author stitches together a research paper by borrowing pieces from multiple sources and weaving them into their own creation. Sure, the author may add a few new words and phrases, but the meat of the paper is stolen.

It’s common for authors to consult various sources during research. But patching them together to form a new paper is wrong.

Mosaic plagiarism can be difficult to detect, so authors, overconfident in themselves, often resort to it. These days, however, there are plenty of online tools, like Turnitin, Enago, and EasyBib, that identify patchwork and correctly point to the sources from which you have borrowed.

6. Ghostwriting

Outside of the academic world, ghostwriting is entirely acceptable. Leaders do it, politicians do it, and artists do it. In academia, however, ghostwriting is a breach of conduct that tarnishes the integrity of a student or a researcher.

Ghostwriting is the act of using an unacknowledged person’s assistance to complete a paper. This happens in two ways — when an author has their paper’s foundation laid out but pays someone else to write, edit, and proofread. The other is when they pay someone to write the whole article from scratch.

In either case, it’s utterly unacceptable since the whole point of a paper is to exhibit an author's original thoughts presented by them. Ghostwriting, thus, raises a serious question about the academic capabilities of an author.

7. Self-plagiarism

This may surprise many, but rehashing previous works, even if they are your own, is also considered plagiarism. The biggest reason self-plagiarism is a problem is that you’re trying to claim credit for something you have already received credit for.

Authors often borrow their past data or experiment results, use them in their current work, and present them as brand new. Some may even plagiarize old published works' ideas, cues, or phrases.

The degree to which self-plagiarism is acceptable is still under debate and depends on the volume of work copied. Many academic and non-academic journals have set a fixed threshold for what percentage of self-overlap is acceptable. Unless you properly declare old data usage through citations and quotation marks, it will fall under the scope of self-plagiarism.

8. Accidental Plagiarism

Apart from the intentional forms of plagiarism, there’s also accidental plagiarism. As the name suggests, it happens inadvertently. Unwitting paraphrasing, missing in-text or end-of-text citations, and missing quotation marks all fall under this category.

While writing your academic papers, you have to stay cautious to avoid accidental plagiarism. The best way to do this is by going through your article thoroughly. Proofread as if your life depended on it, and check whether you’ve given citations where required.

Why is it important to avoid research plagiarism?

As a scholar, you must be aware that the sole purpose of any article or academic writing is to present an original idea to its readers. Plagiarized prose strips the author of credibility, discredits the source, and leaves the reader misinformed, all of which goes against the ethos of academic institutions.

Here are a few reasons why you should avoid research plagiarism:

Critical analysis is important

While writing research papers, an author must dive deep into finding various sources, like scholarly articles, especially peer-reviewed ones. You are expected to examine the sources keenly to understand the gaps in the chosen topic and formulate your research questions.

Crafting critical questions related to the field of study is essential as it displays your understanding and the analysis you employed to decipher the problems in the chosen topic. When you do this, your chances of being published improve, and it’s also good for your long-term career growth.

Streamlined scholarly communication

An extended form of scholarly communication is established when you respond to and build your academic work on what others have previously done in a particular domain. By appropriately using others' work, i.e., through citations, you acknowledge the work done before you and how it helped shape yours. Moreover, citations open the door for readers to trace a topic from its beginnings to its current state. Plagiarism prevents all of this.

Credibility in originality

Originality is invaluable in the research community. From your thesis topic and fresh methodology to new data, conclusions, and tone of writing, the more original your paper is, the more it intrigues people. And as long as your paper is backed by credible sources, it further solidifies your academic integrity. Plagiarism undermines both.

How does plagiarism happen?

Even though plagiarism is a cardinal sin and plagiarized academic writing is consistently rejected, it still happens. So the question is, what makes people resort to plagiarism?

Some of the reasons why authors resort to plagiarism include:

  • Lack of knowledge about plagiarism
  • Accidentally copying a work
  • Forgetting to cite a source
  • Desire to excel among peers
  • A false belief that no one will catch them
  • No interest in academic work, treating it as just another assignment
  • Using shortcuts in the form of self-plagiarism
  • Fear of failing

Whatever the reason an author may have, plagiarism can never be justified. It is seen as an unfair advantage and as disrespect to those who have put blood, sweat, and tears into doing their due diligence. Additionally, remember that readers, universities, and publishers are only interested in your genuine ideas, and your evaluation as an author is based on that.

Related Article: Citation Machine Alternatives — Top citation tools 2023

Consequences of plagiarism

We have reiterated enough that plagiarism is objectionable and has consequences. But what exactly are the consequences? Well, that depends on who the author is and the type of plagiarism.

For minor offenses like accidental plagiarism or missing citations, a slap on the wrist in the form of feedback from the editor or peers is the norm. For major cases, let’s take a look:

For students

  • Poor grades

Even if you are a first-timer, your professor may choose to fail you, which can have a detrimental effect on your scores.

  • Failing a course

It is not rare for professors to fail Ph.D. and graduate students when caught plagiarizing. Not only does this hurt your academics, but it also extends the duration of your study by a year.

  • Disciplinary action

Every university and academic institution has strict policies and regulations regarding plagiarism. If caught, an author may have to face an academic review committee that decides their future. Typical outcomes range from poor grades and failing a year to being barred from any academic or research-related work.

  • Expulsion from the university

A university may resort to expulsion only in the worst cases, such as copyright violation or intellectual property theft.

  • Tarnished academic reputation

This just might be the most consequential scenario of all. It takes a lifetime to build a great reputation but only a few seconds to tarnish it. Many academics lose their peers' trust and find it hard to recover. Moreover, background checks for future jobs or fellowships become a nightmare.

For universities

A university is built on reputation, and letting plagiarism slide is the quickest way to tarnish it. This leads to less interest from top talent and publishers, and trouble finding grant money.

Prospective students turning away from a university means losing out on tuition money. This further drives experienced faculty away. And the cycle continues.

For researchers

  • Legal battles

Since plagiarism falls under copyright infringement, researchers may face legal battles if their academic work is believed to be plagiarized. There is no shortage of cases, like those of Doris Kearns Goodwin or Mark Chabedi, where authors used another person's work without permission and claimed it as their own. In these instances, they faced legal consequences that led to fines, being barred from writing and research, and sometimes even imprisonment.

  • Professional reputation

Publishers and journals will not engage authors with a history of plagiarism to produce content under their brand name. And if the author is a professor or fellow, it can lead to contract termination.

How to avoid plagiarism in research?

The simplest way to avoid plagiarism would be to put in the work. Do original research, collect new data, and derive new conclusions. If you use references, keep track of each and every single one and cite them in your paper.

To ensure that your academic writing or research paper is unique and free from any type of plagiarism, incorporate the following tips:

  • Pay adequate attention to your references

Writing a paper requires extraordinary research. So, it’s understandable when researchers sometimes lose track of their references. This often leads to accidental plagiarism.

So, instead of falling into this trap, maintain lists or take notes of your references while doing your research. This will help when you’re writing your citations.

  • Find credible sources

Always refer to credible sources, whether a paper, a conference proceeding, or an infographic. Credible sources provide unbiased evidence and accurate experimental results, giving your paper facts to back its claims.

  • Proper use of paraphrasing, quotations, and citations

It’s borderline impossible to avoid direct references in your paper, especially if you’re providing a critical analysis of or a rebuttal to an existing article. So, to avoid accusations of plagiarism, use quotation marks whenever you use text verbatim.

If you’re paraphrasing, use citations so that everyone knows it’s not your idea. Credit the original author and any secondary source. Publishers usually have guidelines on how to cite, and there are many styles, like APA, MLA, and Chicago, so stay on top of what your publisher demands.

Readers usually prefer paraphrased text over quotations, especially for bulky sections. The reason is obvious: paraphrasing displays your understanding of the original work's meaning and lets you interpret it to suit your own argument.

  • Review and recheck your work multiple times

Before submitting the final draft, you must subject your work to scrutiny, multiple times at that. The more you do it, the lower your chances of committing accidental plagiarism. To ensure that your final work does not contain any type of plagiarism, check that:

  • There are no misplaced or missed citations
  • The paraphrased text does not closely resemble the original text
  • You don’t have any wrongful references
  • You’re not missing quotation marks or failing to credit the author after a quotation
  • You use a plagiarism checker

More on how to avoid plagiarism.

On top of this, read your university's or publisher's policies. Each has its own rules about what's acceptable and what's not, and each defines the punishment for an offense, factoring in its degree.

  • Use Online Tools

After receiving your article, most universities, publishers, and other institutions will run it through plagiarism checkers, including AI detectors, to detect all types of plagiarism. These checkers work by drawing similarities between your article and previously published works in their databases. If the overlap is too great, your paper is deemed plagiarized.

You can always save yourself from embarrassment by staying a step ahead: run your paper through a plagiarism checker before you submit it. You can quickly identify whether you have committed plagiarism, no one except you will know about it, and you will have a chance to correct yourself.
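Under the hood, these checkers compare a submission against indexed documents by looking for overlapping word sequences. The sketch below illustrates that matching idea with word n-gram (shingle) overlap; it is a simplification for intuition only, not any vendor's actual algorithm:

```python
def ngrams(text, n=3):
    """Return the set of word n-grams (shingles) in a text, case-insensitive."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(submission, source, n=3):
    """Jaccard similarity between the n-gram sets of two texts (0.0 to 1.0)."""
    a, b = ngrams(submission, n), ngrams(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

source = "plagiarism is representing someone else's work as your own"
copied = "plagiarism is representing someone else's work as your own without credit"
unrelated = "the weather today is sunny with a light breeze"

print(round(similarity(copied, source), 2))     # high overlap: 0.78
print(round(similarity(unrelated, source), 2))  # no overlap: 0.0
```

Real systems index billions of documents, so they rely on inverted indexes and fingerprinting rather than pairwise comparison, but the similarity-threshold idea is the same.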

Best Plagiarism Checkers in 2023

Plagiarism checkers are an incredibly convenient tool for improving academic writing. Therefore, here are some of the best plagiarism checkers for academic writing.

Turnitin's iThenticate

This is one of the best plagiarism checkers for academic papers and a good fit for academic writers, researchers, and scholars.

Turnitin’s iThenticate claims to cross-check your paper against 99 billion+ current and archived web pages, 1.8 billion student papers, and best-in-class scholarly content from top publishers in every major discipline and dozens of languages.

The iThenticate plagiarism checker is now available on SciSpace (instructions on how to use it).

Grammarly

Grammarly serves as a one-stop solution for better writing. With Grammarly, your paper can have fewer grammatical errors and better clarity, and, yes, be plagiarism-free.

Grammarly's plagiarism checker compares your paper to billions of web pages and existing papers online. It points out every sentence that needs a citation and gives you the original source as well. On top of this, Grammarly also gives your document an originality score.

ProWritingAid

ProWritingAid is another AI writing assistant that offers a plethora of tools to improve your document. One of its paid services is the ProWritingAid Plagiarism Checker, which helps authors find out how much of their work is plagiarized.

Once you scan your document, the plagiarism checker gives you details like the percentage of non-original text, how much of that is quoted, and how much is not. It will also give you links so you can cite them as required.

EasyBib Plagiarism Checker

EasyBib Plagiarism Checker compares your writing sample with billions of available sources online to detect plagiarism at every level. You'll be notified which phrases are too similar to current research and literature, prompting a possible rewrite or additional citation.

Moreover, you'll get feedback on your paper's inconsistencies, such as changes in text, formatting, or style. These small details could suggest possible plagiarism within your assignment.

Plagiarism CheckerX

Working on the same principle of scanning and matching against various sources, the key distinction of Plagiarism CheckerX is that you can download it and use it whenever you wish. It is slightly faster than the others and never stores your data, so you need not worry about data leaks.

Compilatio Magister

Compilatio Magister is a plagiarism checker designed explicitly for teaching professionals. It lets you access turnkey educational resources, check for plagiarism against thousands of documents, and seek reliable and accurate analysis reports.

Quick Wrap Up

In the world of academia, the spectre of plagiarism lurks, but fear not: armed with awareness and the right plagiarism checkers, you have the power to conquer this foe.

Even though plenty of students and researchers believe they can get away with it, it’s never the case. You owe it to yourself and everyone who has invested time and resources in you to publish original, plagiarism-free research work every time.

Throughout this eBook, we have explored the depths of plagiarism, unraveling its consequences and the importance of originality. Many universities have specific classes and workshops discussing plagiarism to create ample awareness of the subject. Thus, you should continue to be honourable in this regard and write papers from the heart.

Hey there! We encourage you to visit our SciSpace discover page to explore how our suite of products can make research workflows easier and allow you to spend more time advancing science.

With the best-in-class solution, you can manage everything from literature search and discovery to profile management, research writing, and much more.

Frequently Asked Questions (FAQs)

1. How to paraphrase without plagiarizing?

  • Understand the original text completely.
  • Write the idea in your own words without looking at the original text.
  • Change the structure of sentences, not just individual words.
  • Use synonyms wisely and ensure the context remains the same.
  • Lastly, always cite the original source.

Even when paraphrasing, it's important to attribute ideas to the original author.

2. How to avoid plagiarism in research?

  • Understand what constitutes plagiarism.
  • Always give proper credit to the original authors when quoting or paraphrasing their work.
  • Use plagiarism checker tools to ensure your work is original.
  • Keep track of your sources throughout your research.
  • Quote and paraphrase accurately.

3. Examples of plagiarism?

  • Copying and pasting text directly from a source without quotation or citation.
  • Paraphrasing someone else's work without correct citation.
  • Presenting someone else's work or ideas as your own.
  • Recycling or self-plagiarism, where you mention your previous work without citing it.

4. How much plagiarism is allowed in a research paper?

In the academic world, the goal is always to strive for 0% plagiarism. However, sometimes, minor plagiarism can occur unintentionally, such as when common phrases are matched in plagiarism software. Most institutions and publishers will allow a small percentage, typically under 10%, for such instances. Remember, this doesn't mean you can deliberately plagiarize 10% of your work.

5. What are the four types of plagiarism?

  • Direct Plagiarism definition: This occurs when one directly copies someone else's work word-for-word without giving credit.
  • Mosaic Plagiarism definition: This happens when someone borrows phrases from a source without using quotation marks, or finds synonyms for the author's language while keeping the same general structure and meaning.
  • Accidental Plagiarism definition: This happens when a person neglects to cite their sources, or misquotes their sources, or unintentionally paraphrases a source by using similar words, groupings, or phrases without attribution.
  • Self-Plagiarism definition: This happens when someone recycles their own work from a previous paper or study and presents it as new content without citing the original.

6. How much copying is considered plagiarism?

Any amount of copying can be considered plagiarism if you're presenting someone else's work as your own without attribution. Even a single sentence copied without proper citation can be seen as plagiarism. The key is to always give credit where it's due.

7. How to check plagiarism in a research paper?

There are numerous online tools and software you can use to check plagiarism in a research paper; popular ones include Grammarly and Copyscape. These tools compare your paper with millions of other documents on the web and in databases to identify any matches. You can also use the SciSpace paraphraser to rephrase content and keep it unique.


UCLA Library Research Guides: Citing Sources
Best Practices for Avoiding Plagiarism

The entire section below comes from a research guide from Iowa State University. To avoid plagiarism, one must provide a reference to indicate where the original information came from (see the "Source:" section below).

"There are many ways to avoid plagiarism, including developing good research habits, good time management, and taking responsibility for your own learning. Here are some specific tips:

  • Don't procrastinate with your research and assignments. Good research takes time. Procrastinating makes it likely you'll run out of time or be unduly pressured to finish. This sort of pressure can often lead to sloppy research habits and bad decisions. Plan your research well in advance, and seek help when needed from your professor, from librarians and other campus support staff.
  • Commit to doing your own work. If you don't understand an assignment, talk with your professor. Don't take the "easy way" out by asking your roommate or friends for copies of old assignments. A different aspect of this is group work. Group projects are very popular in some classes on campus, but not all. Make sure you clearly understand when your professor says it's okay to work with others on assignments and submit group work on assignments, versus when assignments and papers need to represent your own work.
  • Be 100% scrupulous in your note taking as you prepare your paper or research and as you begin drafting your paper. One good practice is to clearly label in your notes your own ideas (write "ME" in parentheses) and ideas and words from others (write "SMITH, 2005" or something to indicate author, source, source date). Keep good records of the sources you consult, and the ideas you take from them. If you're writing a paper, you'll need this information for your bibliographies or references cited list anyway, so you'll benefit from good organization from the beginning.
  • Cite your sources scrupulously. Always cite other people's work, words, ideas and phrases that you use directly or indirectly in your paper. Regardless of whether you found the information in a book, article, or website, and whether it's text, a graphic, an illustration, chart or table, you need to cite it. When you use words or phrases from other sources, these need to be in quotes. Current style manuals are available at most reference desks and online. They may also give further advice on avoiding plagiarism.
  • Understand good paraphrasing. Simply using synonyms or scrambling an author's words and phrases and then using these "rewrites" uncredited in your work is plagiarism, plain and simple. Good paraphrasing requires that you genuinely understand the original source, that you are genuinely using your own words to summarize a point or concept, and that you insert in quotes any unique words or phrases you use from the original source. Good paraphrasing also requires that you cite the original source. Anything less and you veer into the dangerous territory of plagiarism."

Source: Vega García, S.A. (2012). Understanding plagiarism: Information literacy guide. Iowa State University. Retrieved from  http://instr.iastate.libguides.com/content.php?pid=10314 . [Accessed January 3, 2017]

Plagiarism Prevention

  • Plagiarism Prevention (onlinecolleges.net) This resource provides information about preventing plagiarism, understanding the various types of plagiarism, and learning how to cite properly to avoid plagiarism.

UCLA has a campuswide license to Turnitin.com. Faculty may turn in student papers electronically, where the text can be compared with a vast database of other student papers, online articles, general Web pages, and other sources. Turnitin.com then produces a report for the instructor indicating whether the paper was plagiarized and if so, how much.

For more information, go to Turnitin.com.

  • Last Updated: May 17, 2024 2:33 PM
  • URL: https://guides.library.ucla.edu/citing

Academic Plagiarism Detection: A Systematic Literature Review

ACM Comput. Surv., Vol. 52, No. 6, Article 112, Publication date: October 2019. DOI: https://doi.org/10.1145/3345317

This article summarizes the research on computational methods to detect academic plagiarism by systematically reviewing 239 research papers published between 2013 and 2018. To structure the presentation of the research contributions, we propose novel technically oriented typologies for plagiarism prevention and detection efforts, the forms of academic plagiarism, and computational plagiarism detection methods. We show that academic plagiarism detection is a highly active research field. Over the period we review, the field has seen major advances regarding the automated detection of strongly obfuscated and thus hard-to-identify forms of academic plagiarism. These improvements mainly originate from better semantic text analysis methods, the investigation of non-textual content features, and the application of machine learning. We identify a research gap in the lack of methodologically thorough performance evaluations of plagiarism detection systems. Concluding from our analysis, we see the integration of heterogeneous analysis methods for textual and non-textual content features using machine learning as the most promising area for future research contributions to improve the detection of academic plagiarism further.
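The text-similarity core that such detection methods build on can be illustrated with a toy bag-of-words cosine similarity; this is a simplification for intuition only, whereas the systems surveyed here employ far richer semantic, non-textual, and machine-learning features:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)  # only shared terms contribute
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

suspect = "plagiarized research papers impede the scientific process"
original = "plagiarized papers impede the process of science"
print(round(cosine_similarity(suspect, original), 2))  # substantial overlap: 0.71
```

Obfuscated plagiarism defeats such surface measures, which is why, as the review describes, the field has moved toward semantic analysis, non-textual content features, and learned representations.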

ACM Reference format: Tomáš Foltýnek, Norman Meuschke, and Bela Gipp. 2019. Academic Plagiarism Detection: A Systematic Literature Review. ACM Comput. Surv. 52, 6, Article 112 (October 2019), 42 pages. https://doi.org/10.1145/3345317

INTRODUCTION

Academic plagiarism is one of the severest forms of research misconduct (a “cardinal sin”) [ 14 ] and has strong negative impacts on academia and the public. Plagiarized research papers impede the scientific process, e.g., by distorting the mechanisms for tracing and correcting results. If researchers expand or revise earlier findings in subsequent research, then papers that plagiarized the original paper remain unaffected. Wrong findings can spread and affect later research or practical applications [ 90 ]. For example, in medicine or pharmacology, meta-studies are an important tool to assess the efficacy and safety of medical drugs and treatments. Plagiarized research papers can skew meta-studies and thus jeopardize patient safety [ 65 ].

Furthermore, academic plagiarism wastes resources. For example, Wager [ 261 ] quotes a journal editor stating that 10% of the papers submitted to the respective journal suffered from plagiarism of an unacceptable extent. In Germany, the ongoing crowdsourcing project VroniPlag 1 has investigated more than 200 cases of alleged academic plagiarism (as of July 2019). Even in the best case, i.e., if the plagiarism is discovered, reviewing and punishing plagiarized research papers and grant applications still causes a high effort for the reviewers, affected institutions, and funding agencies. The cases reported in VroniPlag showed that investigations into plagiarism allegations often require hundreds of work hours from affected institutions.

If plagiarism remains undiscovered, then the negative effects are even more severe. Plagiarists can unduly receive research funds and career advancements as funding agencies may award grants for plagiarized ideas or accept plagiarized research papers as the outcomes of research projects. The artificial inflation of publication and citation counts through plagiarism can further aggravate the problem. Studies showed that some plagiarized papers are cited at least as often as the original [ 23 ]. This phenomenon is problematic, since citation counts are widely used indicators of research performance, e.g., for funding or hiring decisions.

From an educational perspective, academic plagiarism is detrimental to competence acquisition and assessment. Practicing is crucial to human learning. If students receive credit for work done by others, then an important extrinsic motivation for acquiring knowledge and competences is reduced. Likewise, the assessment of competence is distorted, which again can result in undue career benefits for plagiarists.

The problem of academic plagiarism is not new but has been present for centuries. However, the rapid and continuous advancement of information technology (IT), which offers convenient and instant access to vast amounts of information, has made plagiarizing easier than ever. At the same time, IT also facilitated the detection of academic plagiarism. As we present in this article, hundreds of researchers address the automated detection of academic plagiarism and publish hundreds of research papers a year.

The high intensity and rapid pace of research on academic plagiarism detection make it difficult for researchers to get an overview of the field. Published literature reviews alleviate the problem by summarizing previous research, critically examining contributions, explaining results, and clarifying alternative views [ 212 , 40 ]. Literature reviews are particularly helpful for young researchers and researchers who newly enter a field. Often, these two groups of researchers contribute new ideas that keep a field alive and advance the state of the art.

In 2013, we provided a first descriptive review of the state of the art in academic plagiarism detection [ 160 ]. Given the rapid development of the field, we see the need for a follow-up study to summarize the research since 2013. Therefore, this article provides a systematic qualitative literature review [ 187 ] that critically evaluates the capabilities of computational methods to detect plagiarism in academic documents and identifies current research trends and research gaps.

The literature review at hand answers the following research questions:

  • Did researchers propose conceptually new approaches for detecting plagiarism in academic documents?
  • Which improvements to existing detection methods have been reported?
  • Which research gaps and trends for future research are observable in the literature?

To answer these questions, we organize the remainder of this article as follows. The section Methodology describes our procedure and criteria for data collection. The following section, Related Literature Reviews, summarizes the contributions of our review compared to topically related reviews published since 2013. The section Overview of the Research Field describes the major research areas in the field of academic plagiarism detection. The section Definition and Typology of Plagiarism introduces our definition and a three-layered model for addressing plagiarism (methods, systems, and policies). The section Review of Plagiarism Typologies synthesizes the classifications of plagiarism found in the literature into a technically oriented typology suitable for our review. The section Plagiarism Detection Methods is the core of this article. For each class of computational plagiarism detection methods, the section provides a description and an overview of research papers that employ the method in question. The section Plagiarism Detection Systems discusses the application of detection methods in plagiarism detection systems. The Discussion section summarizes the advances in plagiarism detection research and outlines open research questions.

METHODOLOGY

To collect the research papers included in our review, we performed a keyword-based automated search [ 212 ] using Google Scholar and Web of Science. We limited the search period to 2013 until 2018 (including). However, papers that introduced a novel concept or approach often predate 2013. To ensure that our survey covers all relevant primary literature, we included such seminal papers regardless of their publication date.

Google Scholar indexes major computer science literature databases, including IEEE Xplore, ACM Digital Library, ScienceDirect, SpringerLink, and TandFonline, as well as grey literature. Fagan [ 68 ] provides an extensive list of “ recent studies [that] repeatedly find that Google Scholar's coverage meets or exceeds that of other search tools, no matter what is identified by target samples, including journals, articles, and citations ” [ 68 ]. Therefore, we consider Google Scholar as a meta-database that meets the search criteria recommended in the guidelines for conducting systematic literature reviews [ 40 , 137 ]. Using Google Scholar also addresses the “lack of conformity, especially in terms of searching facilities, across commonly used digital libraries,” which Brereton et al. [ 40 ] identified as a hindrance to systematic literature reviews in computer science.

Criticism of using Google Scholar for literature research includes that the system's relevance ranking assigns too much importance to citation count [ 68 ], i.e., the number of citations a paper receives. Moreover, Google Scholar covers predatory journals [ 31 ]. Most guidelines for systematic reviews, therefore, recommend using additional search tools despite the comprehensive coverage of Google Scholar [ 68 ]. Following this recommendation, we additionally queried Web of Science. Since we seek to cover the most influential papers on academic plagiarism detection, we consider a relevance ranking based on citation counts as an advantage rather than a disadvantage. Hence, we used the relevance ranking of Google Scholar and ranked search results from Web of Science by citation count. We excluded all papers (11) that appeared in venues mentioned in Beall's List of Predatory Journals and Publishers . 2

Our procedure for paper collection consisted of the five phases described hereafter. We reviewed the first 50 search results when using Google Scholar and the first 150 search results when using Web of Science.

In the first phase , we sought to include existing literature reviews on plagiarism detection for academic documents. Therefore, we queried Google Scholar using the following keywords: plagiarism detection literature review, similarity detection literature review, plagiarism detection state of art, similarity detection state of art, plagiarism detection survey, similarity detection survey .

In the second phase , we added topically related papers using the following rather general keywords: plagiarism, plagiarism detection, similarity detection, extrinsic plagiarism detection, external plagiarism detection, intrinsic plagiarism detection, internal plagiarism detection .

After reviewing the papers retrieved in the first and second phases, we defined the structure of our review and adjusted the scope of our data collection as follows:

  • We focused our search on plagiarism detection for text documents and hence excluded papers addressing other tasks, such as plagiarism detection for source code or images. We also excluded papers focusing on corpora development.
  • We excluded papers addressing policy and educational issues related to plagiarism detection to sharpen the focus of our review on computational detection methods.

Having made these adjustments to our search strategy, we started the third phase of the data collection. We queried Google Scholar with the following keywords related to specific sub-topics of plagiarism detection, which we had identified as important during the first and second phases: semantic analysis plagiarism detection, machine-learning plagiarism detection .

In the fourth phase , we sought to prevent selection bias from exclusively using Google Scholar by querying Web of Science using the keyword plagiarism detection .

In the fifth phase , we added to our dataset papers from the search period that are topically related to papers we had already collected. To do so, we included relevant references of collected papers and papers that publishers’ systems recommended as related to papers in our collection. Following this procedure, we included notebook papers of the annual PAN and SemEval workshops. To ensure the significance of research contributions, we excluded papers that were not referenced in the official overview papers of the PAN and SemEval workshops or reported results below the baseline provided by the workshop organizers. For the same reason, we excluded papers that do not report experimental evaluation results.

To ensure the consistency of paper processing, the first author read all papers in the final dataset and recorded the paper's key content in a mind map. All authors continuously reviewed, discussed, and updated the mind map. Additionally, we maintained a spreadsheet to record the key features of each paper (task, methods, improvements, dataset, results, etc.).

Table 1 and Table 2 list the numbers of papers retrieved and processed in each phase of the data collection.

  • Phase 1, Google Scholar (reviews): 66, 28, 38, 38
  • Phase 2, Google Scholar (related papers): 143, 54, 89, 23, 104
  • Phase 3, Google Scholar (sub-topics): 49, 42, 111
  • Phase 4, Web of Science: 134, 82, 52, 35, 128
  • Phase 5, processing stage: 126, 126, 254

  • Papers identified by keyword-based automated search: 128
  • Papers collected through references and automated recommendations: 126
  • Inaccessible papers: 3
  • Excluded papers: 12
  • Reviews and general papers: 35
  • Papers containing experiments (included in overview tables): 204
      – Extrinsic PD: 136
      – Intrinsic PD: 67
      – Both extrinsic and intrinsic PD: 1

Methodological Risks

The main risks for systematic literature reviews are incompleteness of the collected data and deficiencies in the selection, structure, and presentation of the content.

We addressed the risk of data incompleteness mainly by using two of the most comprehensive databases for academic literature—Google Scholar and Web of Science. To achieve the best possible coverage, we queried the two databases with keywords that we gradually refined in a multi-stage process, in which the results of each phase informed the next phase. By including all relevant references of papers that our keyword-based search had retrieved, we leveraged the knowledge of domain experts, i.e., the authors of research papers and literature reviews on the topic, to retrieve additional papers. We also included the content-based recommendations provided by the digital library systems of major publishers, such as Elsevier and ACM. We are confident that this multi-faceted and multi-stage approach to data collection yielded a set of papers that comprehensively reflects the state of the art in detecting academic plagiarism.

To mitigate the risk of subjectivity regarding the selection and presentation of content, we adhered to best practice guidelines for conducting systematic reviews and investigated the taxonomies and structure put forward in related reviews. We present the insights of the latter investigation in the following section.

RELATED LITERATURE REVIEWS

Table 3 lists related literature reviews in chronological order and categorized according to (i) the plagiarism detection (PD) tasks the review covers (PD for text documents, PD for source code, other PD tasks), (ii) whether the review includes descriptions or evaluations of productive plagiarism detection systems, and (iii) whether the review addresses policy issues related to plagiarism and academic integrity. All reviews are “narrative” according to the typology of Paré et al. [ 187 ]. Two of the reviews (References [ 61 ] and [ 48 ]) cover articles that appeared at venues included in Beall's List of Predatory Journals and Publishers .

Review                           Text  Code  Other  Systems  Policies
Meuschke and Gipp [ 160 ]        YES   NO    NO     YES      NO
Chong [ 47 ]                     YES   NO    NO     NO       NO
Eisa et al. [ 61 ]               YES   NO    YES    NO       NO
Agarwal and Sharma [ 8 ]         YES   YES   NO     YES      NO
Chowdhury et al. [ 48 ]          YES   YES   NO     YES      NO
Kanjirangat and Gupta [ 251 ]    YES   YES   NO     YES      NO
Velasquez et al. [ 256 ]         YES   NO    NO     YES      YES
Hourrane and Benlahmar [ 114 ]   YES   NO    NO     NO       NO

Our previous review article [ 160 ] surveyed the state of the art in detecting academic plagiarism, presented plagiarism detection systems, and summarized evaluations of their detection effectiveness. We outlined the limitations of text-based plagiarism detection methods and suggested that future research should focus on semantic analysis approaches that also include non-textual document features, such as academic citations.

The main contribution of Chong [ 47 ] is an extensive experimental evaluation of text preprocessing methods as well as shallow and deep NLP techniques. However, the paper also provides a sizable state-of-the-art review of plagiarism detection methods for text documents.

Eisa et al. [ 61 ] defined a clear methodology and meticulously followed it but did not include a temporal dimension. Their well-written review provides comprehensive descriptions and a useful taxonomy of features and methods for plagiarism detection. The authors concluded that future research should consider non-textual document features, such as equations, figures, and tables.

Agarwal and Sharma [ 8 ] focused on source code PD but also gave a basic overview of plagiarism detection methods for text documents. Technologically, source code PD and PD for text are closely related, and many plagiarism detection methods for text can also be applied for source code PD [ 57 ].

Chowdhury et al. [ 48 ] provided a comprehensive list of available plagiarism detection systems.

Kanjirangat and Gupta [ 251 ] summarized plagiarism detection methods for text documents that participated in the PAN competitions and compared four plagiarism detection systems.

Velasquez et al. [ 256 ] proposed a new plagiarism detection system but also provided an extensive literature review that includes a typology of plagiarism and an overview of six plagiarism detection systems.

Hourrane and Benlahmar [ 114 ] described individual research papers in detail but did not provide an abstraction of the presented detection methods.

The literature review at hand extends and improves the reviews outlined in Table 3 as follows:

  • We include significantly more papers than other reviews.
  • Our literature survey is the first that analyses research contributions during a specific period to provide insights on the most recent research trends.
  • Our review is the first that adheres to the guidelines for conducting systematic literature surveys.
  • We introduce a three-layered conceptual model to describe and analyze the phenomenon of academic plagiarism comprehensively.

OVERVIEW OF THE RESEARCH FIELD

The papers we retrieved during our research fall into three broad categories: plagiarism detection methods, plagiarism detection systems , and plagiarism policies . Ordering these categories by the level of abstraction at which they address the problem of academic plagiarism yields the three-layered model shown in Figure 1 . We propose this model to structure and systematically analyze the large and heterogeneous body of literature on academic plagiarism.

Fig. 1.

Layer 1: Plagiarism detection methods subsumes research that addresses the automated identification of potential plagiarism instances. Papers falling into this layer typically present methods that analyze textual similarity at the lexical, syntactic, and semantic levels, as well as similarity of non-textual content elements, such as citations, figures, tables, and mathematical formulae. To this layer, we also assign papers that address the evaluation of plagiarism detection methods, e.g., by providing test collections and reporting on performance comparisons. The research contributions in Layer 1 are the focus of this survey.

Layer 2: Plagiarism detection systems encompasses applied research papers that address production-ready plagiarism detection systems, as opposed to the research prototypes that are typically presented in papers assigned to Layer 1. Production-ready systems implement the detection methods included in Layer 1, visually present detection results to the users and should be able to identify duly quoted text. Turnitin LLC is the market leader for plagiarism detection services. The company's plagiarism detection system Turnitin is most frequently cited in papers included in Layer 2 [ 116 , 191 , 256 ].

Layer 3: Plagiarism policies subsumes papers that research the prevention, detection, prosecution, and punishment of plagiarism at educational institutions. Typical papers in Layer 3 investigate students’ and teachers’ attitudes toward plagiarism (e.g., Reference [ 75 ]), analyze the prevalence of plagiarism at institutions (e.g., Reference [ 50 ]), or discuss the impact of institutional policies (e.g., Reference [ 183 ]).

The three layers of the model are interdependent and essential to analyze the phenomenon of academic plagiarism comprehensively. Plagiarism detection systems (Layer 2) depend on reliable detection methods (Layer 1), which in turn would be of little practical value without production-ready systems that employ them. Using plagiarism detection systems in practice would be futile without the presence of a policy framework (Layer 3) that governs the investigation, documentation, prosecution, and punishment of plagiarism. The insights derived from analyzing the use of plagiarism detection systems in practice (Layer 3) also inform the research and development efforts for improving plagiarism detection methods (Layer 1) and plagiarism detection systems (Layer 2).

Continued research in all three layers is necessary to keep pace with the behavior changes that are a typical reaction of plagiarists when being confronted with an increased risk of discovery due to better detection technology and stricter policies. For example, improved plagiarism detection capabilities led to a rise in contract cheating, i.e., paying ghostwriters to produce original works that the cheaters submit as their own [ 177 ]. Many researchers agree that counteracting these developments requires approaches that integrate plagiarism detection technology with plagiarism policies.

Originally, we intended to survey the research in all three layers. However, the research field is too extensive to cover all three layers comprehensively in one survey. Therefore, the current article surveys plagiarism detection methods and systems. A future survey will cover the research on plagiarism policies.

DEFINITION AND TYPOLOGY OF PLAGIARISM

In accordance with Fishman, we define academic plagiarism as the use of ideas, content, or structures without appropriately acknowledging the source to benefit in a setting where originality is expected [ 279 ]. We used a nearly identical definition in our previous survey [ 160 ], because it describes the full breadth of the phenomenon. The definition includes all forms of intellectual contributions in academic documents regardless of their presentation, e.g., text, figures, tables, and mathematical formulae, and their origin. Other definitions of academic plagiarism often include the notion of theft (e.g., References [ 13 , 38 , 116 , 146 , 188 , 274 , 252 ]), i.e., require intent and limit the scope to reusing the content of others. Our definition also includes self-plagiarism, unintentional plagiarism, and plagiarism with the consent of the original author.

Review of Plagiarism Typologies

Aside from a definition, a typology helps to structure the research and facilitates communication on a phenomenon [ 29 , 261 ]. Researchers proposed a variety of typologies for academic plagiarism. Walker [ 263 ] coined a typology from a plagiarist's point of view, which is still recognized by contemporary literature [ 51 ]. Walker's typology distinguishes between:

  • Sham paraphrasing (presenting copied text as a paraphrase by leaving out quotations)
  • Illicit paraphrasing
  • Other plagiarism (plagiarizing with the original author's consent)
  • Verbatim copying (without reference)
  • Recycling (self-plagiarism)
  • Ghostwriting
  • Purloining (copying another student's assignment without consent)

All typologies we encountered in our research categorize verbatim copying as one form of academic plagiarism. Alfikri and Ayu Purwarianti [ 13 ] additionally distinguished, as separate forms of academic plagiarism, the partial copying of smaller text segments, two forms of paraphrasing (which differ in whether the sentence structure changes), and translations. Velasquez et al. [ 256 ] distinguished verbatim copying and technical disguise, combined paraphrasing and translation into one form, and categorized the deliberate misuse of references as a separate form. Weber-Wulff [ 265 ] and Chowdhury and Bhattacharyya [ 48 ] likewise categorized referencing errors as a form of plagiarism. Many authors agreed on classifying idea plagiarism as a separate form of plagiarism [ 47 , 48 , 114 , 179 , 252 ]. Mozgovoy et al. [ 173 ] presented a typology that consolidates other classifications into five forms of academic plagiarism:

  • Verbatim copying
  • Hiding plagiarism instances by paraphrasing
  • Technical tricks exploiting weaknesses of current plagiarism detection systems
  • Deliberately inaccurate use of references
  • Tough plagiarism

“Tough plagiarism” subsumes the forms of plagiarism that are difficult to detect for both humans and computers, like idea plagiarism, structural plagiarism, and cross-language plagiarism [ 173 ].

The typology of Eisa et al. [ 61 ], which originated from a typology by Alzahrani et al. [ 21 ], distinguishes only two forms of plagiarism: literal plagiarism and intelligent plagiarism . Literal plagiarism encompasses near copies and modified copies, whereas intelligent plagiarism includes paraphrasing, summarization, translation, and idea plagiarism.

Our Typology of Plagiarism

Since we focus on reviewing plagiarism detection technology, we exclusively consider technical properties to derive a typology of academic plagiarism forms. From a technical perspective, several distinctions that are important from a policy perspective are irrelevant or at least less important. Technically less relevant properties of plagiarism instances are:

  • whether the original author permitted the reuse of the content;
  • whether the suspicious document and its potential source have the same author(s), i.e., whether similarities in the documents’ content may constitute self-plagiarism;
  • how much of the content represents potential plagiarism;
  • whether a plagiarist uses one or multiple sources.

The latter two properties are of little technical importance, since similar methods are employed regardless of the extent of plagiarism and whether it may originate from one or multiple source documents. One exception is compilation plagiarism (also referred to as shake-and-paste, patch-writing, remix, mosaic, or mash-up): detecting it is impossible at the document level and requires an analysis on the level of paragraphs or sentences.

Our typology of academic plagiarism derives from the generally accepted layers of natural language: lexis, syntax, and semantics. Ultimately, the goal of language is expressing ideas [ 96 ]. Therefore, we extend the classic three-layered language model to four layers and categorize plagiarism forms according to the language layer they affect. We order the resulting plagiarism forms increasingly by their level of obfuscation:

  • Characters-preserving plagiarism
      – Literal plagiarism (copy and paste), possibly with mentioning the source
  • Syntax-preserving plagiarism
      – Technical disguise
      – Synonym substitution
  • Semantics-preserving plagiarism
      – Translation
      – Paraphrase (mosaic, clause quilts)
  • Idea-preserving plagiarism
      – Structural plagiarism
      – Using concepts and ideas only

Characters-preserving plagiarism includes, aside from verbatim copying, plagiarism forms in which sources are mentioned, like “pawn sacrifice” and “cut and slide” [ 265 ]. Syntax-preserving plagiarism often results from employing simple substitution techniques, e.g., using regular expressions. Basic synonym substitution approaches operate in the same way; however, employing more sophisticated substitution methods has become typical. Semantics-preserving plagiarism refers to sophisticated forms of obfuscation that involve changing both the words and the sentence structure but preserve the meaning of passages. In agreement with Velasquez et al. [ 256 ], we consider translation plagiarism as a semantics-preserving form of plagiarism, since a translation can be seen as the ultimate paraphrase. In the section devoted to semantics-based plagiarism detection methods, we will also show a significant overlap in the methods for paraphrase detection and cross-language plagiarism detection. Idea-preserving plagiarism (also referred to as template plagiarism or boilerplate plagiarism) includes cases in which plagiarists use the concept or structure of a source and describe it entirely in their own words. This form of plagiarism is difficult to identify and even harder to prove. Ghostwriting [ 47 , 114 ] describes the hiring of a third party to write genuine text [ 50 , 263 ]. It is the only form of plagiarism that is undetectable by comparing a suspicious document to a likely source. Currently, the only technical option for discovering potential ghostwriting is to compare stylometric features of a possibly ghost-written document with documents certainly written by the alleged author.

PLAGIARISM DETECTION APPROACHES

Conceptually, the task of detecting plagiarism in academic documents consists of locating the parts of a document that exhibit indicators of potential plagiarism and subsequently substantiating the suspicion through more in-depth analysis steps [ 218 ]. From a technical perspective, the literature distinguishes the following two general approaches to plagiarism detection.

The extrinsic plagiarism detection approach compares suspicious documents to a collection of documents assumed to be genuine (reference collection) and retrieves all documents that exhibit similarities above a threshold as potential sources [ 252 , 235 ].
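As a toy illustration of this approach (not taken from any of the surveyed systems), the following sketch compares a suspicious document to a small hypothetical reference collection using word-set Jaccard similarity and retrieves every document whose similarity exceeds a threshold; the document IDs and the threshold value are our own assumptions:

```python
def jaccard(a_tokens, b_tokens):
    """Jaccard similarity of two token sets."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_sources(suspicious, collection, threshold=0.5):
    """Return IDs of reference documents whose similarity exceeds the threshold."""
    susp = suspicious.lower().split()
    return [doc_id for doc_id, text in collection.items()
            if jaccard(susp, text.lower().split()) >= threshold]

# Hypothetical reference collection.
collection = {
    "d1": "alpha beta gamma delta",
    "d2": "completely unrelated words here",
}
print(retrieve_sources("alpha beta gamma epsilon", collection))  # ['d1']
```

Production systems replace the naive set comparison with scalable retrieval techniques, as the following subsections describe.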

The intrinsic plagiarism detection approach exclusively analyzes the input document, i.e., does not perform comparisons to documents in a reference collection. Intrinsic detection methods employ a process known as stylometry to examine linguistic features of a text [ 90 ]. The goal is to identify changes in writing style, which the approach considers as indicators for potential plagiarism [ 277 ]. Passages with linguistic differences can become the input for an extrinsic plagiarism analysis or be presented to human reviewers. Hereafter, we describe the extrinsic and intrinsic approaches to plagiarism detection in more detail.

Extrinsic Plagiarism Detection

The reference collection to which extrinsic plagiarism detection approaches compare the suspicious document is typically very large, e.g., a significant subset of the Internet for production-ready plagiarism detection systems. Therefore, pairwise comparisons of the input document to all documents in the reference collection are often computationally infeasible. To address this challenge, most extrinsic plagiarism detection approaches consist of two stages: candidate retrieval (also called source retrieval) and detailed analysis (also referred to as text alignment) [ 197 ]. The candidate retrieval stage efficiently limits the collection to a subset of potential source documents. The detailed analysis stage then performs elaborate pairwise document comparisons to identify parts of the source documents that are similar to parts of the suspicious document.

Candidate Retrieval.  Given a suspicious input document and a querying tool, e.g., a search engine or database interface, the task in the candidate retrieval stage is to retrieve from the reference collection all documents that share content with the input document [ 198 ]. Many plagiarism detection systems use the APIs of Web search engines instead of maintaining their own reference collections and querying tools.

Recall is the most important performance metric for the candidate retrieval stage of the extrinsic plagiarism detection process, since the subsequent detailed analysis cannot identify source documents missed in the first stage [ 105 ]. The number of queries issued is another typical metric to quantify the performance in the candidate retrieval stage. Keeping the number of queries low is particularly important if the candidate retrieval approach involves Web search engines, since such engines typically charge for issuing queries.
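For illustration, candidate retrieval recall can be computed as the fraction of the true source documents that appear among the retrieved candidates; the document IDs in this sketch are hypothetical:

```python
def candidate_retrieval_recall(retrieved_ids, true_source_ids):
    """Fraction of true source documents found among the retrieved candidates."""
    if not true_source_ids:
        return 1.0  # nothing to find, so nothing was missed
    found = len(set(retrieved_ids) & set(true_source_ids))
    return found / len(true_source_ids)

# Hypothetical example: 2 of the 3 true sources were retrieved.
recall = candidate_retrieval_recall(
    retrieved_ids=["doc-12", "doc-87", "doc-03"],
    true_source_ids=["doc-12", "doc-87", "doc-55"],
)
print(round(recall, 3))  # 0.667
```

A missed source ("doc-55" above) can never be recovered in the detailed analysis stage, which is why recall dominates precision here.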

Detailed Analysis.  The set of documents retrieved in the candidate retrieval stage is the input to the detailed analysis stage. Formally, the task in the detailed analysis stage is defined as follows. Let $d_q$ be a suspicious document. Let $D = \{ d_s \mid s = 1 \ldots n \}$ be a set of potential source documents. Determine whether a fragment $s_q \in d_q$ is similar to a fragment $s \in d_s$ ($d_s \in D$) and identify all such pairs of fragments $(s_q, s)$ [ 202 ]. Eventually, an expert should determine whether the identified pairs $(s_q, s)$ constitute legitimate content re-use, plagiarism, or false positives [ 29 ]. The detailed analysis typically consists of three steps [ 197 ]:

  • Seeding: Finding parts of the content in the input document (the seed) within a document of the reference collection
  • Extension: Extending each seed as far as possible to find the complete passage that may have been reused
  • Filtering: Excluding fragments that do not meet predefined criteria (e.g., that are too short), and handling of overlapping passages

The most common strategy for the extension step is the so-called rule-based approach. The approach merges seeds if they occur next to each other in both the suspicious and the source document and if the size of the gap between the passages is below a threshold [ 198 ].
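The three steps can be sketched as follows, assuming word 4-grams as seeds and the rule-based merging strategy described above; the parameter names and threshold values are our own illustrative choices, not those of any surveyed system:

```python
def find_seeds(susp_tokens, src_tokens, n=4):
    """Seeding: locate word n-grams that occur in both documents."""
    src_index = {}
    for i in range(len(src_tokens) - n + 1):
        src_index.setdefault(tuple(src_tokens[i:i + n]), []).append(i)
    return [(j, i)
            for j in range(len(susp_tokens) - n + 1)
            for i in src_index.get(tuple(susp_tokens[j:j + n]), [])]

def merge_seeds(seeds, n=4, max_gap=8):
    """Extension: merge seeds that lie close together in both documents."""
    passages = []  # entries: [susp_start, susp_end, src_start, src_end]
    for j, i in sorted(seeds):
        if passages:
            sj, ej, si, ei = passages[-1]
            if j - ej <= max_gap and -n <= i - ei <= max_gap:
                passages[-1] = [sj, max(ej, j + n), si, max(ei, i + n)]
                continue
        passages.append([j, j + n, i, i + n])
    return passages

def filter_passages(passages, min_len=8):
    """Filtering: drop merged passages that are too short."""
    return [p for p in passages if p[1] - p[0] >= min_len]

susp = "the quick brown fox jumps over the lazy dog near the river bank".split()
src = "a tale about the quick brown fox jumps over the lazy dog and more".split()
matches = filter_passages(merge_seeds(find_seeds(susp, src)))
print(matches)  # [[0, 9, 3, 12]]: susp tokens 0-9 align with src tokens 3-12
```

The overlapping 4-gram seeds are merged into a single nine-token passage; a human reviewer would then judge whether the match constitutes plagiarism.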

Paraphrase Identification is often a separate step within the detailed analysis stage of extrinsic plagiarism detection methods but also a research field in its own right. The task in paraphrase identification is determining semantically equivalent sentences in a set of sentences [ 71 ]. SemEval is a well-known conference series that addresses paraphrase identification for tweets [ 9 , 222 ]. Identifying semantically equivalent tweets is more difficult than identifying semantically equivalent sentences in academic documents due to the out-of-vocabulary words, abbreviations, and slang terms that are frequent in tweets [ 24 ]. Al-Samadi et al. [ 9 ] provided a thorough review of the research on paraphrase identification.

Intrinsic Plagiarism Detection

The concept of intrinsic plagiarism detection was introduced by Meyer zu Eissen and Stein [ 277 ]. Whereas extrinsic plagiarism detection methods search for similarities across documents, intrinsic plagiarism detection methods search for dissimilarities within a document. A crucial presumption of the intrinsic approach is that authors have different writing styles that allow identifying the authors. Juola provides a comprehensive overview of stylometric methods to analyze and quantify writing style [ 127 ].

Intrinsic plagiarism detection consists of two tasks [ 200 , 233 ]:

  • Style breach detection : Delineating passages with different writing styles
  • Author identification : Identifying the author of documents or passages

Author identification furthermore subsumes two specialized tasks:

  • Author clustering : Grouping documents or passages by authorship
  • Author verification : Deciding whether an input document was authored by the same person as a set of sample documents

Style Breach Detection.  Given a suspicious document, the goal of style-breach detection is identifying passages that exhibit different stylometric characteristics [ 233 ].

Most of the algorithms for style breach detection follow a three-step process [ 214 ]:

  • Text segmentation based on paragraphs, (overlapping) sentences, character or word n-grams
  • Feature space mapping , i.e., computing stylometric measures for segments
  • Clustering segments according to observed critical values
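A toy version of this three-step process might segment by paragraph, use mean word length as the sole stylometric feature, and flag segments whose feature value is a statistical outlier. All choices here (the single feature, the z-score threshold) are illustrative simplifications:

```python
import statistics

def style_breach_candidates(paragraphs, z_threshold=1.5):
    """Three-step sketch: segment (paragraphs given), map each segment
    to a stylometric feature (mean word length), and flag segments
    whose feature deviates strongly from the document mean."""
    feats = [statistics.mean(len(w) for w in p.split()) for p in paragraphs]
    mu = statistics.mean(feats)
    sd = statistics.pstdev(feats) or 1.0  # avoid division by zero
    return [i for i, f in enumerate(feats) if abs(f - mu) / sd > z_threshold]
```

Real methods use many features (character n-grams, PoS frequencies, function words) and proper clustering instead of a single-feature outlier test.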

Author Clustering typically follows the style breach detection stage and employs pairwise comparisons of passages identified in the previous stage to group them by author [ 247 ]. For each pair of passages, a similarity measure is computed that considers the results of the feature space mapping in the style-breach detection stage. Formally, for a given set of documents or passages $D$ , the task is to find the decomposition of this set into ${D_1},\;{D_2}, \ldots ,{D_n}$ , such that:

  • $D = \bigcup_{i = 1}^n {D_i}$
  • ${D_i} \cap {D_j} = \emptyset$ for each $i \ne j$
  • All documents in the same class have the same author
  • For each pair of documents from different classes, the authors are different

Author Verification is typically defined as the prediction of whether two pieces of text were written by the same person. In practice, author verification is a one-class classification problem [ 234 ] that assumes all documents in a set have the same author. By comparing the writing style at the document level, outliers can be detected that may represent plagiarized documents. This method can reveal ghostwriting [ 127 ], unless the same ghostwriter authored all documents in the set.

Author Identification (also referred to as author classification) takes multiple document sets as input. Each set of documents must have been written verifiably by a single author. The task is assigning documents with unclear authorship to the stylistically most similar document set. Each authorship identification problem for which the set of candidate authors is known is easily transformable into multiple authorship verification problems [ 128 ]. An open-set variant of the author identification problem allows for a suspicious document whose author is not included in any of the input sets [ 234 ].

Several other stylometry-based tasks, e.g., author profiling, exist. However, we limit the descriptions in the next section to methods whose main application is plagiarism detection. We recommend readers interested in related tasks to refer to the overview paper of PAN’17 [ 200 ].

PLAGIARISM DETECTION METHODS

We categorize plagiarism detection methods and structure their description according to our typology of plagiarism. Lexical detection methods exclusively consider the characters in a document. Syntax-based detection methods consider the sentence structure, i.e., the parts of speech and their relationships. Semantics-based detection methods compare the meaning of sentences, paragraphs, or documents. Idea-based detection methods go beyond the analysis of text in a document by considering non-textual content elements like citations, images, and mathematical content. Before presenting details on each class of detection methods, we describe preprocessing strategies that are relevant for all classes of detection methods.

Preprocessing

The initial preprocessing steps applied as part of plagiarism detection methods typically include document format conversions and information extraction. Before 2013, researchers described the extraction of text from binary document formats like PDF and DOC as well as from structured document formats like HTML and DOCX in more detail than in recent years (e.g., Reference [ 49 ]). Most research papers on text-based plagiarism detection methods we review in this article do not describe any format conversion or text extraction procedures. We attribute this development to the technical maturity of text extraction approaches. For plagiarism detection approaches that analyze non-textual content elements, e.g., academic citations and references [ 90 , 91 , 161 , 191 ], images [ 162 ], and mathematical content [ 163 , 165 ], document format conversion and information extraction still present significant challenges.

Specific preprocessing operations heavily depend on the chosen approach. The aim is to remove noise while keeping the information required for the analysis. For text-based detection methods, typical preprocessing steps include lowercasing, punctuation removal, tokenization, segmentation, number removal or number replacement, named entity recognition, stop words removal, stemming or lemmatization, Part of Speech (PoS) tagging, and synset extension. Approaches employing synset extension typically employ thesauri like WordNet [ 69 ] to assign the identifier of the class of synonymous words to which a word in the text belongs. The synonymous words can then be considered for similarity calculation. Detection methods operating on the lexical level usually perform chunking as a preprocessing step. Chunking groups text elements into sets of given lengths, e.g., word n-grams, line chunks, or phrasal constituents in a sentence [ 47 ].

Some detection approaches, especially in intrinsic plagiarism detection, limit preprocessing to a minimum so as not to lose potentially useful information [ 9 , 67 ]. For example, intrinsic detection methods typically do not remove punctuation.

All preprocessing steps we described represent standard procedures in Natural Language Processing (NLP); hence, well-established, publicly available software libraries support these steps. The research papers we reviewed predominantly used the multilingual and multifunctional text processing pipelines of the Natural Language Toolkit (NLTK, Python) or the Stanford CoreNLP library (Java). Commonly applied syntax analysis tools include the Penn Treebank, Citar, TreeTagger, and the Stanford parser. Several papers present resources for Arabic [ 33 , 34 , 227 ] and Urdu [ 54 ] language processing.
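For illustration, a minimal preprocessing pipeline using only the Python standard library might look as follows. The stop word list is a placeholder; real pipelines rely on NLTK or CoreNLP and add stemming, lemmatization, and PoS tagging:

```python
import re

# Illustrative stop word subset; NLTK ships full per-language lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def preprocess(text):
    """Minimal sketch: lowercase, strip punctuation, tokenize, replace
    numbers with a placeholder, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = ["<num>" if t.isdigit() else t for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]
```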

Lexical Detection Methods

Lexical detection methods exclusively consider the characters in a text for similarity computation. The methods are best suited for identifying copy-and-paste plagiarism that exhibits little to no obfuscation. To detect obfuscated plagiarism, lexical detection methods must be combined with more sophisticated NLP approaches [ 9 , 67 ]. Lexical detection methods are also well suited to identify homoglyph substitutions, which are a common form of technical disguise. The only paper in our collection that addressed the identification of technically disguised plagiarism is Reference [ 19 ]. The authors used a list of confusable Unicode characters and applied approximate word n-gram matching using the normalized Hamming distance.

Lexical detection approaches typically fall into one of the three categories we describe in the following: n-gram comparisons, vector space models, and querying search engines.

N-gram Comparisons.  Comparing n-grams refers to determining the similarity of sequences of $n$ consecutive entities, which are typically characters or words and less frequently phrases or sentences. n-gram comparisons are widely applied for candidate retrieval or the seeding phase of the detailed analysis stage in extrinsic monolingual and cross-language detection approaches as well as in intrinsic detection.

Approaches using n-gram comparisons first split a document into (possibly overlapping) n-grams, which they use to create a set-based representation of the document or passage (“fingerprint”). To enable efficient retrieval, most approaches store fingerprints in index data structures. To speed up the comparison of individual fingerprints, some approaches hash or compress the n-grams that form the fingerprints. Hashing or compression reduces the lengths of the strings under comparison and allows performing computationally more efficient numerical comparisons. However, hashing introduces the risk of false positives due to hash collisions. Therefore, hashed or compressed fingerprinting is more commonly applied for the candidate retrieval stage, in which achieving high recall is more important than achieving high precision.
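A minimal fingerprinting sketch follows: hashed character n-grams compared with the Jaccard coefficient. Production systems use index structures and collision-resistant hash functions; this toy version only illustrates the set-based comparison:

```python
def fingerprint(text, n=4):
    """Set of hashed character n-grams of a text (whitespace removed,
    lowercased). Hash collisions are possible, which is why such
    fingerprints suit the recall-oriented candidate retrieval stage."""
    text = "".join(text.lower().split())
    return {hash(text[i:i + n]) for i in range(len(text) - n + 1)}

def jaccard(a, b):
    """Set overlap of two fingerprints."""
    return len(a & b) / len(a | b) if a | b else 0.0
```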

Fingerprinting is the most popular method for assessing local lexical similarity [ 104 ]. However, recent research has focused increasingly on detecting obfuscated plagiarism. Thus, n-gram fingerprinting is often restricted to the preprocessing stage [ 20 ] or used as a feature for machine learning [ 7 ]. Character n-gram comparisons can be applied to cross-language plagiarism detection (CLPD) if the languages in question exhibit a high lexical similarity, e.g., English and Spanish [ 79 ].

Table 4 presents papers employing word n-grams; Table 5 lists papers using character n-grams, and Table 6 shows papers that employ hashing or compression for n-gram fingerprinting.

Table 4. Papers employing word n-grams:
Extrinsic Document-level detection Stop words removed [ , , , , , , ]
Stop word n-grams [ ]
Candidate retrieval Stop words removed [ ]
All word n-grams and stop word n-grams [ ]
Detailed analysis All word n-grams [ , , ]
Stop words removed [ , ]
All word n-grams, stop word n-grams, and named entity n-grams [ ]
Numerous n-gram variations [ , ]
Context n-grams [ , ]
Paraphrase identification All word n-grams [ , ]
Combination with ESA [ ]
CLPD Stop words removed [ ]
Intrinsic Author identification Overlap in LZW dictionary [ ]
Author verification Word n-grams [ , , , , , , ]
Stop word n-grams [ , , ]

Table 5. Papers using character n-grams:
Extrinsic Document-level detection Pure character n-grams [ , ]
Overlap in LZW dictionary [ ]
Machine learning [ ]
Combined with Bloom filters [ ]
Detailed analysis Hashed character n-grams [ ]
Paraphrase identification Feature for machine learning [ ]
Cross-language PD Cross-language CNG [ , , , ]
Intrinsic Style-breach detection CNG as stylometric features [ , ]
Author identification Bit n-grams [ ]
Author verification CNG as stylometric features [ , , , , ], [ , , , , , , , , , , , ]
Author clustering CNG as stylometric features [ , , , , ]

Table 6. Papers employing hashing or compression for n-gram fingerprinting:
Document-level detection Hashing [ , , , ]
Candidate retrieval Hashing [ , , , ]
Detailed analysis Hashing [ , , , ]
Document-level detection Compression [ ]
Author identification Compression [ , , , ]

Vector Space Models (VSM) are a classic retrieval approach that represents texts as high-dimensional vectors [ 249 ]. In plagiarism detection, words or word n-grams typically form the dimensions of the vector space and the components of a vector undergo term frequency–inverse document frequency (tf-idf) weighting [ 249 ]. Idf values are either derived from the suspicious document or the corpus [ 205 , 238 ]. The similarity of vector representations—typically quantified using the cosine measure, i.e., the angle between the vectors—is used as a proxy for the similarity of the documents the vectors represent.
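The tf-idf weighting and cosine comparison described above can be sketched on a toy corpus as follows (idf derived from the corpus itself; simplified, without sublinear tf scaling or normalization variants):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each document as a sparse dict of tf-idf weights,
    with idf computed from the given corpus."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```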

Most approaches employ predefined similarity thresholds to retrieve documents or passages for subsequent processing. Kanjirangat and Gupta [ 249 ] and Ravi et al. [ 208 ] follow a different approach. They divide the set of source documents into K clusters by first selecting K centroids and then assigning each document to the group whose centroid is most similar. The suspicious document is used as one of the centroids and the corresponding cluster is passed on to the subsequent processing stages.

VSM remain popular and well-performing approaches not only for detecting copy-and-paste plagiarism but also for identifying obfuscated plagiarism as part of a semantic analysis. VSM are also frequently applied in intrinsic plagiarism detection. A typical approach is to represent sentences as vectors of stylometric features to find outliers or to group stylistically similar sentences.

Table 7 presents papers that employ VSM for extrinsic plagiarism detection; Table 8 lists papers using VSM for intrinsic plagiarism detection.

Table 7. Papers employing VSM for extrinsic plagiarism detection:
Document-level detection sentence Combination of similarity metrics [ ]
Document-level detection sentence VSM as a bitmap; compressed for comparison [ ]
Document-level detection sentence Machine learning to set similarity thresholds [ ]
Document-level detection word Synonym replacement [ ]
Document-level detection word, sentence Fuzzy set of WordNet synonyms [ ]
Candidate retrieval word Vectors of word n-grams [ , , ]
Candidate retrieval word K-means clustering of vectors to find documents most similar to the input doc. [ , ]
Candidate retrieval word Z-order mapping of multidimensional vectors to scalar and subsequent filtering [ ]
Candidate retrieval word Topic-based segmentation; Re-ranking of results based on the proximity of terms [ ]
Detailed analysis sentence Pure VSM [ , , , ]
Detailed analysis sentence Adaptive adjustment of parameters to detect the type of obfuscation [ , ]
Detailed analysis sentence Hybrid similarity (Cosine + Jaccard) [ ]
Detailed analysis word Pure VSM [ ]
Paraphrase identification sentence Semantic role annotation [ ]

Table 8. Papers using VSM for intrinsic plagiarism detection:
Style-breach detection word Word frequencies [ ]
Style-breach detection word Vectors of lexical and syntactic features [ , ]
Style-breach detection sentence Vectors of word embeddings [ ]
Style-breach detection sentence Vectors of lexical features [ ]
Style-breach detection sliding window Vectors of lexical features [ ]
Author clustering document Vectors of lexical features [ , , , , ]
Author clustering document Word frequencies [ ]
Author clustering document Word embeddings [ ]
Author verification document Word frequencies [ ]
Author verification document Vectors of lexical features [ , , , ]
Author verification document Vectors of lexical and syntactic features [ , , , ]
Author verification document Vectors of syntactic features [ ]

Querying Web Search Engines.  Many detection methods employ Web search engines for candidate retrieval, i.e., for finding potential source documents in the initial stage of the detection process. The strategy for selecting the query terms from the suspicious document is crucial for the success of this approach. Table 9 gives an overview of the strategies for query term selection employed by papers in our collection.

Querying the words with the highest tf-idf value [ , , , , , ]
Querying the least frequent words [ , ]
Querying the least frequent strings [ ]
Querying the words with the highest tf-idf value as well as noun phrases [ , , ]
Querying the nouns and most frequent words [ ]
Querying the nouns and verbs [ ]
Querying the nouns, verbs, and adjectives [ , , , ]
Querying the nouns, facts (dates, names, etc.) as well as the most frequent words [ ]
Querying keywords and the longest sentence in a paragraph [ , ]
Comparing different querying heuristics [ ]
Incrementing passage length and passage selection heuristics [ ]
Query expansion by words from UMLS Meta-thesaurus [ ]
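The most frequently used heuristic in Table 9, querying the words with the highest tf-idf values, could be sketched like this (a hedged toy version; the add-one smoothing in the idf term is our choice, not one prescribed by the cited papers):

```python
import math
from collections import Counter

def select_query_terms(suspicious, corpus, k=3):
    """Select the k words of the suspicious document with the highest
    tf-idf values, with idf estimated from a background corpus. The
    selected words would then be issued as Web search queries."""
    docs = [d.lower().split() for d in corpus]
    df = Counter(t for d in docs for t in set(d))
    tf = Counter(suspicious.lower().split())
    score = {t: c * math.log((1 + len(docs)) / (1 + df[t])) for t, c in tf.items()}
    return sorted(score, key=score.get, reverse=True)[:k]
```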

Intrinsic detection approaches can employ Web search engines to realize the General Impostors Method. This method transforms the one-class verification problem regarding an author's writing style into a two-class classification problem. The method extracts keywords from the suspicious document to retrieve a set of topically related documents from external sources, the so-called "impostors." The method then quantifies the "average" writing style observable in the impostor documents, i.e., the distribution of stylistic features to be expected. Subsequently, the method compares the stylometric features of passages from the suspicious document to the features of the "average" writing style in the impostor documents. This way, the method distinguishes the stylistic features that are characteristic of an author from the features that are specific to the topic [ 135 ]. Koppel and Winter present the method in detail [ 146 ]. Detection approaches implementing the general impostors method achieved excellent results in the PAN competitions, e.g., winning the competitions in 2013 and 2014 [ 128 , 232 ]. Table 10 presents papers using this method.

Author verification [ , , , , ]
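Schematically, the impostors-based verification decision can be sketched as below. The similarity function and feature vectors are purely illustrative, and the full method additionally repeats such comparisons over random feature subsets and aggregates the outcomes:

```python
def impostors_verdict(similarity, suspect, author_sample, impostors):
    """Attribute the suspect passage to the candidate author only if it
    is stylistically closer to the author's sample than to every
    topically related "impostor" document (toy sketch)."""
    return all(similarity(suspect, author_sample) > similarity(suspect, imp)
               for imp in impostors)

def style_sim(a, b):
    """Illustrative stylometric similarity: negative squared distance
    between feature vectors (e.g., PoS-tag frequencies)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))
```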

Syntax-based Methods

Syntax-based detection methods typically operate on the sentence level and employ PoS tagging to determine the syntactic structure of sentences [ 99 , 245 ]. The syntactic information helps to address morphological ambiguity during the lemmatization or stemming step of preprocessing [ 117 ], or to reduce the workload of a subsequent semantic analysis, typically by exclusively comparing the pairs of words belonging to the same PoS class [ 102 ]. Many intrinsic detection methods use the frequency of PoS tags as a stylometric feature.

The method of Tschuggnall and Specht [ 245 ] relies solely on the syntactic structure of sentences. Table 11 presents an overview of papers using syntax-based methods.

Extrinsic PoS tagging Addressing morphological ambiguity [ , ]
Word comparisons within the same PoS class only [ , ]
Combined with stop-words [ ]
Comparing PoS sequences [ ]
Combination with PPM compression [ ]
Intrinsic PoS tags as stylometric features PoS frequency [ , , , , ]
PoS n-gram frequency [ , , , , , , , ]
PoS frequency, PoS n-gram frequency, starting PoS tag [ ]
Comparing syntactic trees Direct comparison [ , ]
Integrated syntactic graphs [ ]

Semantics-based Methods

Papers presenting semantics-based detection methods are the largest group in our collection. This finding reflects the importance of detecting obfuscated forms of academic plagiarism, for which semantics-based detection methods are the most promising approach [ 216 ]. Semantics-based methods operate on the hypothesis that the semantic similarity of two passages depends on the occurrence of similar semantic units in these passages. The semantic similarity of two units derives from their occurrence in similar contexts.

Many semantics-based methods use thesauri (e.g., WordNet or EuroVoc). Including semantic features, like synonyms, hypernyms, and hyponyms, in the analysis improves the performance of paraphrase identification [ 9 ]. Using a canonical synonym for each word helps detect synonym-replacement obfuscation and reduces the vector space dimension [ 206 ]. Sentence segmentation and text tokenization are crucial parameters for all semantics-based detection methods. Tokenization extracts the atomic units of the analysis, which are typically either words or phrases. Most papers in our collection use words as tokens.

Employing established semantic text analysis methods like Latent Semantic Analysis (LSA), Explicit Semantic Analysis (ESA), and word embeddings for extrinsic plagiarism detection is a popular and successful approach. This group of methods follows the idea of "distributional semantics," i.e., terms co-occurring in similar contexts tend to convey a similar meaning. In the reverse conclusion, distributional semantics assumes that similar distributions of terms indicate semantically similar texts. The methods differ in the scope within which they consider co-occurring terms: word embeddings consider only the immediately surrounding terms, LSA analyzes the entire document, and ESA uses an external corpus.

Latent Semantic Analysis is a technique to reveal and compare the underlying semantic structure of texts [ 55 ]. To determine the similarity of term distributions in texts, LSA computes a matrix in which rows represent terms, columns represent documents, and the entries of the matrix typically represent log-weighted tf-idf values [ 46 ]. LSA then employs Singular Value Decomposition (SVD) or similar dimensionality reduction techniques to find a lower-rank approximation of the term-document matrix by reducing the number of rows (i.e., pruning less relevant terms) while maintaining the similarity distribution between columns (i.e., the text representations). The terms remaining after the dimensionality reduction are assumed to be most representative of the semantic meaning of the text. Hence, comparing the rank-reduced matrix representations of texts allows computing the semantic similarity of the texts [ 46 ].
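In compact notation, LSA approximates the term-document matrix $A \in \mathbb{R}^{m \times n}$ by a truncated SVD (a standard formulation, not specific to any one of the cited systems):

```latex
A \approx A_k = U_k \Sigma_k V_k^{\mathsf{T}}, \qquad
\operatorname{sim}(d_i, d_j) = \cos\!\bigl( (\Sigma_k V_k^{\mathsf{T}})_{:,i},\; (\Sigma_k V_k^{\mathsf{T}})_{:,j} \bigr),
```

where $U_k$, $\Sigma_k$, and $V_k$ retain only the $k$ largest singular values, and the columns of $\Sigma_k V_k^{\mathsf{T}}$ serve as the rank-reduced document representations compared by the cosine measure.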

LSA can reveal similarities between texts that traditional vector space models cannot express [ 116 ]. The ability of LSA to address synonymy is beneficial for paraphrase identification. For example, Satyapanich et al. [ 222 ] considered two sentences as paraphrases if their LSA similarity is above a threshold. While LSA performs well in addressing synonymy, its ability to reflect polysemy is limited [ 55 ].

Ceska [ 46 ] first applied LSA for plagiarism detection. AlSallal et al. [ 15 ] proposed a novel weighting approach that assigns higher weights to the most common terms and used LSA as a stylometric feature for intrinsic plagiarism detection. Aldarmaki and Diab [ 11 ] used weighted matrix factorization—a method similar to LSA—for cross-language paraphrase identification. Table 12 lists other papers employing LSA for extrinsic and intrinsic plagiarism detection.

Extrinsic Document-level detection LSA with phrase tf-idf [ , ]
LSA in combination with other methods [ ]
Candidate retrieval LSA only [ ]
Paraphrase identification LSA only [ ]
LSA with machine learning [ , , , ]
Weighted matrix factorization [ ]
Intrinsic Document-level detection LSA with stylometric features [ ]
Author identification LSA with machine learning [ , ]
LSA at CNG level [ ]

Explicit Semantic Analysis is an approach to model the semantics of a text in a high-dimensional vector space of semantic concepts [ 82 ]. Semantic concepts are the topics in a man-made knowledge base corpus (typically Wikipedia or other encyclopedias). Each article in the knowledge base is an explicit description of the semantic content of the concept, i.e., the topic of the article [ 163 ]. ESA builds a “semantic interpreter” that allows representing texts as concept vectors whose components reflect the relevance of the text for each of the semantic concepts, i.e., knowledge base articles [ 82 ]. Applying vector similarity measures, such as the cosine metric, to the concept vectors then allows determining the texts’ semantic similarity.
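A deliberately tiny sketch of the idea follows; our toy "knowledge base" stands in for Wikipedia, and plain word overlap stands in for the tf-idf relevance scores a real semantic interpreter would compute:

```python
from collections import Counter

# Hypothetical mini knowledge base: each article describes one concept.
KB = {
    "astronomy": "star planet telescope orbit galaxy",
    "cooking":   "recipe oven ingredient bake flavor",
}

def concept_vector(text):
    """ESA sketch: represent a text by its relevance to each
    knowledge-base concept, here approximated by word overlap with
    the concept's article."""
    words = Counter(text.lower().split())
    return {c: sum(words[w] for w in art.split()) for c, art in KB.items()}
```

Two texts would then be compared by applying a vector similarity measure, e.g., the cosine metric, to their concept vectors.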

Table 13 shows detection methods that employed ESA depending on the corpus used to build the semantic interpreter. Constructing the semantic interpreter from multilingual corpora, such as Wikipedia, allows the application of ESA for cross-language plagiarism detection [ 78 ]. ESA has several applications beyond PD, e.g., when applied for document classification, ESA achieved a precision above 95% [ 124 , 174 ].

Wikipedia (monolingual) [ , , ]
Wikipedia (cross-language) [ , ]
Wikipedia + FanFiction [ ]

The Information Retrieval-based semantic similarity approach proposed by Itoh [ 120 ] is a generalization of ESA. The method models a text passage as a set of words and employs a Web search engine to obtain a set of relevant documents for each word in the set. The method then computes the semantic similarity of the text passages as the similarity of the document sets obtained, typically using the Jaccard metric. Table 14 presents papers that also follow this approach.

Articles from Wikipedia [ ]
Synonyms from Farsnet [ ]

Word embeddings is another semantic analysis approach that is conceptually related to ESA. While ESA considers term occurrences in each document of the corpus, word embeddings exclusively analyze the words that surround the term in question. The idea is that terms appearing in proximity to a given term are more characteristic of the semantic concept represented by the term in question than more distant words. Therefore, terms that frequently co-occur in proximity within texts should also appear closer within the vector space [ 73 ]. In cross-language plagiarism detection, word embeddings outperformed other methods when syntactic weighting was employed [ 73 ]. Table 15 summarizes papers that employ word embeddings.

Extrinsic Candidate retrieval [ ]
Cross-language PD [ ]
Intrinsic Paraphrase identification [ , , , , ]
Style-breach detection [ ]
Author clustering [ ]
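The embedding-based comparison described above can be sketched with hypothetical two-dimensional toy vectors (real systems use pretrained embeddings such as word2vec or GloVe, with hundreds of dimensions):

```python
import math

# Hypothetical toy embeddings; semantically related words lie close.
EMB = {
    "car":  [0.9, 0.1], "automobile": [0.85, 0.15],
    "fast": [0.2, 0.8], "quick":      [0.25, 0.75],
}

def sentence_vector(words):
    """Average the embeddings of known words - a common baseline for
    comparing short passages semantically."""
    vecs = [EMB[w] for w in words if w in EMB]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```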

Word Alignment is a semantic analysis approach widely used for machine translation [ 240 ] and paraphrase identification. Words are aligned, i.e., marked as related, if they are semantically similar. Semantic similarity of two words is typically retrieved from an external database, like WordNet. The semantic similarity of two sentences is then computed as the proportion of aligned words. Word alignment approaches achieved the best performance for the paraphrase identification task at SemEval 2014 [ 240 ] and were among the top-performing approaches at SemEval-2015 [ 9 , 242 ].
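The core computation can be sketched as follows; the hard-coded synonym sets are a stand-in for WordNet lookups:

```python
# Hypothetical synonym sets standing in for a resource like WordNet.
SYNSETS = [{"buy", "purchase"}, {"car", "automobile"}]

def related(a, b):
    """Two words are alignable if identical or in a shared synonym set."""
    return a == b or any(a in s and b in s for s in SYNSETS)

def alignment_similarity(sent1, sent2):
    """Proportion of words in sent1 that align to a semantically
    related word in sent2 - a minimal sketch of word alignment."""
    aligned = sum(1 for w in sent1 if any(related(w, v) for v in sent2))
    return aligned / len(sent1)
```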

Cross-language alignment-based similarity analysis (CL-ASA) is a variation of the word alignment approach for cross-language semantic analysis. The approach uses a parallel corpus to compute, for all terms in the suspicious and the source documents, the probability that a word $x$ in the suspicious document is a valid translation of the term $y$ in a potential source document. The sum of the translation probabilities yields the probability that the suspicious document is a translation of the source document [ 28 ]. Table 16 presents papers using word alignment and CL-ASA.

Word alignment only [ , ]
Word alignment-based modification of Jaccard and Levenshtein measure [ ]
Word alignment in combination with machine learning [ , , ]
CL-ASA [ , ]
Translation + word alignment [ ]

Graph-based Semantic Analysis. Knowledge graph analysis (KGA) represents a text as a weighted directed graph, in which the nodes represent the semantic concepts expressed by the words in the text and the edges represent the relations between these concepts [ 79 ]. The relations are typically obtained from publicly available corpora, such as BabelNet or WordNet. Determining the edge weights is the major challenge in KGA. Traditionally, edge weights were computed by analyzing the relations between concepts in WordNet [ 79 ]. Salvador et al. [ 79 ] improved the weighting procedure by using continuous skip-grams that additionally consider the context in which the concepts appear. Applying graph similarity measures yields a semantic similarity score for documents or parts thereof (typically sentences).

Inherent characteristics of KGA like word sense disambiguation, vocabulary expansion, and language independence are highly beneficial for plagiarism detection. Thanks to these characteristics, KGA is resistant to synonym replacements and syntactic changes. Using multilingual corpora allows the application of KGA for cross-language PD [ 79 ]. KGA achieves high detection effectiveness if the text is translated literally; for paraphrased translations, the results are worse [ 77 ].

The universal networking language approach proposed by Avishek and Bhattacharyyan [ 53 ] is conceptually similar to KGA. The method constructs a dependency graph for each sentence and then compares the lexical, syntactic, and semantic similarity separately. Kumar [ 147 ] used semantic graphs for the seeding phase of the detailed analysis stage. In those graphs, the nodes corresponded to all words in a document or passage. The edges represented the adjacency of the words. The edge weights expressed the semantic similarity of words based on the probability that the words occur in a 100-word window within a corpus of DBpedia articles. Overlapping passages in two documents were identified using the minimum weight bipartite clique cover.

Table 17 presents detection methods that employ graph-based semantic analysis.

Document-level detection Knowledge graph analysis [ ]
Detailed analysis Semantic graphs [ ]
Detailed analysis Word n-gram graphs for sentences [ ]
Paraphrase identification Knowledge graph analysis [ ]
Paraphrase identification Universal networking language [ ]
Cross-language plagiarism detection Knowledge graph analysis [ , , ]

Semantic Role Labeling (SRL) determines the semantic roles of terms in a sentence, e.g., the subject, object, events, and relations between these entities, based on roles defined in linguistic resources, such as PropBank or VerbNet. The goal is to extract "who" did "what" to "whom" "where" and "when" [ 188 ]. The first step in SRL is PoS tagging and syntax analysis to obtain the dependency tree of a sentence. Subsequently, the semantic annotation is performed [ 71 ].

Paul and Jamal [ 188 ] used SRL in combination with sentence ranking for document-level plagiarism detection. Hamza and Salim [ 182 ] employed SRL to extract arguments from sentences, which they used to quantify and compare the syntactic and semantic similarity of the sentences. Ferreira et al. [ 71 ] obtained the similarity of sentences by combining various features and measures using machine learning. Table 18 lists detection approaches that employ SRL.

Table 18. Detection approaches employing SRL:
Document-level detection [ , ]
Paraphrase identification [ ]

Table 19. Papers proposing idea-based detection methods:
Monolingual plagiarism detection Citation-based PD [ , , , , ]
Math-based PD [ , ]
Image-based PD [ ]
Cross-lingual plagiarism detection CbPD [ ]

Idea-based Methods

Idea-based methods analyze non-textual content elements to identify obfuscated forms of academic plagiarism. The goal is to complement detection methods that analyze the lexical, syntactic, and semantic similarity of text to identify plagiarism instances that are hard to detect both for humans and for machines. Table 19 lists papers that proposed idea-based detection methods.

Citation-based plagiarism detection (CbPD) proposed by Gipp et al. [ 91 ] analyzes patterns of in-text citations in academic documents, i.e., identical citations occurring in proximity or in a similar order within two documents. The idea is that in-text citations encode semantic information language-independently. Thus, analyzing in-text citation patterns can indicate shared structural and semantic similarity among texts. Assessing semantic and structural similarity using citation patterns requires significantly less computational effort than approaches for semantic and syntactic text analysis [ 90 ]. Therefore, CbPD is applicable for the candidate retrieval and the detailed analysis stages [ 161 ] of monolingual [ 90 , 93 ] and cross-lingual [ 92 ] detection methods. For weakly obfuscated instances of plagiarism, CbPD achieved results comparable to those of lexical detection methods; for paraphrased and idea plagiarism, CbPD outperformed lexical detection methods in the experiments of Gipp et al. [ 90 , 93 ]. Moreover, the visualization of citation patterns was found to facilitate the inspection of the detection results by humans, especially for cases of structural and idea plagiarism [ 90 , 93 ]. Pertile et al. [ 191 ] confirmed the positive effect of combining citation and text analysis on the detection effectiveness and devised a hybrid approach using machine learning. CbPD can also alert a user when the in-text citations are inconsistent with the list of references. Such inconsistency may be caused by mistake or introduced deliberately to obfuscate plagiarism.
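A simplified sketch of how shared, order-preserving citation patterns could be scored is given below. The normalization choice is ours, and the actual CbPD measures of Gipp et al. are considerably more elaborate (e.g., accounting for proximity and transpositions):

```python
def citation_pattern_similarity(citations1, citations2):
    """Toy CbPD score: length of the longest common subsequence of
    in-text citation identifiers (order-aware), normalized by the
    longer sequence. Computed with standard LCS dynamic programming."""
    m, n = len(citations1), len(citations2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if citations1[i] == citations2[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / max(m, n) if max(m, n) else 0.0
```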

Meuschke et al. [ 163 ] proposed mathematics-based plagiarism detection (MathPD) as an extension of CbPD for documents in the Science, Technology, Engineering and Mathematics (STEM) fields. Mathematical expressions share many properties of academic citations, e.g., they are essential components of academic STEM documents, are language-independent, and contain rich semantic information. Furthermore, some disciplines, such as mathematics and physics, use academic citations sparsely [ 167 ]. Therefore, a citation-based analysis alone is less likely to reveal suspicious content similarity for these disciplines [ 163 ], [ 165 ]. Meuschke et al. showed that an exclusive math-based similarity analysis performed well for detecting confirmed cases of academic plagiarism in STEM documents [ 163 ]. Combining a math-based and a citation-based analysis further improved the detection performance for confirmed cases of plagiarism [ 165 ].
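The intuition behind a math-based similarity analysis can be conveyed with a minimal sketch, assuming documents are given as lists of LaTeX-like expression strings. MathPD itself uses richer, more sophisticated similarity measures, so the Jaccard overlap of identifier sets below is illustrative only:

```python
import re

def math_identifiers(expressions):
    # Collect letter identifiers from LaTeX-like expressions,
    # ignoring command names such as \frac or \sum.
    identifiers = set()
    for expr in expressions:
        cleaned = re.sub(r"\\[a-zA-Z]+", " ", expr)  # drop LaTeX commands
        identifiers.update(re.findall(r"[a-zA-Z]", cleaned))
    return identifiers

def math_similarity(exprs_a, exprs_b):
    # Jaccard overlap of the identifier sets of two documents.
    a, b = math_identifiers(exprs_a), math_identifiers(exprs_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

print(math_similarity([r"E = m c^2"], [r"E = \frac{m v^2}{2}"]))  # 0.5
```

Like citation patterns, mathematical identifiers survive translation and paraphrasing, which is why such features help for strongly obfuscated plagiarism in STEM documents.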

Image-based plagiarism detection methods analyze graphical content elements. While a large variety of methods to retrieve similar images have been proposed [ 56 ], few studies have investigated the application of content-based image retrieval approaches to academic plagiarism detection. The study by Meuschke et al. [ 162 ] is the only one we encountered during our data collection. The authors proposed a detection approach that integrates established image retrieval methods with novel similarity assessments for images that are tailored to plagiarism detection. The approach has been shown to retrieve both copied and altered figures.

Ensembles of Detection Methods

Each class of detection methods has characteristic strengths and weaknesses. Many authors showed that combining detection methods achieves better results than applying the methods individually [ 7 , 62 , 78 , 128 , 133 , 234 , 242 , 273 , 275 ]. By assembling the best-performing detection methods in PAN 2014, the organizers of the workshop created a meta-system that performed best overall [ 232 ].

In intrinsic plagiarism detection, combining feature analysis methods is a standard approach [ 233 ], since an author's writing style always comprises a multitude of stylometric features [ 127 ]. Many recent author verification methods employ machine learning to select the best-performing feature combination [ 234 ].

In general, there are three ways of combining plagiarism detection methods:

  • Using adaptive algorithms that determine the obfuscation strategy, choose the detection method, and set similarity thresholds accordingly
  • Using an ensemble of detection methods whose results are combined using static weights
  • Using machine learning to determine the best-performing combination of detection methods

The winning approach at PAN 2014 and 2015 [ 216 ] used an adaptive algorithm. After finding the seeds of overlapping passages, the authors extended the seeds using two different thresholds for the maximum gap. Based on the length of the passages, the algorithm automatically recognized different plagiarism forms and set the parameters for the VSM-based detection method accordingly.
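The seed-extension step of such an adaptive algorithm can be sketched as follows. The sentence indices and gap thresholds are illustrative, not the parameters of the PAN-winning system:

```python
def extend_seeds(seed_positions, max_gap):
    # Merge seed matches (sentence indices) into passages whenever the
    # gap between consecutive seeds does not exceed max_gap.
    passages = []
    for pos in sorted(seed_positions):
        if passages and pos - passages[-1][1] <= max_gap:
            passages[-1][1] = pos          # extend the current passage
        else:
            passages.append([pos, pos])    # start a new passage
    return [tuple(p) for p in passages]

seeds = [3, 4, 5, 9, 20, 21]
print(extend_seeds(seeds, max_gap=4))  # [(3, 9), (20, 21)]
print(extend_seeds(seeds, max_gap=2))  # [(3, 5), (9, 9), (20, 21)]
```

A verbatim copy yields long, dense passages, while paraphrased reuse produces short, scattered ones; an adaptive system can use this passage profile to choose its detection parameters.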

The “linguistic knowledge approach” proposed by Abdi et al. [ 2 ] exemplifies an ensemble of detection methods. The method combines the analysis of syntactic and semantic sentence similarity using a linear combination of two similarity metrics: (i) the cosine similarity of semantic vectors and (ii) the similarity of syntactic word order vectors [ 2 ]. The authors showed that the method outperformed competing approaches on the PAN-10 and PAN-11 corpora. Table 20 lists other ensembles of detection methods.
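Such a static-weight ensemble reduces to a weighted average of per-method similarity scores. The component scores and weights below are hypothetical, not those of Abdi et al.; a real system would tune the weights on a training corpus:

```python
def ensemble_score(scores, weights):
    # Weighted average of per-method similarity scores (static weights).
    assert len(scores) == len(weights) and sum(weights) > 0
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical scores for one suspicious/source pair: the n-gram method
# misses the paraphrase, but semantic and citation analysis flag it.
scores  = [0.10, 0.80, 0.75]   # n-gram, semantic, citation-based
weights = [1.0,  2.0,  2.0]    # weights would be tuned on training data
combined = ensemble_score(scores, weights)
print(round(combined, 2))      # 0.64
suspicious = combined > 0.5    # a static decision threshold
```

The weighting lets strong signals from obfuscation-resistant methods outvote a low lexical score, which is exactly the behavior that makes ensembles outperform individual methods on paraphrased plagiarism.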

Task                     | Ensemble approach
Document-level detection | Linguistic knowledge [ ]
Candidate retrieval      | Querying a Web search engine; combination of querying heuristics [ ]
Detailed analysis        | Vector space model; adaptive algorithm [ , , ]

Machine learning approaches for plagiarism detection typically train a classification model that combines a given set of features. The trained model can then be used to classify other datasets. The support vector machine (SVM) is the most popular model type for plagiarism detection tasks. An SVM uses statistical learning to determine the hyperplane that separates the classes of training data with the maximum margin. Choosing this hyperplane is the main challenge for correct data classification [ 66 ].

Machine-learning approaches are very successful in intrinsic plagiarism detection. Supervised machine-learning methods, specifically random forests, were the best-performing approach at the intrinsic detection task of the PAN 2015 competition [ 233 ]. The best-known method for author verification is unmasking [ 232 ], which uses an SVM classifier to distinguish the stylistic features of the suspicious document from a set of documents for which the author is known. The idea of unmasking is to train and run the classifier and then remove the most significant features of the classification model and rerun the classification. If the classification accuracy drops significantly, then the suspicious and known documents are likely from the same author; otherwise, they are likely written by different authors [ 232 ]. There is no consensus on the stylometric features that are most suitable for authorship identification [ 158 ]. Table 21 gives an overview of intrinsic detection methods that employ machine-learning techniques.
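The unmasking procedure can be illustrated with a deliberately simplified stand-in: instead of an SVM and cross-validation accuracy, the sketch below measures class separability as the distance between mean word-frequency vectors, removes the most discriminating features each round, and reports the resulting degradation curve. All names and the tiny example texts are illustrative:

```python
from collections import Counter

def freq_vector(text, vocab):
    # Relative word frequencies over a fixed vocabulary.
    counts = Counter(text.split())
    total = max(len(text.split()), 1)
    return {w: counts[w] / total for w in vocab}

def separability(a_vecs, b_vecs, vocab):
    # Distance between the mean feature vectors of the two document sets;
    # a crude stand-in for the accuracy of a trained classifier.
    mean = lambda vecs, w: sum(v[w] for v in vecs) / len(vecs)
    return sum(abs(mean(a_vecs, w) - mean(b_vecs, w)) for w in vocab)

def unmask(suspicious_docs, known_docs, rounds=3, drop_per_round=1):
    vocab = sorted(set(" ".join(suspicious_docs + known_docs).split()))
    a = [freq_vector(t, vocab) for t in suspicious_docs]
    b = [freq_vector(t, vocab) for t in known_docs]
    curve = []
    for _ in range(rounds):
        curve.append(separability(a, b, vocab))
        # Drop the currently most discriminating feature(s) and repeat.
        vocab.sort(key=lambda w: abs(sum(v[w] for v in a) / len(a)
                                     - sum(v[w] for v in b) / len(b)), reverse=True)
        vocab = vocab[drop_per_round:]
    return curve  # a steep drop suggests the same author

curve = unmask(["the cat sat", "the dog sat"], ["a cat ran", "a dog ran"])
```

The diagnostic signal is the shape of the curve, not any single value: texts by the same author are separable only through a few superficial features, so removing them collapses the score quickly.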

Task                   | Model                              | Features
Style-breach detection | Gradient boosting regression trees | Lexical, syntax [ ]
Author identification  | SVM                                | Semantic (LSA) [ ]
Author clustering      | Recurrent ANN                      | Lexical [ ]
Author clustering      | SVM                                | Lexical, syntax [ ]
Author verification    | Recurrent ANN                      | Lexical [ ]
Author verification    | k-nearest neighbor                 | Lexical [ ]; Lexical, syntax [ ]
Author verification    | Homotopy-based classification      | Lexical [ ]
Author verification    | Naïve Bayes                        | Lexical [ ]
Author verification    | SVM                                | Lexical, syntax [ , , , , ]
Author verification    | Equal error rate                   | Lexical [ ]
Author verification    | Decision tree                      | Lexical [ ]
Author verification    | Random forest                      | Lexical, syntax [ , , ]
Author verification    | Genetic algorithm                  | Lexical, syntax [ , ]
Author verification    | Multilayer perceptron              | Lexical, semantic (LSA) [ ]
Author verification    | Many                               | Lexical [ , ]; Lexical, syntax [ ]

For extrinsic plagiarism detection, the application of machine learning has been studied for various components of the detection process [ 208 ]. Gharavi et al. [ 88 ] used machine learning to determine the suspiciousness thresholds for a vector space model. Zarrella et al. [ 273 ] won the SemEval competition in 2015 with their ensemble of seven algorithms, most of which used machine learning. While Hussain and Suryani [ 116 ] successfully used an SVM classifier for the candidate retrieval stage, Williams et al. [ 269 ] compared many supervised machine-learning methods and concluded that applying them for classifying and ranking Web search engine results did not improve candidate retrieval. Kanjirangat and Gupta [ 252 ] used a genetic algorithm to detect idea plagiarism. The method randomly chooses a set of sentences as chromosomes. The sentence sets that are most descriptive of the entire document are combined and form the next generation. In this way, the method gradually extracts the sentences that represent the idea of the document and can be used to retrieve similar documents.
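The genetic-algorithm idea can be sketched roughly as follows. The fitness function (vocabulary coverage) and the crossover scheme are simplifying assumptions for illustration, not the exact design of Kanjirangat and Gupta:

```python
import random

def fitness(chromosome, sentences, doc_vocab):
    # A chromosome is a set of sentence indices; fitness rewards sets
    # whose combined vocabulary covers much of the whole document.
    covered = set()
    for i in chromosome:
        covered.update(sentences[i].lower().split())
    return len(covered & doc_vocab) / len(doc_vocab)

def representative_sentences(sentences, k=2, generations=20, pop_size=8, seed=0):
    rng = random.Random(seed)
    doc_vocab = set(" ".join(sentences).lower().split())
    indices = list(range(len(sentences)))
    population = [frozenset(rng.sample(indices, k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, sentences, doc_vocab), reverse=True)
        parents = population[: pop_size // 2]        # keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)            # crossover: mix two parents
            pool = list(a | b)
            children.append(frozenset(rng.sample(pool, min(k, len(pool)))))
        population = parents + children
    return max(population, key=lambda c: fitness(c, sentences, doc_vocab))

sents = ["alpha beta gamma", "alpha beta", "delta epsilon"]
best = representative_sentences(sents, k=2)
```

Over the generations, sentence sets that cover more of the document's vocabulary displace less descriptive ones, gradually distilling the sentences that best represent the document's idea.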

Sánchez-Vega et al. [ 218 ] proposed a method termed rewriting index that evaluates the degree of membership of each sentence in the suspicious document to a possible source document. The method uses five different Turing machines to uncover verbatim copying as well as basic transformations on the word level (insertion, deletion, substitution). The output values of the Turing machines are used as the features to train a Naïve Bayes classifier and identify reused passages.
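The word-level insertions, deletions, and substitutions that Sánchez-Vega et al. detect with dedicated Turing machines can also be tallied with a standard edit-distance backtrace. The sketch below produces such operation counts, which could then serve as features for a classifier; it is an assumption-laden stand-in, not the authors' implementation:

```python
def edit_operation_counts(source, suspicious):
    # Count word-level matches, substitutions, insertions, and deletions
    # that turn `source` into `suspicious` (edit-distance backtrace).
    s, t = source.split(), suspicious.split()
    dp = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        dp[i][0] = i
    for j in range(len(t) + 1):
        dp[0][j] = j
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete s[i-1]
                           dp[i][j - 1] + 1,         # insert t[j-1]
                           dp[i - 1][j - 1] + cost)  # match or substitute
    ops = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    i, j = len(s), len(t)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else 1):
            ops["match" if s[i - 1] == t[j - 1] else "substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops["deletion"] += 1
            i -= 1
        else:
            ops["insertion"] += 1
            j -= 1
    return ops

print(edit_operation_counts("the cat sat on the mat", "the cat rested on a mat"))
# {'match': 4, 'substitution': 2, 'insertion': 0, 'deletion': 0}
```

A sentence dominated by matches indicates verbatim copying, while a high share of substitutions with many matches suggests light rewording, which is the signal the rewriting index exploits.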

In the approach of Afzal et al. [ 5 ], a linear combination of supervised and unsupervised machine-learning methods outperformed each of the methods applied individually. In the experiments of Alfikri and Purwarianti [ 13 ], SVM classifiers outperformed Naïve Bayes classifiers. In the experiments of Subroto and Selamat [ 236 ], the best-performing configuration was a hybrid model that combined an SVM and an artificial neural network (ANN). El-Alfy et al. [ 62 ] found that an abductive network outperformed an SVM. However, as shown in Table 22, the SVM is the most popular classifier for extrinsic plagiarism detection methods. Machine learning appears to be more beneficial when applied for the detailed analysis, as indicated by the fact that most extrinsic detection methods apply machine learning in that stage (cf. Table 22).

Task                      | Model                                               | Features
Document-level detection  | SVM                                                 | Semantic [ , ]
Document-level detection  | SVM, Naïve Bayes                                    | Lexical, semantic [ ]
Document-level detection  | Decision tree, k-nearest neighbor                   | Syntax [ ]
Document-level detection  | Naïve Bayes, SVM, Decision tree                     | Lexical, syntax [ ]
Document-level detection  | Many                                                | Semantic (CbPD) [ ]
Candidate retrieval       | SVM                                                 | Lexical [ ]
Candidate retrieval       | Linear discriminant analysis                        | Lexical, syntax [ ]
Candidate retrieval       | Genetic algorithm                                   | Lexical, syntax [ ]
Detailed analysis         | Logistic regression model                           | Lexical, syntax, semantic [ ]
Detailed analysis         | Naïve Bayes                                         | Lexical [ ]
Detailed analysis         | Naïve Bayes, Decision tree, Random forest           | Lexical [ ]
Detailed analysis         | SVM                                                 | Lexical, semantic [ ]
Paraphrase identification | SVM                                                 | Lexical [ ]; Lexical, semantic [ , ]; Lexical, syntax, semantic [ , , ]; MT metrics [ ]; ML with syntax and semantic features [ ]
Paraphrase identification | k-nearest neighbor, SVM, artificial neural network  | Lexical [ ]
Paraphrase identification | SVM, Random forest, Gradient boosting               | Lexical, syntax, semantic, MT metrics [ ]
Paraphrase identification | SVM, MaxEnt                                         | Lexical, syntax, semantic [ ]
Paraphrase identification | Abductive networks                                  | Lexical [ ]
Paraphrase identification | Linear regression                                   | Lexical, syntax, semantic [ ]
Paraphrase identification | L2-regularized logistic regression                  | Lexical, syntax, semantic, ML [ ]
Paraphrase identification | Ridge regression                                    | Lexical, semantic [ ]
Paraphrase identification | Gaussian process regression                         | Lexical, semantic [ ]
Paraphrase identification | Isotonic regression                                 | Semantic [ ]
Paraphrase identification | Artificial neural network                           | Lexical, semantic [ ]
Paraphrase identification | Deep neural network                                 | Syntax, semantic [ ]; Semantic [ ]
Paraphrase identification | Decision tree                                       | Semantic [ ]; Lexical, syntax, semantic [ , ]
Paraphrase identification | Random forest                                       | Semantic, MT metrics [ ]
Paraphrase identification | Many                                                | Lexical, semantic [ , ]; Lexical, syntax, semantic [ , ]
Cross-language detection  | Artificial neural networks                          | Semantic [ ]

Evaluation of Plagiarism Detection Methods

The availability of datasets for development and evaluation is essential for research on natural language processing and information retrieval. The PAN series of benchmark competitions is a comprehensive and well‑established platform for the comparative evaluation of plagiarism detection methods and systems [ 197 ]. The PAN test datasets contain artificially created monolingual (English, Arabic, Persian) and—to a lesser extent—cross-language plagiarism instances (German and Spanish to English) with different levels of obfuscation. The papers included in this review that present lexical, syntactic, and semantic detection methods mostly use the PAN datasets or the Microsoft Research Paraphrase corpus. Authors presenting idea-based detection methods that analyze non-textual content features or cross-language detection methods for non-European languages typically use self-created test collections, since the PAN datasets are not suitable for these tasks. A comprehensive review of corpus development initiatives is out of the scope of this article.

Since plagiarism detection is an information retrieval task, precision, recall, and F‑measure are typically employed to evaluate plagiarism detection methods. A notable use-case-specific extension of these general performance measures is the PlagDet metric. Potthast et al. introduced the metric to evaluate the performance of methods for the detailed analysis stage in external plagiarism detection [ 201 ]. A method may detect only a fragment of a plagiarism instance or report a coherent instance as multiple detections. To account for these possibilities, Potthast et al. included the granularity score as part of the PlagDet metric. The granularity score is the ratio of the number of detections a method reports to the true number of plagiarism instances.
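Following Potthast et al.'s formal definition, the granularity averages the number of detections per detected case, and PlagDet discounts the F-measure accordingly. A compact sketch using character intervals (the interval values are made up for illustration):

```python
import math

def overlaps(case, detection):
    # Character intervals (start, end), end-exclusive.
    return max(case[0], detection[0]) < min(case[1], detection[1])

def granularity(cases, detections):
    # Average number of detections covering each detected plagiarism case.
    detected = [c for c in cases if any(overlaps(c, d) for d in detections)]
    if not detected:
        return 1.0
    return sum(sum(1 for d in detections if overlaps(c, d))
               for c in detected) / len(detected)

def plagdet(f_measure, cases, detections):
    # PlagDet discounts the F-measure when one case is reported in fragments.
    return f_measure / math.log2(1 + granularity(cases, detections))

# One 100-character plagiarism case reported as three fragments:
cases, detections = [(0, 100)], [(0, 30), (40, 60), (70, 100)]
print(granularity(cases, detections))   # 3.0
print(plagdet(0.8, cases, detections))  # 0.4
```

A method that reports each case exactly once has granularity 1, so PlagDet equals the F-measure; splitting one case into three fragments halves the score, as in the example.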

PLAGIARISM DETECTION SYSTEMS

Plagiarism detection systems implement (some of) the methods described in the previous sections. To be applicable in practice, the systems must address the tradeoff between detection performance and processing speed [ 102 ], i.e., find sources of plagiarism with reasonable computational costs.

Most systems are Web-based; some can run locally. The systems typically highlight the parts of a suspicious document that likely originate from another source and indicate that source. Understanding how the source was changed is often left to the user. Providers of plagiarism detection systems, especially of commercial systems, rarely publish information on the detection methods they employ [ 85 , 256 ]. Thus, it is difficult to estimate the extent to which plagiarism detection research influences practical applications.

Velásquez et al. [ 256 ] presented a text-matching system and described its functionality, which included the recognition of quotes. The system achieved excellent results in the PAN 10 and PAN 11 competitions. The authors have since commercialized the system [ 195 ].

Academics and practitioners are naturally interested in which detection system achieves the best results. Weber-Wulff and her team performed the most methodologically sound investigation of this question in 2004, 2007, 2008, 2010, 2011, 2012, and 2013 [ 266 ]. In their latest benchmark evaluation, the group compared 15 systems using documents written in English and German.

Chowdhury and Bhattacharyya [ 48 ] provided an exhaustive list of currently available plagiarism detection systems. Unfortunately, the description of each system is short, and the authors did not provide performance comparisons. Pertile et al. [ 191 ] summarized the basic characteristics of 17 plagiarism detection systems. Kanjirangat and Gupta [ 251 ] compared four publicly available systems. They used four test documents that contained five forms of plagiarism (copy-and-paste, random obfuscation, translation to Hindi and back, summarization). All systems failed to identify plagiarism instances other than copy-and-paste and random obfuscation.

There is consensus in the literature that the inability of plagiarism detection systems to identify obfuscated plagiarism is currently their most severe limitation [ 88 , 251 , 266 ].

In summary, there is a lack of systematic and methodologically sound performance evaluations of plagiarism detection systems, since the benchmark comparisons of Weber-Wulff ended in 2013. This lack is problematic, since plagiarism detection systems are typically a key building block of plagiarism policies. Plagiarism detection methods and plagiarism policies are the subjects of extensive research. We argue that plagiarism detection systems should be researched just as extensively but are currently not.

In this section, we summarize the advancements in the research on methods to detect academic plagiarism that our review identified. Figure 2 depicts the suitability of the methods discussed in the previous sections for identifying the plagiarism forms presented in our typology. As shown in the Figure, n-gram comparisons are well-suited for detecting character-preserving plagiarism and partially suitable for identifying ghostwriting and syntax-preserving plagiarism. Stylometry is routinely applied for intrinsic plagiarism detection and can reveal ghostwriting and copy-and-paste plagiarism. Vector space models have a wide range of applications but appear not to be particularly beneficial for detecting idea plagiarism. Semantics-based methods are tailored to the detection of semantics-preserving plagiarism, yet also perform well for character-preserving and syntax-preserving forms of plagiarism. Non-textual feature analysis and machine learning are particularly beneficial for detecting strongly obfuscated forms of plagiarism, such as semantics-preserving and idea-preserving plagiarism. However, machine learning is a universal approach that also performs well for less strongly disguised forms of plagiarism.

Fig. 2.

The first observation of our literature survey is that ensembles of detection methods tend to outperform approaches based on a single method [ 93 , 161 ]. Chong experimented with numerous methods for preprocessing as well as with shallow and deep NLP techniques [ 47 ]. He tested the approaches on both small and large-scale corpora and concluded that a combination of string-matching and deep NLP techniques achieves better results than applying the techniques individually.

Machine-learning approaches represent the logical evolution of the idea to combine heterogeneous detection methods. Since our previous review in 2013, unsupervised and supervised machine-learning methods have found increasingly widespread adoption in plagiarism detection research and have significantly increased the performance of detection methods. Baroni et al. [ 27 ] provided a systematic comparison of vector-based similarity assessments. The authors were particularly interested in whether unsupervised count-based approaches like LSA achieve better results than supervised prediction-based approaches like Softmax. They concluded that the prediction-based methods outperformed their count-based counterparts in precision and recall while requiring similar computational effort. We expect that research on applying machine learning for plagiarism detection will continue to grow significantly in the future.

Considering the heterogeneous forms of plagiarism (see the typology section), the static one-fits-all approach observable in most plagiarism detection methods before 2013 is increasingly replaced by adaptive detection algorithms. Many recent detection methods first seek to identify the likely obfuscation method and then apply the appropriate detection algorithm [ 79 , 198 ], or at least to dynamically adjust the parameters of the detection method [ 216 ].

Graph-based methods operating on the syntactic and semantic levels achieve comparable results to other semantics-based methods. Mohebbi and Talebpour [ 168 ] successfully employed graph-based methods to identify paraphrases. Franco-Salvador et al. [ 79 ] demonstrated the suitability of knowledge graph analysis for cross-language plagiarism detection.

Several researchers showed the benefit of analyzing non-textual content elements to improve the detection of strongly obfuscated forms of plagiarism. Gipp et al. demonstrated that analyzing in-text citation patterns achieves higher detection rates than lexical approaches for strongly obfuscated forms of academic plagiarism [ 90 , 92 – 94 ]. The approach is computationally modest and reduces the effort required of users for investigating the detection results. Pertile et al. [ 191 ] combined lexical and citation-based approaches to improve detection performance. Eisa et al. [ 61 ] strongly advocated for additional research on analyzing non-textual content features. The research by Meuschke et al. on analyzing images [ 162 ] and mathematical expressions [ 164 ] confirms that non-textual detection methods significantly enhance the detection capabilities. Following the trend of combining detection methods, we see the analysis of non-textual content features as a promising component of future integrated detection approaches.

Surprisingly many papers in our collection addressed plagiarism detection for Arabic and Persian texts (e.g., References [ 22 , 118 , 231 , 262 ]). The interest in plagiarism detection for the Arabic language led the organizers of the PAN competitions to develop an Arabic corpus for intrinsic plagiarism detection [ 34 ]. In 2015, the PAN organizers also introduced a shared task on plagiarism detection for Arabic texts [ 32 ], followed by a shared task for Persian texts one year later [ 22 ]. While these are promising steps toward improving plagiarism detection for Arabic, Wali et al. [ 262 ] noted that the availability of corpora and lexicons for Arabic is still insufficient when compared to other languages. This lack of resources and the complex linguistic features of the Arabic language cause plagiarism detection for Arabic to remain a significant research challenge [ 262 ].

For cross-language plagiarism detection methods, Ferrero et al. [ 74 ] introduced a five-class typology that still reflects the state of the art: cross-language character n-grams (CL-CNG), cross-language conceptual thesaurus-based similarity (CL-CTS), cross-language alignment-based similarity analysis (CL-ASA), cross-language explicit semantic analysis (CL-ESA), and translation with monolingual analysis (T+MA). Franco-Salvador et al. [ 80 ] showed that the performance of these methods varies depending on the language and corpus. The observation that combining detection methods improves detection performance also holds for the cross-language scenario [ 80 ]. In the analysis of Ferrero et al. [ 74 ], the detection performance of the methods depended exclusively on the size of the chosen chunk, not on the language or the dataset. Translation with monolingual analysis is a widely used approach. For the cross-language detection task (Spanish–English) at the SemEval competition in 2016, most of the participants applied machine translation from Spanish to English and then compared the sentences in English [ 7 ]. However, some authors do not consider this approach cross-language plagiarism detection but rather monolingual plagiarism detection with translation as a preprocessing step [ 80 ].

For intrinsic plagiarism detection, authors predominantly use lexical and syntax-based text analysis methods. Widely analyzed lexical features include character n-grams, word frequencies, as well as the average lengths of words, sentences, and paragraphs [ 247 ]. The most common syntax-based features include PoS tag frequencies, PoS tag pair frequencies, and PoS structures [ 247 ]. At the PAN competitions, methods that analyzed lexical features and employed simple clustering algorithms achieved the best results [ 200 ].
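A few of these lexical features can be computed in a handful of lines; sliding such a profile over a document and comparing it with the document-wide profile is the typical way style breaches are sought. The exact feature set below is illustrative, not a prescription from the cited works:

```python
from collections import Counter

def stylometric_profile(text, n=3, top=5):
    # Simple lexical features: character n-gram frequencies and the
    # average lengths of words and sentences.
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sent_len": len(words) / max(len(sentences), 1),
        "top_ngrams": ngrams.most_common(top),
    }

profile = stylometric_profile("One two. Three four.")
print(profile["avg_sent_len"])  # 2.0
```

A window whose profile deviates sharply from the rest of the document (e.g., much longer sentences or an unusual n-gram distribution) is flagged as a potential style breach.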

For the author verification task, the most successful methods treated the problem as a binary classification task. They adopted the extrinsic verification paradigm by using texts from other authors to identify features that are characteristic of the writing style of the suspected author [ 233 ]. The general impostors method is a widely used and largely successful realization of this approach [ 135 , 146 , 159 , 224 ].

From a practitioner's perspective, intrinsic detection methods exhibit several shortcomings. First, stylometric comparisons are inherently error-prone for documents collaboratively written by multiple authors [ 209 ]. This shortcoming is particularly critical, since most scientific publications have multiple authors [ 39 ]. Second, intrinsic methods are not well suited for detecting paraphrased plagiarism, i.e., instances in which authors illegitimately reused content from other sources that they presented in their own words. Third, the methods are generally not reliable enough for practical applications yet. Author identification methods achieve a precision of approximately 60%, author profiling methods of approximately 80% [ 200 ]. These values are sufficient for raising suspicion and encouraging further examination but not for proving plagiarism or ghostwriting. The availability of methods for automated author obfuscation aggravates the problem. The most effective methods can mislead the identification systems in almost half of the cases [ 199 ]. Fourth, intrinsic plagiarism detection approaches cannot point an examiner to the source document of potential plagiarism. If a stylistic analysis raised suspicion, then extrinsic detection methods or other search and retrieval approaches are necessary to discover the potential source document(s).

Other Applications of Plagiarism Detection Methods

Aside from extrinsic and intrinsic plagiarism detection, the methods described in this article have numerous other applications, such as machine translation [ 67 ], author profiling for marketing applications [ 211 ], spam detection [ 248 ], law enforcement [ 127 , 211 ], identifying duplicate accounts in internet fora [ 4 ], identifying journalistic text reuse [ 47 ], patent analysis [ 1 ], event recognition based on tweet similarity [ 24 , 130 ], short answer scoring based on paraphrase identification [ 242 ], and native language identification [ 119 ].

In 2010, Mozgovoy et al. [ 173 ] proposed a roadmap for the future development of plagiarism detection systems. They suggested the inclusion of syntactic parsing, considering synonym thesauri, employing LSA to discover “tough plagiarism,” intrinsic plagiarism detection, and tracking citations and references. As our review of the literature shows, all these suggestions have been realized. Moreover, the field of plagiarism detection has made a significant leap in detection performance thanks to machine learning.

In 2015, Eisa et al. [ 61 ] praised the effort invested in improving text-based plagiarism detection but noted a critical lack of “techniques capable of identifying plagiarized figures, tables, equations and scanned documents or images.” While Meuschke et al. [ 163 , 165 ] proposed initial approaches that address these suggestions and achieved promising results, most research still addresses text-based plagiarism detection only.

A generally observable trend is that approaches that integrate different detection methods—often with the help of machine learning—achieve better results. In line with this observation, we see a large potential for the future improvement of plagiarism detection methods in integrating non-textual analysis approaches with the many well-performing approaches for the analysis of lexical, syntactic, and semantic text similarity.

To summarize the contributions of this article, we refer to the four questions Kitchenham et al. [ 138 ] suggested to assess the quality of literature reviews:

  • “Are the review's inclusion and exclusion criteria described and appropriate?
  • Is the literature search likely to have covered all relevant studies?
  • Did the reviewers assess the quality/validity of the included studies?
  • Were the basic data/studies adequately described?”

We believe that the answers to these four questions are positive for our survey. Our article summarizes previous research and identifies research gaps to be addressed in the future. We are confident that this review will help researchers newly entering the field of academic plagiarism detection to get oriented and will help experienced researchers to identify related work. We hope that our findings will aid in the development of more effective and efficient plagiarism detection methods and systems that will then facilitate the implementation of plagiarism policies.

  • Assad Abbas, Limin Zhang, and Samee U. Khan. 2014. A literature review on the state-of-the-art in patent analysis. World Pat. Inf. 37 (2014), 3–13. DOI: 10.1016/j.wpi.2013.12.006
  • Asad Abdi, Norisma Idris, Rasim M. Alguliyev, and Ramiz M. Aliguliyev. 2015. PDLK: Plagiarism detection using linguistic knowledge. Expert Syst. Appl . 42, 22 (2015), 8936–8946. DOI: 10.1016/j.eswa.2015.07.048
  • Samira Abnar, Mostafa Dehghani, Hamed Zamani, and Azadeh Shakery. 2014. Expanded n-grams for semantic text alignment—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Sadia Afroz, Aylin Caliskan Islam, Ariel Stolerman, Rachel Greenstadt, and Damon McCoy. 2014. Doppelgänger finder: Taking stylometry to the underground. In Proceedings of the 2014 IEEE Symposium on Security and Privacy . 212–226.
  • Naveed Afzal, Yanshan Wang, and Hongfang Liu. 2016. MayoNLP at SemEval-2016 Task 1: Semantic textual similarity based on lexical semantic net and deep learning semantic model. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 674–679.
  • Basant Agarwal, Heri Ramampiaro, Helge Langseth, and Massimiliano Ruocco. 2018. A deep network model for paraphrase detection in short text messages. Inf. Process. Manag. 54, 6 (2018), 922–937. DOI: 10.1016/j.ipm.2018.06.005
  • Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 497–511.
  • Mayank Agrawal and Dilip Kumar Sharma. 2016. A state of art on source code plagiarism detection. In Proceedings of the 2016 2nd International Conference on Next Generation Computing Technologies (NGCT’16) . 236–241. DOI: 10.1109/NGCT.2016.7877421
  • Mohammad Al-Smadi, Zain Jaradat, Mahmoud Al-Ayyoub, and Yaser Jararweh. 2017. Paraphrase identification and semantic text similarity analysis in arabic news tweets using lexical, syntactic, and semantic features. Inf. Process. Manag. 53, 3 (2017), 640–652. DOI: 10.1016/j.ipm.2017.01.002
  • Houda Alberts. 2017. Author clustering with the aid of a simple distance measure—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Hanan Aldarmaki and Mona Diab. 2016. GWU NLP at SemEval-2016 Shared Task 1: Matrix factorization for crosslingual STS. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 663–667.
  • Mahmoud Alewiwi, Cengiz Orencik, and Erkay Savas. 2016. Efficient top-k similarity document search utilizing distributed file systems and cosine similarity. Cluster Comput . 19, 1 (2016), 109–126. DOI: 10.1007/s10586-015-0506-0
  • Zakiy Firdaus Alfikri and Ayu Purwarianti. 2014. Detailed analysis of extrinsic plagiarism detection system using machine learning approach (naive bayes and svm). Indones. J. Electr. Eng. Comput. Sci. 12, 11 (2014), 7884–7894.
  • Muna Alsallal, Rahat Iqbal, Saad Amin, and Anne James. 2013. Intrinsic plagiarism detection using latent semantic indexing and stylometry. In Proceedings of the 2013 6th International Conference on Developments in eSystems Engineering . 145–150. DOI: 10.1109/DeSE.2013.34
  • Mozhgan Momtaz, Kayvan Bijari, Mostafa Salehi, and Hadi Veisi. 2016. Graph-based approach to text alignment for plagiarism detection in persian documents. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 176–179.
  • Erwan Moreau, Arun Jayapal, and Carl Vogel. 2014. Author verification: exploring a large set of parameters using a genetic algorithm—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Erwan Moreau, Arun Jayapal, Gerard Lynch, and Carl Vogel. 2015. Author verification: Basic stacked generalization applied to predictions from a set of heterogeneous learners—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Erwan Moreau and Carl Vogel. 2013. Style-based distance features for author verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Maxim Mozgovoy, Tuomo Kakkonen, and Georgina Cosma. 2010. Automatic student plagiarism detection: Future perspectives. J. Educ. Comput. Res. 43, 4 (2010), 511–531.
  • Aibek Musaev, De Wang, Saajan Shridhar, and Calton Pu. 2015. Fast text classification using randomized explicit semantic analysis. In Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration . 364–371. DOI: 10.1109/IRI.2015.62
  • El Moatez Billah Nagoudi, Ahmed Khorsi, Hadda Cherroun, and Didier Schwab. 2018. 2L-APD: A Two-Level Plagiarism Detection System for Arabic Documents. Cybern. Inf. Technol. 18, 1 (2018), 124–138. DOI: 10.2478/cait-2018-0011
  • Rao Muhammad Adeel Nawab, Mark Stevenson, and Paul Clough. 2017. An IR-based approach utilizing query expansion for plagiarism detection in MEDLINE. IEEE/ACM Trans. Comput. Biol. Bioinforma. 14, 4 (2017), 796–804. DOI: 10.1109/TCBB.2016.2542803
  • Philip M. Newton. 2018. How common is commercial contract cheating in higher education and is it increasing? A Systematic Review. Front. Educ. 3 (2018). DOI: 10.3389/feduc.2018.00067
  • Le Thanh Nguyen, Nguyen Xuan Toan, and Dinh Dien. 2016. Vietnamese plagiarism detection method. In Proceedings of the 7th Symposium on Information and Communication Technology (SoICT’16) . 44–51. DOI: 10.1145/3011077.3011109
  • Gabriel Oberreuter and Juan D. VeláSquez. 2013. Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style. Exp. Syst. Appl. 40, 9 (2013), 3756–3763.
  • Milan Ojsteršek, Janez Brezovnik, Mojca Kotar, Marko Ferme, Goran Hrovat, Albin Bregant, and Mladen Borovič. 2014. Establishing of a slovenian open access infrastructure: A technical point of view. Program 48, 4 (2014), 394–412. DOI: 10.1108/PROG-02-2014-0005
  • Adeva Oktoveri, Agung Toto Wibowo, and Ari Moesriami Barmawi. 2014. Non-relevant document reduction in anti-plagiarism using asymmetric similarity and AVL tree index. In Proceedings of the 2014 5th International Conference on Intelligent and Advanced Systems (ICIAS’14) . 1–5. DOI: 10.1109/ICIAS.2014.6869547
  • Ahmed Hamza Osman and Naomie Salim. 2013. An improved semantic plagiarism detection scheme based on Chi-squared automatic interaction detection. In Proceedings of the 2013 International Conference on Computing, Electrical and Electronic Engineering (ICCEEE’13) . 640–647. DOI: 10.1109/ICCEEE.2013.6634015
  • Caleb Owens and Fiona A. White. 2013. A 5‐year systematic strategy to reduce plagiarism among first‐year psychology university students. Aust. J. Psychol. 65, 1 (2013), 14–21. DOI: 10.1111/ajpy.12005
  • María Leonor Pacheco, Kelwin Fernandes, and Aldo Porco. 2015. Random forest with increased generalization: A universal background approach for authorship verification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Yurii Palkovskii and Alexei Belov. 2013. Using hybrid similarity methods for plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Yurii Palkovskii and Alexei Belov. 2014. Developing high-resolution universal multi-type n-gram plagiarism detector. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 984–989.
  • Guy Paré, Marie-Claude Trudel, Mirou Jaana, and Spyros Kitsiou. 2015. Synthesizing information systems knowledge: A typology of literature reviews. Inf. Manag. 52, 2 (2015), 183–199. DOI: 10.1016/j.im.2014.08.008
  • Merin Paul and Sangeetha Jamal. 2015. An Improved SRL based plagiarism detection technique using sentence ranking. Procedia Comput. Sci. 46 (2015), 223–230. DOI: 10.1016/j.procs.2015.02.015
  • Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing . 425–430.
  • Jian Peng, Kim-Kwang Raymond Choo, and Helen Ashman. 2016. Bit-level n-gram based forensic authorship analysis on social media: Identifying individuals from linguistic profiles. J. Netw. Comput. Appl. 70 (2016), 171–182. DOI: 10.1016/j.jnca.2016.04.001
  • Solange de L. Pertile, Viviane P. Moreira, and Paolo Rosso. 2015. Comparing and combining content‐ and citation‐based approaches for plagiarism detection. J. Assoc. Inf. Sci. Technol. 67, 10 (2015), 2511–2526. DOI: 10.1002/asi.23593
  • Solange de L. Pertile, Paolo Rosso, and Viviane P. Moreira. 2013. Counting co-occurrences in citations to identify plagiarised text fragments. In Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages . 150–154.
  • Timo Petmanson. 2013. Authorship identification using correlations of frequent features—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. 2013. Align, disambiguate and walk: A unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) . 1341–1351.
  • Gaspar Pizarro V. and Juan D. Velásquez. 2017. Docode 5: Building a real-world plagiarism detection system. Eng. Appl. Artif. Intell. 64 (Jun. 2017), 261–271. DOI: 10.1016/j.engappai.2017.06.001
  • Juan-Pablo Posadas-Durán, Grigori Sidorov, Ildar Batyrshin, and Elibeth Mirasol-Meléndez. 2015. Author verification using syntactic n-grams—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Martin Potthast, Tim Gollub, Matthias Hagen, Martin Tippmann, Johannes Kiesel, Paolo Rosso, Efstathios Stamatatos, and Benno Stein. 2013. Overview of the 5th International Competition on Plagiarism Detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Martin Potthast, Matthias Hagen, Anna Beyer, Matthias Busse, Martin Tippmann, Paolo Rosso, and Benno Stein. 2014. Overview of the 6th International Competition on Plagiarism Detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Martin Potthast, Matthias Hagen, and Benno Stein. 2016. Author Obfuscation: Attacking the state of the art in authorship verification. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso, and Benno Stein. 2017. Overview of PAN’17: Author identification, author profiling, and author obfuscation. In Proceedings of the 7th International Conference of the CLEF Initiative . DOI: 10.1007/978-3-319-65813-1_25
  • Martin Potthast, Benno Stein, Alberto Barrón-Cedeño, and Paolo Rosso. 2010. An Evaluation framework for plagiarism detection. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (COLING’10) . 997–1005.
  • Martin Potthast, Benno Stein, Andreas Eiselt, Alberto Barrón-Cedeño, and Paolo Rosso. 2009. Overview of the 1st international competition on plagiarism detection. In Proceedings of the SEPLN 09 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN’09) . 1–9.
  • Amit Prakash and Sujan Kumar Saha. 2014. Experiments on document chunking and query formation for plagiarism source retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Piotr Przybyła, Nhung T. H. Nguyen, Matthew Shardlow, Georgios Kontonatsios, and Sophia Ananiadou. 2016. NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 614–620.
  • Javad Rafiei, Salar Mohtaj, Vahid Zarrabi, and Habibollah Asghari. 2015. Source retrieval plagiarism detection based on noun phrase and keyword phrase extraction—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Shima Rakian, Esfahani Faramarz Safi, and Hamid Rastegari. 2015. A Persian fuzzy plagiarism detection approach. J. Inf. Syst. Telecommun. 3, 3 (2015), 182–190.
  • N Riya Ravi and Deepa Gupta. 2015. Efficient paragraph based chunking and download filtering for plagiarism source retrieval—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • N. Riya Ravi, Vani Kanjirangat, and Deepa Gupta. 2016. Exploration of fuzzy C means clustering algorithm in external plagiarism detection system. In Intelligent Systems Technologies and Applications . Springer, 127–138.
  • Andi Rexha, Stefan Klampfl, Mark Kröll, and Roman Kern. 2015. Towards authorship attribution for bibliometrics using stylometric features. In Proceedings of the Conference on Computational Linguistics and Bibliometrics co-located with the International Conference on Scientometrics and Informetrics (CLBib@ ISSI) . 44–49.
  • Diego Antonio Rodríguez Torrejón and José Manuel Martín Ramos. 2014. CoReMo 2.3 Plagiarism detector text alignment module—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Paolo Rosso, Francisco Rangel, Martin Potthast, Efstathios Stamatatos, Michael Tschuggnall, and Benno Stein. 2016. Overview of PAN’16. In Experimental IR Meets Multilinguality, Multimodality, and Interaction . 332–350.
  • Frantz Rowe. 2014. What literature review is not: Diversity, boundaries and recommendations. Eur. J. Inf. Syst. 23, 3 (2014), 241–255. DOI: 10.1057/ejis.2014.7
  • Barbara Rychalska, Katarzyna Pakulska, Krystyna Chodorowska, Wojciech Walczak, and Piotr Andruszkiewicz. 2016. Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 602–608.
  • Kamil Safin and Rita Kuznetsova. 2017. Style breach detection with neural sentence embeddings—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Anuj Saini and Aayushi Verma. 2016. Anuj@ DPIL-FIRE2016: a novel paraphrase detection method in hindi language using machine learning. In Proceedings of the Forum for Information Retrieval Evaluation . 141–152.
  • Miguel A. Sanchez-Perez, Alexander Gelbukh, and Grigori Sidorov. 2015. Dynamically adjustable approach through obfuscation type recognition—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Miguel A Sanchez-Perez, Grigori Sidorov, and Alexander F Gelbukh. 2014. A winning approach to text alignment for text reuse detection at PAN 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 1004–1011.
  • Fernando Sánchez-Vega, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, and Paolo Rosso. 2013. Determining and characterizing the reused text for plagiarism detection. J. Assoc. Inf. Sci. Technol. 65, 5 (2013), 1804–1813. DOI: 10.1016/j.eswa.2012.09.021
  • Yunita Sari and Mark Stevenson. 2015. A machine learning-based intrinsic method for cross-topic and cross-genre authorship verification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Yunita Sari and Mark Stevenson. 2016. Exploring word embeddings and character n-grams for author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Satyam, Anand, Arnav Kumar Dawn, and and Sujan Kumar Saha. 2014. Statistical analysis approach to author identification using latent semantic analysis—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Taneeya Satyapanich, Hang Gao, and Tim Finin. 2015. Ebiquity: Paraphrase and semantic similarity in twitter using skipgrams. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 51–55.
  • Andreas Schmidt, Reinhold Becker, Daniel Kimmig, Robert Senger, and Steffen Scholz. 2014. A concept for plagiarism detection based on compressed bitmaps. In Procceedings of the 6th International Conference on Advances in Databases, Knowledge, and Data Applications . 30–34.
  • Shachar Seidman. 2013. Authorship verification using the impostors method—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF13) .
  • Prasha Shrestha, Suraj Maharjan, and Thamar Solorio. 2014. Machine translation evaluation metric for text alignment—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Prasha Shrestha and Thamar Solorio. 2013. Using a variety of n-grams for the detection of different kinds of plagiarism. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Muazzam Ahmed Siddiqui, Imtiaz Hussain Khan, Kamal Mansoor Jambi, Salma Omar Elhaj, and Abobakr Bagais. 2014. Developing an arabic plagiarism detection corpus. Comput. Sci. Inf. Technol. 4, 2014 (2014), 261–269. DOI: 10.5121/csit.2014.41221
  • L. Sindhu and Sumam Mary Idicula. 2015. Fingerprinting based detection system for identifying plagiarism in malayalam text documents. In Proceedings of the 2015 International Conference on Computing and Network Communications (CoCoNet’15) . 553–558. DOI: 10.1109/CoCoNet.2015.7411242
  • Abdul Sittar, Hafiz Rizwan Iqbal, and Rao Muhammad Adeel Nawab. 2016. Author diarization using cluster-distance approach. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) . 1000–1007.
  • Sidik Soleman and Ayu Purwarianti. 2014. Experiments on the Indonesian plagiarism detection using latent semantic analysis. In Proceedings of the 2014 2nd International Conference on Information and Communication Technology (ICoICT’14) . 413–418. DOI: 10.1109/ICoICT.2014.6914098
  • Hussein Soori, Michal Prilepok, Jan Platos, Eshetie Berhan, and Vaclav Snasel. 2014. Text similarity based on data compression in Arabic. In AETA 2013: Recent Advances in Electrical Engineering and Related Sciences . Springer, 211–220.
  • Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Martin Potthast, Benno Stein, Patrick Juola, Miguel A. Sanchez-Perez, and Alberto Barrón-Cedeño. 2014. Overview of the author identification task at PAN 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Efstathios Stamatatos, Martin Potthast, Francisco Rangel, Paolo Rosso, and Benno Stein. 2015. Overview of the PAN/CLEF 2015 Evaluation Lab. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the 6th International Conference of the CLEF Initiative (CLEF’15) . 518–538. DOI: 10.1007/978-3-319-24027-5_49
  • Efstathios Stamatatos, Walter Daelemans Ben Verhoeven, Patrick Juola, Aurelio López-López, Martin Potthast, and Benno Stein. 2015. Overview of the author identification task at PAN 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Benno Stein, Sven zu Eissen, and Martin Potthast. 2007. Strategies for retrieving plagiarized documents. In Proceedings of the 30th Annual International ACM SIGIR Conference . 825–826. DOI: 10.1145/1277741.1277928
  • Imam Much Ibnu Subroto and Ali Selamat. 2014. Plagiarism detection through internet using hybrid artificial neural network and support vectors machine. Telecommun. Comput. Electron. Control. 12, 1 (2014), 209–218.
  • Šimon Suchomel and Michal Brandejs. 2014. Heterogeneous queries for synoptic and phrasal search—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Šimon Suchomel and Michal Brandejs. 2015. Improving synoptic querying for source retrieval. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Šimon Suchomel, Jan Kasprzak, and Michal Brandejs. 2013. Diverse queries and feature type selection for plagiarism discovery—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. DLS@CU: Sentence similarity from word alignment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14) . 241–246.
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. Back to basics for monolingual alignment: Exploiting word similarity and contextual evidence. Trans. Assoc. Comput. Linguist. 2 (2014), 219–230.
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence similarity from word alignment and semantic vector composition. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 148–153.
  • Junfeng Tian and Man Lan. 2016. ECNU at SemEval-2016 Task 1: Leveraging word embedding from macro and micro views to boost performance for semantic textual similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 621–627.
  • Diego A. Rodríguez Torrejón and José Manuel Martín Ramos. 2013. Text alignment module in CoReMo 2.1 plagiarism detector. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Michael Tschuggnall and Günther Specht. 2013. Detecting plagiarism in text documents through grammar-analysis of authors. Datenbanksysteme für Business, Technologie und Web (BTW) 2028 , Volker Markl, Gunter Saake, Kai-Uwe Sattler, Gregor Hackenbroich, Bernhard Mitschang, Theo Härder, and Veit Köppen (Eds.). Gesellschaft für Informatik e.V., 241--259.
  • Michael Tschuggnall and Günther Specht. 2013. Using grammar-profiles to intrinsically expose plagiarism in text documents. In Natural Language Processing and Information Systems . 297–302.
  • Michael Tschuggnall, Efstathios Stamatatos, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast. 2017. Overview of the author identification task at PAN-2017: Style breach detection and author clustering. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Alper Kursat Uysal and Serkan Gunal. 2014. Text classification using genetic algorithm oriented latent semantic features. Exp. Syst. Appl. 41, 13 (2014), 5938–5947. DOI: 10.1016/j.eswa.2014.03.041
  • Vani Kanjirangat and Deepa Gupta. 2014. Using K-means cluster based techniques in external plagiarism detection. In Proceedings of the 2014 International Conference on Contemporary Computing and Informatics (IC3I’14) . 1268–1273. DOI: 10.1109/IC3I.2014.7019659
  • Vani Kanjirangat and Deepa Gupta. 2015. Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system. In Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI’15) . 1578–1584. DOI: 10.1109/ICACCI.2015.7275838
  • Vani Kanjirangat and Deepa Gupta. 2016. Study on extrinsic text plagiarism detection techniques and tools. J. Eng. Sci. Technol. Rev. 9, 5 (2016), 9–23.
  • Vani Kanjirangat and Deepa Gupta. 2017. Detection of idea plagiarism using syntax–semantic concept extractions with genetic algorithm. Exp. Syst. Appl. 73 (2017), 11–26. DOI: 10.1016/j.eswa.2016.12.022
  • Vani Kanjirangat and Deepa Gupta. 2017. Identifying document-level text plagiarism: A two-phase approach. J. Eng. Sci. Technol. 12, 12 (2017), 3226–3250.
  • Vani Kanjirangat and Deepa Gupta. 2017. Text plagiarism classification using syntax based linguistic features. Exp. Syst. Appl. 88 (2017), 448–464. DOI: 10.1016/j.eswa.2017.07.006
  • Anna Vartapetiance and Lee Gillam. 2013. A textual modus operandi: surrey's simple system for author identification—notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Juan D Velásquez, Yerko Covacevich, Francisco Molina, Edison Marrese-Taylor, Cristián Rodríguez, and Felipe Bravo-Marquez. 2016. DOCODE 3.0 (DOcument COpy DEtector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources. Inf. Fus. 27 (2016), 64–75. DOI: 10.1016/j.inffus.2015.05.006
  • Ondřej Veselý, Tomáš Foltýnek, and Jiří Rybička. 2013. Source retrieval via naïve approach and passage selection heuristics—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Darnes Vilariño, David Pinto, Helena Gómez, Saúl León, and Esteban Castillo. 2013. Lexical-syntactic and graph-based features for authorship verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Ngoc Phuoc An Vo, Octavian Popescu, and Tommaso Caselli. 2014. FBK-TR: SVM for semantic relatedeness and corpus patterns for RTE. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14) . 289–293.
  • Hai Hieu Vu, Jeanne Villaneau, Farida Saïd, and Pierre-François Marteau. 2014. Sentence similarity by combining explicit semantic analysis and overlapping N-grams. In Text, Speech and Dialogue . 201–208.
  • Elizabeth Wager. 2014. Defining and responding to plagiarism. Learn. Publ. 27, 1 (2014), 33–42. DOI: 10.1087/20140105
  • Wafa Wali, Bilel Gargouri, and Abdelmajid Ben Hamadou. 2015. Supervised learning to measure the semantic similarity between arabic sentences. In Computational Collective Intelligence . 158–167.
  • John Walker. 1998. Student plagiarism in universities: What are we doing about it? High. Educ. Res. Dev. 17, 1 (1998), 89–106. DOI: 10.1080/0729436980170105
  • Shuai Wang, Haoliang Qi, Leilei Kong, and Cuixia Nu. 2013. Combination of VSM and jaccard coefficient for external plagiarism detection. In Proceedings of the 2013 International Conference on Machine Learning and Cybernetics . 1880–1885. DOI: 10.1109/ICMLC.2013.6890902
  • Debora Weber-Wulff. 2014. False feathers: A Perspective on Academic Plagiarism . Springer, Berlin.
  • Debora Weber-Wulff, Christopher Möller, Jannis Touras, and Elin Zincke. 2013. Plagiarism Detection Software Test 2013. Retrieved from http://plagiat.htw-berlin.de/wp-content/uploads/Testbericht-2013-color.pdf .
  • Agung Toto Wibowo, Kadek W. Sudarmadi, and Ari M. Barmawi. 2013. Comparison between fingerprint and winnowing algorithm to detect plagiarism fraud on Bahasa Indonesia documents. In Proceedings of the 2013 International Conference of Information and Communication Technology (ICoICT’13) . 128–133. DOI: 10.1109/ICoICT.2013.6574560
  • Kyle Williams, Hung-Hsuan Chen, Sagnik Ray Chowdhury, and C. Lee Giles. 2013. Unsupervised ranking for plagiarism source retrieval—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles. 2014. Classifying and ranking search engine results as potential sources of plagiarism. In Proceedings of the 2014 ACM Symposium on Document Engineering (DocEng’14) . 97–106. DOI: 10.1145/2644866.2644879
  • Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles. 2014. Supervised ranking for plagiarism source retrieval—notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop ( CLEF’ 14) .
  • Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013. A lightweight and high performance monolingual word aligner. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) . 702–707.
  • Takeru Yokoi. 2015. Sentence-based plagiarism detection for japanese document based on common nouns and part-of-speech structure. In Intelligent Software Methodologies, Tools and Techniques . 297–308.
  • Guido Zarrella, John Henderson, Elizabeth M. Merkhofer, and Laura Strickhart. 2015. MITRE: Seven systems for semantic similarity in tweets. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 12–17.
  • Chunxia Zhang, Xindong Wu, Zhendong Niu, and Wei Ding. 2014. Authorship identification from unstructured texts. Knowl.-Based Syst. 66 (2014), 99–111. DOI: 10.1016/j.knosys.2014.04.025
  • Jiang Zhao and Man Lan. 2015. Ecnu: Leveraging word embeddings to boost performance for paraphrase in twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 34–39.
  • Valentin Zmiycharov, Dimitar Alexandrov, Hristo Georgiev, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, and Preslav Nakov. 2016. Experiments in authorship-link ranking and complete author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Sven Meyer Zu Eissen and Benno Stein. 2006. Intrinsic plagiarism detection. In Proceedings of the European Conference on Information Retrieval . 565–569.
  • Denis Zubarev and Ilya Sochenkov. 2014. Using sentence similarity measure for plagiarism source retrieval—notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop ( CLEF’ 14) .
  • Teddi Fishman. 2009. We know it when we see it' is not good enough: Toward a standard definition of plagiarism that transcends theft, fraud, and copyright. In Proceedings 4th Asia Pacific Conference on Educational Integrity (4APCEI'09) . 5.
  • 1 http://de.vroniplag.wikia.com .
  • 2 https://beallslist.weebly.com/standalone-journals.html .
  • 3 https://www.ldc.upenn.edu/ .
  • 4 http://github.com/danieldk/citar .
  • 5 http://www.cis.uni-muenchen.de/∼schmid/tools/TreeTagger/ .
  • 6 http://nlp.stanford.edu/software/lex-parser.shtml .
  • 7 https://publications.europa.eu/en/web/eu-vocabularies/th-dataset/-/resource/dataset/eurovoc .
  • 8 https://babelnet.org/ .
  • 9 https://wiki.dbpedia.org/ .
  • 10 http://verbs.colorado.edu/∼mpalmer/projects/ace.html .
  • 11 http://verbs.colorado.edu/∼mpalmer/projects/verbnet.html .
  • 12 https://pan.webis.de/data.html .
  • 13 https://www.microsoft.com/en-us/download/details.aspx?id=52398 .


A publication of the Harvard College Writing Program.

Harvard Guide to Using Sources 

What Constitutes Plagiarism?

In academic writing, it is considered plagiarism to draw any idea or any language from someone else without adequately crediting that source in your paper. It doesn't matter whether the source is a published author, another student, a website without clear authorship, a website that sells academic papers, or any other person: Taking credit for anyone else's work is stealing, and it is unacceptable in all academic situations, whether you do it intentionally or by accident.

The ease with which you can find information of all kinds online means that you need to be extra vigilant about keeping track of where you are getting information and ideas and about giving proper credit to the authors of the sources you use. If you cut and paste from an electronic document into your notes and forget to clearly label the document in your notes, or if you draw information from a series of websites without taking careful notes, you may end up taking credit for ideas that aren't yours, whether you mean to or not.

It's important to remember that every website is a document with an author, and therefore every website must be cited properly in your paper. For example, while it may seem obvious to you that an idea drawn from Professor Steven Pinker's book The Language Instinct should only appear in your paper if you include a clear citation, it might be less clear that information you glean about language acquisition from the Stanford Encyclopedia of Philosophy website warrants a similar citation. Even though the authorship of this encyclopedia entry is less obvious than it might be if it were a print article (you need to scroll down the page to see the author's name, and if you don't do so you might mistakenly think an author isn't listed), you are still responsible for citing this material correctly. Similarly, if you consult a website that has no clear authorship, you are still responsible for citing the website as a source for your paper. The kind of source you use, or the absence of an author linked to that source, does not change the fact that you always need to cite your sources (see Evaluating Web Sources ).

Verbatim Plagiarism

If you copy language word for word from another source and use that language in your paper, you are plagiarizing verbatim . Even if you write down your own ideas in your own words and place them around text that you've drawn directly from a source, you must give credit to the author of the source material, either by placing the source material in quotation marks and providing a clear citation, or by paraphrasing the source material and providing a clear citation.

The passage below comes from Ellora Derenoncourt’s article, “Can You Move to Opportunity? Evidence from the Great Migration.”

Here is the article citation in APA style:

Derenoncourt, E. (2022). Can you move to opportunity? Evidence from the Great Migration. The American Economic Review, 112(2), 369–408. https://doi.org/10.1257/aer.20200002

Source material

Why did urban Black populations in the North increase so dramatically between 1940 and 1970? After a period of reduced mobility during the Great Depression, Black out-migration from the South resumed at an accelerated pace after 1940. Wartime jobs in the defense industry and in naval shipyards led to substantial Black migration to California and other Pacific states for the first time since the Migration began. Migration continued apace to midwestern cities in the 1950s and 1960s, as the booming automobile industry attracted millions more Black southerners to the North, particularly to cities like Detroit or Cleveland. Of the six million Black migrants who left the South during the Great Migration, four million of them migrated between 1940 and 1970 alone.

Plagiarized version

While this student has written her own sentence introducing the topic, she has copied the italicized sentences directly from the source material. She has left out two sentences from Derenoncourt’s paragraph, but has reproduced the rest verbatim:

But things changed mid-century. After a period of reduced mobility during the Great Depression, Black out-migration from the South resumed at an accelerated pace after 1940. Wartime jobs in the defense industry and in naval shipyards led to substantial Black migration to California and other Pacific states for the first time since the Migration began. Migration continued apace to midwestern cities in the 1950s and 1960s, as the booming automobile industry attracted millions more Black southerners to the North, particularly to cities like Detroit or Cleveland.

Acceptable version #1: Paraphrase with citation

In this version the student has paraphrased Derenoncourt’s passage, making it clear that these ideas come from a source by introducing the section with a clear signal phrase ("as Derenoncourt explains…") and citing the publication date, as APA style requires.

But things changed mid-century. In fact, as Derenoncourt (2022) explains, the wartime increase in jobs in both defense and naval shipyards marked the first time during the Great Migration that Black southerners went to California and other west coast states. After the war, the increase in jobs in the car industry led to Black southerners choosing cities in the midwest, including Detroit and Cleveland.

Acceptable version #2: Direct quotation with citation or direct quotation and paraphrase with citation

If you quote directly from an author and cite the quoted material, you are giving credit to the author. But keep in mind that quoting long passages of text is the best option only if the particular language used by the author is important to your paper. Social scientists and STEM scholars rarely quote in their writing, paraphrasing their sources instead. If you are writing in the humanities, you should make sure that you quote directly only when you think it is important for your readers to see the original language.

In the example below, the student quotes part of the passage and paraphrases the rest.

But things changed mid-century. In fact, as Derenoncourt (2022) explains, “after a period of reduced mobility during the Great Depression, Black out-migration from the South resumed at an accelerated pace after 1940” (p. 379). Derenoncourt notes that after the war, the increase in jobs in the car industry led to Black southerners choosing cities in the midwest, including Detroit and Cleveland.

Mosaic Plagiarism

If you copy bits and pieces from a source (or several sources), changing a few words here and there without either adequately paraphrasing or quoting directly, the result is mosaic plagiarism . Even if you don't intend to copy the source, you may end up with this type of plagiarism as a result of careless note-taking and confusion over where your source's ideas end and your own ideas begin. You may think that you've paraphrased sufficiently or quoted relevant passages, but if you haven't taken careful notes along the way, or if you've cut and pasted from your sources, you can lose track of the boundaries between your own ideas and those of your sources. It's not enough to have good intentions and to cite some of the material you use. You are responsible for making clear distinctions between your ideas and the ideas of the scholars who have informed your work. If you keep track of the ideas that come from your sources and have a clear understanding of how your own ideas differ from those ideas, and you follow the correct citation style, you will avoid mosaic plagiarism.

Indeed, of the more than 3500 hours of instruction during medical school, an average of less than 60 hours are devoted to all of bioethics, health law and health economics combined . Most of the instruction is during the preclinical courses, leaving very little instructional time when students are experiencing bioethical or legal challenges during their hands-on, clinical training. More than 60 percent of the instructors in bioethics, health law, and health economics have not published since 1990 on the topic they are teaching.

--Persad, G.C., Elder, L., Sedig, L., Flores, L., & Emanuel, E. (2008). The current state of medical school education in bioethics, health law, and health economics. Journal of Law, Medicine, and Ethics, 36, 89-94.

Students can absorb the educational messages in medical dramas when they view them for entertainment. In fact, even though they were not created specifically for education, these programs can be seen as an entertainment-education tool [43, 44]. In entertainment-education shows, viewers are exposed to educational content in entertainment contexts, using visual language that is easy to understand and triggers emotional engagement [45]. The enhanced emotional engagement and cognitive development [5] and moral imagination make students more sensitive to training [22].

--Cambra-Badii, I., Moyano, E., Ortega, I., Baños, J.-E., & Sentí, M. (2021). TV medical dramas: Health sciences students’ viewing habits and potential for teaching issues related to bioethics and professionalism. BMC Medical Education, 21, 1-11. doi: https://doi.org/10.1186/s12909-021-02947-7

Paragraph #1.

All of the ideas in this paragraph after the first sentence are drawn directly from Persad. But because the student has placed the citation mid-paragraph, the final two sentences wrongly appear to be the student’s own idea:

In order to advocate for the use of medical television shows in the medical education system, it is also important to look at the current bioethical curriculum. In the more than 3500 hours of training that students undergo in medical school, only about 60 hours are focused on bioethics, health law, and health economics (Persad et al, 2008). It is also problematic that students receive this training before they actually have spent time treating patients in the clinical setting. Most of these hours are taught by instructors without current publications in the field.

Paragraph #2.

All of the italicized ideas in this paragraph are either paraphrased or taken verbatim from Cambra-Badii, et al., but the student does not cite the source at all. As a result, readers will assume that the student has come up with these ideas himself:

Students can absorb the educational messages in medical dramas when they view them for entertainment. It doesn’t matter if the shows were designed for medical students; they can still be a tool for education. In these hybrid entertainment-education shows, viewers are exposed to educational content that triggers an emotional reaction. By allowing for this emotional, cognitive, and moral engagement, the shows make students more sensitive to training . There may be further applications to this type of education: the role of entertainment as a way of encouraging students to consider ethical situations could be extended to other professions, including law or even education.

The student has come up with the final idea in the paragraph (that this type of ethical training could apply to other professions), but because nothing in the paragraph is cited, it reads as if it is part of a whole paragraph of his own ideas, rather than the point that he is building to after using the ideas from the article without crediting the authors.

Acceptable version

In the first paragraph, the student uses signal phrases in nearly every sentence to reference the authors (“According to Persad et al.,” “As the researchers argue,” “They also note”), which makes it clear throughout the paragraph that all of the paragraph’s information has been drawn from Persad et al. The student also uses a clear APA in-text citation to point the reader to the original article. In the second paragraph, the student paraphrases and cites the source’s ideas and creates a clear boundary between those ideas and his own, which appear in the final paragraph.

In order to advocate for the use of medical television shows in the medical education system, it is also important to look at the current bioethical curriculum. According to Persad et al. (2008), only about one percent of teaching time throughout the four years of medical school is spent on ethics. As the researchers argue, this presents a problem because the students are being taught about ethical issues before they have a chance to experience those issues themselves. They also note that more than sixty percent of instructors teaching bioethics to medical students have no recent publications in the subject.

The research suggests that medical dramas may be a promising source for discussions of medical ethics. Cambra-Badii et al. (2021) explain that even when watched for entertainment, medical shows can help viewers engage emotionally with the characters and may prime them to be more receptive to training in medical ethics. There may be further applications to this type of education: the role of entertainment as a way of encouraging students to consider ethical situations could be extended to other professions, including law or even education.

Inadequate Paraphrase

When you paraphrase, your task is to distill the source's ideas in your own words. It's not enough to change a few words here and there and leave the rest; instead, you must completely restate the ideas in the passage in your own words. If your own language is too close to the original, then you are plagiarizing, even if you do provide a citation.

In order to make sure that you are using your own words, it's a good idea to put away the source material while you write your paraphrase of it. This way, you will force yourself to distill the point you think the author is making and articulate it in a new way. Once you have done this, you should look back at the original and make sure that you have represented the source’s ideas accurately and that you have not used the same words or sentence structure. If you do want to use some of the author's words for emphasis or clarity, you must put those words in quotation marks and provide a citation.

The passage below comes from Michael Sandel’s article, “The Case Against Perfection.” Here’s the article citation in MLA style:

Sandel, Michael. “The Case Against Perfection.” The Atlantic, April 2004, https://www.theatlantic.com/magazine/archive/2004/04/the-case-against-pe... .

Though there is much to be said for this argument, I do not think the main problem with enhancement and genetic engineering is that they undermine effort and erode human agency. The deeper danger is that they represent a kind of hyperagency—a Promethean aspiration to remake nature, including human nature, to serve our purposes and satisfy our desires. The problem is not the drift to mechanism but the drive to mastery. And what the drive to mastery misses and may even destroy is an appreciation of the gifted character of human powers and achievements.

The version below is an inadequate paraphrase because the student has only cut or replaced a few words: “I do not think the main problem” became “the main problem is not”; “deeper danger” became “bigger problem”; “aspiration” became “desire”; “the gifted character of human powers and achievements” became “the gifts that make our achievements possible.”

The main problem with enhancement and genetic engineering is not that they undermine effort and erode human agency. The bigger problem is that they represent a kind of hyperagency—a Promethean desire to remake nature, including human nature, to serve our purposes and satisfy our desires. The problem is not the drift to mechanism but the drive to mastery. And what the drive to mastery misses and may even destroy is an appreciation of the gifts that make our achievements possible (Sandel).

Acceptable version #1: Adequate paraphrase with citation

In this version, the student communicates Sandel’s ideas but does not borrow language from Sandel. Because the student uses Sandel’s name in the first sentence and has consulted an online version of the article without page numbers, there is no need for a parenthetical citation.

Michael Sandel disagrees with the argument that genetic engineering is a problem because it replaces the need for humans to work hard and make their own choices. Instead, he argues that we should be more concerned that the decision to use genetic enhancement is motivated by a desire to take control of nature and bend it to our will instead of appreciating its gifts.

Acceptable version #2: Direct quotation with citation

In this version, the student uses Sandel’s words in quotation marks and provides a clear MLA in-text citation. In cases where you are going to talk about the exact language that an author uses, it is acceptable to quote longer passages of text. If you are not going to discuss the exact language, you should paraphrase rather than quoting extensively.

The author argues that “the main problem with enhancement and genetic engineering is not that they undermine effort and erode human agency,” but, rather that “they represent a kind of hyperagency—a Promethean desire to remake nature, including human nature, to serve our purposes and satisfy our desires. The problem is not the drift to mechanism but the drive to mastery. And what the drive to mastery misses and may even destroy is an appreciation of the gifts that make our achievements possible” (Sandel).

Uncited Paraphrase

When you use your own language to describe someone else's idea, that idea still belongs to the author of the original material. Therefore, it's not enough to paraphrase the source material responsibly; you also need to cite the source, even if you have changed the wording significantly. As with quoting, when you paraphrase you are offering your reader a glimpse of someone else's work on your chosen topic, and you should also provide enough information for your reader to trace that work back to its original form. The rule of thumb here is simple: Whenever you use ideas that you did not think up yourself, you need to give credit to the source in which you found them, whether you quote directly from that material or provide a responsible paraphrase.

The passage below comes from C. Thi Nguyen’s article, “Echo Chambers and Epistemic Bubbles.”

Here’s the citation for the article, in APA style:

Nguyen, C. (2020). Echo chambers and epistemic bubbles. Episteme, 17(2), 141-161. doi:10.1017/epi.2018.32

Epistemic bubbles can easily form accidentally. But the most plausible explanation for the particular features of echo chambers is something more malicious. Echo chambers are excellent tools to maintain, reinforce, and expand power through epistemic control. Thus, it is likely (though not necessary) that echo chambers are set up intentionally, or at least maintained, for this functionality.

The student who wrote the paraphrase below has drawn these ideas directly from Nguyen’s article but has not credited the author. Although she paraphrased adequately, she is still responsible for citing Nguyen as the source of this information.

Echo chambers and epistemic bubbles have different origins. While epistemic bubbles can be created organically, it’s more likely that echo chambers will be formed by those who wish to keep or even grow their control over the information that people hear and understand.

In this version, the student eliminates any possible ambiguity about the source of the ideas in the paragraph. By using a signal phrase to name the author whenever the source of the ideas could be unclear, the student clearly attributes these ideas to Nguyen.

According to Nguyen (2020), echo chambers and epistemic bubbles have different origins. Nguyen argues that while epistemic bubbles can be created organically, it’s more likely that echo chambers will be formed by those who wish to keep or even grow their control over the information that people hear and understand.

Uncited Quotation

When you put source material in quotation marks in your essay, you are telling your reader that you have drawn that material from somewhere else. But it's not enough to indicate that the material in quotation marks is not the product of your own thinking or experimentation: You must also credit the author of that material and provide a trail for your reader to follow back to the original document. This way, your reader will know who did the original work and will also be able to go back and consult that work if they are interested in learning more about the topic. Citations should always go directly after quotations.

The passage below comes from Deirdre Mask’s nonfiction book, The Address Book: What Street Addresses Reveal About Identity, Race, Wealth, and Power.

Here is the MLA citation for the book:

Mask, Deirdre. The Address Book: What Street Addresses Reveal About Identity, Race, Wealth, and Power. St. Martin’s Griffin, 2021.

In New York, even addresses are for sale. The city allows a developer, for the bargain price of $11,000 (as of 2019), to apply to change the street address to something more attractive.

It’s not enough for the student to indicate that these words come from a source; the source must be cited:

After all, “in New York, even addresses are for sale. The city allows a developer, for the bargain price of $11,000 (as of 2019), to apply to change the street address to something more attractive.”

Here, the student has cited the source of the quotation using an MLA in-text citation:

After all, “in New York, even addresses are for sale. The city allows a developer, for the bargain price of $11,000 (as of 2019), to apply to change the street address to something more attractive” (Mask 229).

Using Material from Another Student's Work

In some courses you will be allowed or encouraged to form study groups, to work together in class generating ideas, or to collaborate on your thinking in other ways. Even in those cases, it's imperative that you understand whether all of your writing must be done independently, or whether group authorship is permitted. Most often, even in courses that allow some collaborative discussion, the writing or calculations that you do must be your own. This doesn't mean that you shouldn't collect feedback on your writing from a classmate or a writing tutor; rather, it means that the argument you make (and the ideas you rely on to make it) should either be your own or you should give credit to the source of those ideas.

So what does this mean for the ideas that emerge from class discussion or peer review exercises? Unlike the ideas that your professor offers in lecture (you should always cite these), ideas that come up in the course of class discussion or peer review are collaborative, and often not just the product of one individual's thinking. If, however, you see a clear moment in discussion when a particular student comes up with an idea, you should cite that student. In any case, when your work is informed by class discussions, it's courteous and collegial to include a discursive footnote in your paper that lets your readers know about that discussion. So, for example, if you were writing a paper about the narrator in Tim O'Brien's The Things They Carried and you came up with your idea during a discussion in class, you might place a footnote in your paper that states the following: "I am indebted to the members of my Expos 20 section for sparking my thoughts about the role of the narrator as Greek Chorus in Tim O'Brien's The Things They Carried ."

It is important to note that collaboration policies can vary by course, even within the same department, and you are responsible for familiarizing yourself with each course's expectation about collaboration. Collaboration policies are often stated in the syllabus, but if you are not sure whether it is appropriate to collaborate on work for any course, you should always consult your instructor.


Enago Academy

How to Avoid Plagiarism in Research Papers (Part 1)


Writing a research paper poses challenges, from gathering literature to providing evidence that makes your paper stronger. Drawing upon previously established ideas and adding pertinent information to your paper are necessary steps, but they need to be done with caution, without falling into the trap of plagiarism. To understand how to avoid plagiarism, it is important to know the different types of plagiarism that exist.

What is Plagiarism in Research?

Plagiarism is the unethical practice of using the words or ideas of another author or researcher, or your own previous work, without proper acknowledgment, whether deliberate or accidental. Considered a serious academic and intellectual offense, plagiarism can have highly negative consequences, such as paper retraction and loss of author credibility and reputation. It is currently a grave problem in academic publishing and a major reason for paper retractions.

It is thus imperative for researchers to deepen their understanding of plagiarism. In some cultures, academic traditions may not insist on authenticating words or ideas by citing their source, but this form of attribution is a prerequisite of the global academic code of conduct. Non-native English speakers face the added challenge of communicating their technical content in English while also complying with ethical rules. The digital age plays a role as well: easy access to material and data on the internet makes copying and pasting information all too simple.


How Can You Avoid Plagiarism in a Research Paper?

Guard yourself against plagiarism, however accidental it may be. Here are some guidelines to avoid plagiarism.

1. Paraphrase your content

  • Do not copy–paste text verbatim from the reference paper. Instead, restate the idea in your own words.
  • Understand the idea(s) of the reference source well in order to paraphrase correctly.
  • Examples of good paraphrasing can be found here ( https://writing.wisc.edu/Handbook/QPA_paraphrase.html )

2. Use Quotations

Use quotation marks to indicate that text has been taken from another paper. The quoted text should appear exactly as it does in the paper you take it from.

3. Cite your Sources – Identify what does and does not need to be cited

  • Any words or ideas that are not your own but taken from another paper need to be cited.
  • Cite Your Own Material—If you are using content from your previous paper, you must cite yourself. Using material you have published before without citation is called self-plagiarism.
  • The scientific evidence you gathered by performing your own tests does not need to be cited.
  • Facts or common knowledge need not be cited. If unsure, include a reference.

4. Maintain records of the sources you refer to

  • Keep a record of every source you consult. Use citation management software such as EndNote or Reference Manager to manage the citations used in the paper.
  • Use multiple references for the background information/literature survey. For example, rather than citing a review article, refer to and cite the individual papers it covers.

5. Use plagiarism checkers

You can use various plagiarism detection tools, such as iThenticate or HelioBLAST (formerly eTBLAST), to check how much of your paper overlaps with previously published text.
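Under the hood, such tools compare overlapping text fragments between documents. As a rough illustration of the core idea only (this is not the actual algorithm of iThenticate or any other product; the function names and example passages below are invented for demonstration), the following sketch measures word trigram overlap between two texts using Jaccard similarity:

```python
import re

def shingles(text, n=3):
    """Break a text into the set of its lowercase word n-grams ("shingles")."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a, b, n=3):
    """Fraction of shared n-grams between two texts: 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Invented example: a "source" passage and a lightly edited "suspect" version.
source = ("Black out-migration from the South resumed "
          "at an accelerated pace after the war")
suspect = ("Out-migration from the South resumed "
           "at an accelerated pace after the second war")
score = jaccard_similarity(source, suspect)  # high overlap, but below 1.0
```

A score near 1.0 means the two texts share nearly all of their three-word sequences, which is a strong signal of verbatim copying; production checkers layer far more on top of this basic overlap measure, such as stemming, document fingerprinting, and web-scale indexes.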

Tip: While it is perfectly fine to survey previously published work, it is not acceptable to paraphrase it with extensive similarity. Most plagiarism occurs in the literature review section of a document (manuscript, thesis, etc.). If you read the original work carefully, try to understand its context, take good notes, and then express the ideas to your target audience in your own language (without forgetting to cite the original source), you will never be accused of plagiarism, at least not in the literature review section.

Caution: The above statement is valid only for the literature review section of your document. You should NEVER EVER use someone else’s original results and pass them off as yours!

What strategies do you adopt to maintain content originality? What advice would you share with your peers? Please feel free to comment in the section below.

If you would like to know more about patchwriting, quoting, paraphrasing and more, read the next article in this series!

Quetext

Recognizing & Avoiding Plagiarism in Your Research Paper

  • Posted on January 26, 2024


Plagiarism in research is, unfortunately, still a serious problem today. Research papers that contain plagiarism quote other authors without authorization, and the writer may even try to pass off others’ work as their own. This damages not only the individual’s reputation but also that of the entire class, school, or field, because readers can never fully trust that the writer’s work is genuine. Naturally, you don’t want to contribute to that problem.

Unfortunately, plagiarism doesn’t have to be intentional to be damaging. College students and even professionals often fail to properly cite their sources for anything that isn’t common knowledge. While accidental plagiarism is more innocent, it is no less dangerous, as it can still get you into a great deal of academic trouble.

The good news is that as long as you put research integrity first and do your plagiarism due diligence, you shouldn’t have anything to worry about.

Ready to learn all about research paper citation rules, and how to avoid getting caught in this trap? Let’s take a look.

What is Plagiarism in Research?

Plagiarism is the act of using someone else’s work or intellectual property without acknowledging the source. Before you can follow plagiarism rules, it’s critical to understand what plagiarism is in academic writing: in a nutshell, work is considered plagiarized if it is neither your own original work nor cited as someone else’s.

Plagiarism is unethical because it takes the blood, sweat, and tears of other writers and passes them off as yours, without real effort on your part. This can dilute someone else’s standing or lead to confusion down the line about where credit is due. If readers can’t tell whose work led to certain academic papers, data sets, or theories, the science and art worlds suffer.

Science writers, journalists, marketing experts, medical and dental researchers, and students, among others, have all been stung by misunderstanding these rules. Even a simple copy and paste without attribution or referencing the original author is enough to signal professional and/or academic dishonesty.

The bottom line is, if you’re using someone else’s text word-for-word, you absolutely must note where that work came from. This protects the ideas of others and upholds publication ethics for all of us. That means, of course, you need to spot any plagiarism red flags from the get-go.

What Types of Plagiarism Can Occur in a Research Paper?

Some of the most common forms of plagiarism that occur in research papers and other forms of academic writing are:

  • In-text (parenthetical) citations without a corresponding entry in the bibliography or works cited page, which leaves readers unable to find the true source
  • Citing work incorrectly
  • Not following the prescribed citation style, whether that’s APA, MLA, or Chicago, making it difficult or impossible for others to find the source
  • Paraphrasing someone else’s work too closely without citing the source
  • Using data or statistics from someone else without a proper citation
  • Following the format of someone else’s work in a section or in the paper as a whole
  • Attributing research to the wrong person, such as cutting and pasting someone else’s quote and attributing it either to an incorrect author, or simply not providing attribution at all
  • Relying too heavily on just a few sources, meaning you are taking their ideas wholesale

The truth is, most of this plagiarism isn’t even deliberate. Unintentional plagiarism, where you don’t realize you have committed it, is a major source of confusion in academia. Self-plagiarism poses a problem too: it occurs when you reuse your own work without citing it, and it is still considered plagiarism even though you are the original author. Although re-using your old work is allowed with citations, doing so without them passes off old work as original, which has two drawbacks:

  • You are not obeying the spirit of the assignment, which is to put in the time to create something new with your own ideas.
  • You create downstream confusion when people are searching for your work, which conflicts with the entire goal of citing sources.

Direct plagiarism also occurs; unlike the accidental kind, it is intentional. Intentional theft is even worse because the person committing it often disguises it, making it more harmful to the original author. Again, this raises serious moral and ethical problems, as it dilutes the hard work of others. And given that direct plagiarism is generally quite easy to detect, students should realize that committing plagiarism intentionally is never worth it.

In summary, there are many forms of plagiarism to be aware of, and all of them can lead to serious trouble if you’re not continually wary. Students should know that Blackboard and other online academic portals check for plagiarism. Professionals should know that serious plagiarizing can cost them licenses, grants, and standing among their peers.

In other words, it’s no joke. To avoid potential consequences, keep an eye out for the following warning signs in your research papers.

Warning Signs of Plagiarism in a Research Paper

To avoid plagiarism, research papers must be free of uncited work that uses the ideas of others. That means indicating the original source every single time you use one, with a proper citation, in the correct style as dictated by your professor or industry.

While unintentional plagiarism can happen to anyone, knowing its signs can help students and professional authors realize when they need to rewrite or add a citation to their writing. That will help you stay on the good side of academia, respect others’ work and ensure your own work is always improving. When reviewing your paper, look for the following signs that you may have failed to cite sources properly.

Infrequent Use of Citations

If you simply don’t have very many citations in a long research paper, you are likely using the ideas of others without proper credit. Most well-researched 10-page papers draw on dozens of sources, weaving together others’ work to support the writer’s own ideas.

However, if your work contains only five or six citations, chances are you are relying too heavily on ideas that are not your own. Search more carefully for ideas in your writing that belong to others and cite them, and consider seeking additional sources to support your argument.

Using Words That You Don’t Normally Use

Any section of your paper that contains a smattering of words outside your usual vocabulary hints that you’ve likely taken them from somewhere else. While it’s fine (and good!) to build your word bank, atypical words appearing in your text are a good indication that you are also inserting the ideas of others without credit. Comb over such sections carefully to ensure you have properly credited the original writer.

Changes in Tone and Sentence Structure

As with words you don’t use, tone and sentence structure alien to your writing should be a red flag. Look carefully at these sections, asking yourself:

  • Are any of these sentences just reconstructions of someone else’s writing?
  • If I rewrote this idea from the ground up, would it sound different?
  • Is this the tone I’m even going for in this paper?

Changes in Font

A change in font partway through your research paper is a dead giveaway. It clearly indicates that you have copied and pasted something into your paper, whether from an outside source or from your own previous work. If you spot such a section, either rewrite it or cite it accurately, and be sure to change the font to match the rest of your paper.

Tips to Avoid Plagiarizing

Avoiding plagiarism is truly easy: simply provide citations, in the correct style, for all research and ideas that you didn’t create yourself. These styles include APA, most common for science and medical writing; MLA, common for the arts; and Chicago Style, usually used for publishing. You can also use the following tips for beating a plagiarism checker:

  • Paraphrase the thoughts of others in your own words instead of copying their work verbatim. This reduces the chances that your work will pull up in a search ahead of theirs, which is the fair thing to do. Don’t confuse paraphrasing with freedom to forgo citations, though: paraphrasing and citing go together.
  • Link your own ideas together using the ideas of others, but rely most heavily on your original work. Others’ thoughts and words should be used to support yours, not vice versa. Before you turn to sources for your paper, outline your own approach thoroughly. This will minimize the chances of unintended theft and maximize the impact of your contributions.
  • Always use quotation marks if you are using someone’s ideas word for word. Depending on the citation style you are using, you may instead use blocked and indented text to indicate a quote from someone else. Be sure to format your paper correctly, according to the style that has been assigned to you by a professor or superior.
  • Never use words you’re not familiar with. Not only can that lead to you expressing your ideas incorrectly, but it can also trigger plagiarism checkers if you haven’t made ideas your own.
  • Provide a full works-cited or bibliography page with every assignment you submit. Again, adhere to the citation style that was given to you, which will allow others to easily locate the sources you used. Make sure to properly cite sources in the text, footnotes, and at the end of your paper, as dictated by your style guide.
  • Be honest with professors or bosses. If you truly cannot finish something in time and are motivated to act unethically, resist the urge and take your concerns to the proper authority figure. Even turning in a botched assignment is far better for your reputation and your own ethics than using someone else’s work without the proper citations.
  • Use a citation checker to ensure that you haven’t used someone else’s work without meaning to. This protects them, protects you, and protects academia as a whole.

Remember, as long as you go into an assignment with the intention to create something original that reflects your honest opinion, you will likely be fine. However, you do yourself a huge disservice if you don’t take that extra step and check your sources with a plagiarism checker.

Using Quetext To Avoid Plagiarizing in Research Papers

A citation generator can help professionals, researchers, and university students alike cite web pages, journal articles, books, newspapers, and more. With proper citation, you’ll never have to worry about accusations of plagiarism again.

Using the Quetext plagiarism checker before submitting the assignment can provide reassurance that no unauthorized quoting is taking place in your research paper. It will help you by flagging any spots in your paper that still require citations, such as a missing attribution for a book or original source.

Quetext is not only reliable, but it is also easy to use. If plagiarism of any kind is detected, the tool automatically generates the proper citation, in the required citation style, right inside the text. The citation generator will create the citations your paper needs in APA, MLA, or Chicago Style. All you have to do is enter the citation components, and voilà: your works cited page, bibliography, footnotes, and the paper as a whole will appear in the proper style.
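To get a feel for what a citation generator is doing with the components you enter, here is a deliberately simplified sketch. The function name `cite_book` and the output formats are illustrative only, not Quetext’s actual implementation; real APA and MLA rules cover many more source types, multiple authors, and punctuation edge cases.

```python
def cite_book(last, first, year, title, publisher, style="APA"):
    """Format a basic single-author book citation in one of two styles.
    Grossly simplified: real style guides handle many more cases."""
    if style == "APA":   # Last, F. (Year). Title. Publisher.
        return f"{last}, {first[0]}. ({year}). {title}. {publisher}."
    if style == "MLA":   # Last, First. Title. Publisher, Year.
        return f"{last}, {first}. {title}. {publisher}, {year}."
    raise ValueError(f"unsupported style: {style}")
```

The point of the sketch is that the same components (author, year, title, publisher) are simply reordered and repunctuated per style, which is why a generator only needs you to enter them once.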

The tool works whether the source is private or published, personal, academic, professional, or anything else. Now you can be sure to honor others’ work and avoid any negative consequences from plagiarized work. This will keep you in good standing with academic institutions and free you from any shadow of scientific misconduct.

As long as you make the effort to do your own work, respect your school’s academic integrity and use a plagiarism checker, you should have nothing to worry about. Don’t wait any longer to get peace of mind … start today.


American Psychological Association

In-Text Citations

In scholarly writing, it is essential to acknowledge how others contributed to your work. By following the principles of proper citation, writers ensure that readers understand their contribution in the context of the existing literature—how they are building on, critically examining, or otherwise engaging the work that has come before.

APA Style provides guidelines to help writers determine the appropriate level of citation and how to avoid plagiarism and self-plagiarism.

We also provide specific guidance for in-text citation, including formats for interviews, classroom and intranet sources, and personal communications; in-text citations in general; and paraphrases and direct quotations.


7 Best Plagiarism Checkers in 2024 (Free & Paid)

Plagiarism remains a significant concern in the writing community. When an author claims another person's work as their own, it constitutes a serious breach of ethics, both in academic and professional settings. Hence, getting a reliable plagiarism check is essential for writers to ensure their content is original and free from potential infringements.

In response to this need, Wordvice AI has compiled a comprehensive list of the top 7 plagiarism checkers available in 2024. We have evaluated and ranked each tool to help you find the one that best meets your requirements. Although we cannot promise any results in your particular use case, we do recommend giving one of these plagiarism checkers a try to ensure your work is authentic and free of copied text.

  • How to Avoid Plagiarism in Research and Essays

What do plagiarism checkers do?

Plagiarism checkers are digital tools designed to identify instances of plagiarism in a piece of writing. They work by scanning the submitted text and highlighting any sections that match other content, often providing a similarity percentage and detailed reports that indicate where the matches were found. To ensure work is original, good plagiarism check tools compare your text to a vast database of existing content on the internet, academic papers, books, and other sources.

Plagiarism check tools are used by a wide range of writers, including students, teachers, researchers, and professional writers. Students use plagiarism checkers to ensure their academic papers, assignments, and admissions essays are original. Teachers and professors use them to verify the authenticity of students' work. Researchers rely on a research paper plagiarism checker to check their papers before publication to avoid unintentional plagiarism, including self-plagiarism . Finally, professional writers and bloggers use plagiarism software to ensure their content is unique and credible and avoid SEO and reputational penalties.

A good plagiarism checker will detect plagiarism accurately, even when there are alterations made to the original phrasing. Inclusion of a plagiarism report is also an important factor in determining how complete a tool is.

Do I need a plagiarism check for my document?

Whether you’re a student, a professional writer, or a researcher, running your work through a plagiarism checker is a step to include in your writing preparation workflow.

But do you NEED to use a plagiarism checker? It depends on what you’re writing and if you are intending to publish your writing somewhere it will be accessible–whether in an academic journal or a large blogsite. If you care about retaining your original voice and keeping your reputation as a communicator intact, using plagiarism tools is an easy step to take that can save you from potential problems down the road.

Free vs. Paid Plagiarism Checkers: Some Conclusions

When comparing free and paid plagiarism checkers, several patterns emerge. As one might expect, paid tools generally offer more accurate and comprehensive results as they often have access to larger databases, including academic journals, proprietary content, and extensive web sources, which enhances their detection capabilities.

On the other hand, free plagiarism check tools, while useful for basic checks, tend to miss more subtle or less blatant instances of plagiarism due to their limited databases and less sophisticated algorithms. Paid versions also provide better user experiences, including detailed reports and customer support, and some even offer a grammar checker or citation generator as additional resources for writers.

In summary, while free plagiarism checkers can be helpful for initial reviews, paid versions are typically more reliable and effective for ensuring thorough plagiarism detection and maintaining the integrity of your work.

7 Best Plagiarism Checkers for Research, Academic, and Professional Writing

1. Wordvice AI Plagiarism Checker–best plagiarism checker for academic papers

best plagiarism checker - wordvice ai plagiarism checker example

Pros & Cons

Pros:

  • Detects a high amount of plagiarism; capable of identifying plagiarism even in paraphrased or edited texts
  • Includes a free plagiarism report and self-plagiarism detection feature
  • Easy to use and analyze results
  • A suite of free AI revision tools to enhance your writing after getting a plagiarism check
  • User data and uploaded content is protected and confidential

Cons:

  • No free version available
  • Only available on the website platform

Plagiarism checker overview

The Wordvice AI Plagiarism Checker excels at detecting plagiarism: it catches nearly all instances of plagiarism in academic documents and accurately identifies copied content from a wide range of sources, including academic journals, websites, and dissertations.

This highly accurate plagiarism checker is particularly effective in catching paraphrased or heavily edited plagiarism, ensuring comprehensive detection.

User experience

Wordvice AI provides a user-friendly experience with a clear, downloadable report highlighting different sources in distinct colors for easy reading.

Users can check their unpublished text against the database to catch self-plagiarism (reusing your own previously published work, whether intentionally or unintentionally).

However, users cannot edit their text directly within the plagiarism check tool, but they can use the free revision tools (especially for academic and admissions writing) provided by the Wordvice AI Writing Assistant, including the AI Proofreader, AI Paraphraser, AI Translator, AI Summarizer, and AI Detector.

Wordvice AI states in the site’s terms and conditions that they prioritize privacy and security. Uploaded documents are never sold or licensed to third-party data brokers.

Wordvice AI does not offer a free version of their plagiarism checker, but a subscription to Wordvice AI Premium grants users access to all of the tools in the AI Writing Assistant, including full reports and detailed similarity scores in the Plagiarism Checker. Pricing is $9.95/month and Team plans are also available for greater collaboration.

Scores: 9.5 / 9 / 8.7

2. Copyleaks Plagiarism Checker–one of the all-around best plagiarism checkers

copyleaks plagiarism checker example

Pros:

  • Accesses a vast database of sources
  • Points-based system with purchased credits
  • Offers reliable plagiarism detection with the ability to compare sources

Cons:

  • Requires a subscription to access the plagiarism checker
  • Credit-based pricing can become costly for heavy users

Copyleaks is a robust tool designed for educators, businesses, and content creators to ensure originality in their documents. It uses advanced AI algorithms to scan texts against billions of web pages, academic databases, and private repositories. The tool provides detailed reports highlighting potential matches and the percentage of duplicate content. Copyleaks excels in detecting paraphrased content and subtle similarities, making it highly effective in maintaining academic integrity and content authenticity.

Copyleaks effectively identifies plagiarism across various sources, including public and specialized databases. It is particularly adept at comparing sources, making it easy to pinpoint specific instances of plagiarism.

The tool’s detailed reports and ability to compare sources enhance the overall user experience, providing clear insights into potential plagiarism. Overall, multiplatform support and comprehensive databases make Copyleaks a user-friendly option.

Copyleaks employs a points-based pricing model, offering flexibility for users based on their needs. Credits can be purchased as needed, with 100 points covering approximately 25,000 words. While the pricing is flexible, it can be slightly high for heavier users. Monthly plans range from around $10 for low-end usage to $1100 for extensive organizational use.

Scores: 9 / 9.3 / 8.5

3. Quetext Plagiarism Checker–a solid plagiarism and AI checker

quetext plagiarism checker example

Pros:

  • Built-in citation assistant to help add missing citations
  • Ensures that documents are not stored in a database

Cons:

  • Does not detect all instances of plagiarism
  • Monthly subscription starts at $9.6 after the free trial, and another $7.99 for the AI Detector
  • Not a highly effective academic plagiarism checker for most research documents

Quetext’s plagiarism checker performs better than most free tools, and it ensures the privacy and security of submitted texts: documents are not saved in a database and are kept private and encrypted.

Quetext provides a straightforward plagiarism report, displaying a percentage of similarity and highlighting plagiarized text.

While it claims to check against academic sources, its performance in detecting plagiarism from journal articles and dissertations is less impressive. The built-in citation assistant helps add citations but requires manual input of additional information.

Quetext offers a limited free trial of up to 500 words. After the trial, users need to subscribe to a premium plan starting at $9.6 per month, which allows checking up to 100,000 words. Higher-priced plans are available for users needing to check more extensive content. File uploads are only available in the Premium version; otherwise, users must copy and paste their text.

Scores: 8.8 / 9.3 / 8.2

4. Grammarly Plagiarism Checker–one of the most popular tools to check for plagiarism

grammarly plagiarism checker page

Pros:

  • Includes tools to improve language, style, and citations
  • Documents are ensured not to be sold or shared with third parties

Cons:

  • Requires a subscription to access detailed plagiarism reports
  • Relatively low percentage of plagiarism detected overall
  • Same color used to highlight different sources, making it hard to differentiate

Grammarly's plagiarism checker is capable of identifying more complete matches than most free plagiarism checking software. As one of the most popular AI revision tools for years, Grammarly has earned a reputation as a reliable writing and editing platform. It is committed to user privacy, ensuring that documents are not stored, sold, or shared with third parties.

Grammarly offers a premium user experience with a clean, visually appealing interface. The tool maintains the original formatting of documents, which is a plus. However, according to our test results, it struggles to detect academic and online sources, and the use of the same color to highlight all sources can make the results difficult to interpret. The plagiarism check tool is therefore better suited for general text than for specialized academic documents.

Subscribers benefit from additional features, including a language and style assistant and a citation helper, enhancing the overall utility of the service. Overall, while Grammarly's plagiarism checker offers a polished user experience and additional writing tools, its effectiveness in detecting plagiarism (especially from academic sources) is somewhat limited compared to some other plagiarism check tools.

Grammarly's plagiarism detection software is not available for free. To access detailed plagiarism reports, users must subscribe to the premium service, which costs $30 per month. The premium package also grants access to Grammarly's comprehensive suite of language and writing tools, but the need to pay for a subscription may also be a drawback for some users.

Scores: 8.4 / 9 / 8.5

5. PaperRater Plagiarism Checker–a plagiarism checker for basic detection

paperrater plagiarism checker example

Pros:

  • Utilizes Google and Bing to search the entire internet for potential matches
  • Offers a free version that allows checks of up to 1,500 words per document

Cons:

  • Did not detect all instances of plagiarism in our tests, indicating weaker detection than other plagiarism checking tools
  • Premium plan allows only 25 document checks per month
  • Only available as a web-based tool, with no mobile apps or browser add-ons
  • Report details can be somewhat confusing to interpret

While PaperRater's plagiarism detection scans a wide range of internet sources, its accuracy does not match that of some competitors, which often detected over 80% of plagiarism in our tests. This makes PaperRater less reliable and competitive for thorough plagiarism checks, especially for academic or professional use.

PaperRater offers a straightforward web-based interface, but its functionality is limited as it did not detect all instances of plagiarism for academic content. The free version is helpful for budget-conscious students, but the word limit and the number of checks can be restrictive.

PaperRater’s Premium plan is priced at $14.95 for a monthly subscription. This plan includes enhanced features but remains limited to 25 checks per month for documents. These restrictions make it less suitable for heavy users, such as writers, editors, or educators. Additionally, the lack of PDF support further diminishes its appeal compared to plagiarism checkers like Wordvice AI or Copyleaks.

Scores: 8.5 / 7.9 / 8

6. Copyscape–top plagiarism checker for online contents

copyscape plagiarism checker example

Pros:

  • Works with most languages, excluding some Asian languages
  • Pay-as-you-go model allows prepayment used towards multiple checks
  • Google and Bing provide a comprehensive range of sources

Cons:

  • Free plagiarism search only for online URLs; file upload only available in the Premium Plan
  • Accessible only via desktop or laptop; no support for mobile devices, browsers, or Microsoft Word integration

Copyscape effectively scans a wide range of sources provided by Google and Bing, ensuring comprehensive checks. Its robust database helps identify copied content across various languages, though it performs less effectively with some East Asian languages.

Copyscape provides a straightforward, user-friendly experience. The interface is intuitive, with a URL text box for easy input. However, the inability to save reports and the lack of device support can be limiting for users who need to keep a record of previous checks. Although Copyscape is effective for detecting plagiarism in online content, other plagiarism tools with more comprehensive features and better support may be more suitable for those needing frequent and detailed plagiarism checks.

Copyscape employs a flexible pay-as-you-go pricing model. Users can prepay to receive multiple plagiarism checks, costing just a few cents per hundred words. This model allows you to pay only for what you use, but may require frequent recharges for heavy users.

Scores: 8.5 / 8 / 7.5

7. Search Engine Reports Plagiarism Checker–good plagiarism checker for limited uses

search engine reports plagiarism checker page

Pros:

  • Up to 1,000 words per scan without registration
  • Uploaded documents are not stored in their database

Cons:

  • Site is cluttered with ads, disrupting the user experience somewhat
  • Same color used for different sources, making it hard to distinguish between them

Search Engine Reports' plagiarism checker works by searching Google for each sentence in your document and marking it as plagiarized if it finds a match. This method identified high percentages of plagiarism in online resources. However, the tool struggles to find full matches and performs poorly with academic sources such as dissertations and journal articles.
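The per-sentence approach described above can be sketched in a few lines. This is an illustration only, not the tool's actual code: instead of querying Google, it checks each sentence against a local list of source texts, and the function name `sentence_matches` is hypothetical.

```python
import re

def sentence_matches(document, corpus):
    """Split the document into rough sentences and flag each one that
    appears verbatim (case-insensitive) inside any corpus text."""
    lowered = [c.lower() for c in corpus]
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(s, any(s.lower() in c for c in lowered)) for s in sentences]
```

A sentence-by-sentence exact match like this explains both the tool's strengths and weaknesses: copied sentences found on indexed web pages are flagged reliably, but any paraphrase, or any source not indexed (such as paywalled dissertations), slips through.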

The usability of the tool is hampered by distracting ads and a lackluster interface. The report lists individual sentences marked as plagiarized or original, with links to Google searches for each sentence; the downloadable version highlights plagiarized text and lists sources. Additionally, a language statistics feature helps identify frequently used words to limit repetition. While the tool is free and maintains document privacy, the unclear reports further hinder the user experience.

Search Engine Reports is free for scanning documents up to 1,000 words. For larger documents, users can pay to increase the limit to 30,000 words per scan. This tiered approach allows basic functionality for free, with additional capacity available for a fee.

Scores: 7.5 / 8.5 / 7.3

Best Plagiarism Checkers: Final Thoughts

Tool | Scores | Total
Wordvice AI | 9.5 / 9 / 8.7 | 27.2
Copyleaks | 9 / 9.3 / 8.5 | 26.8
Quetext | 8.8 / 9.3 / 8.2 | 26.3
Grammarly | 8.4 / 9 / 8.5 | 25.9
PaperRater | 8.5 / 7.9 / 8 | 24.4
Copyscape | 8.5 / 8 / 7.5 | 24.0
Search Engine Reports | 7.5 / 8.5 / 7.3 | 23.3

As our review demonstrates, there are currently dozens of reliable and free plagiarism checkers on the market today, many of which are powered by AI technology to comb through databases and look for similar phrases and sentences. However, keep in mind that no plagiarism checking software is 100% accurate and that it is ultimately up to you, the author, to ensure that you have not borrowed from other sources without including citations and/or quotes.

For writers who require more than just a plagiarism check, Wordvice offers professional proofreading services for all types of documents by highly qualified editors with Master’s and PhD degrees. Perfect your admissions essays and other application documents with our Admissions Editing Services . Prepare your paper for publication in journals with our Academic Editing Services . And create compelling copy and internal communications text with our Business Editing Services .

Whichever revision tools or services you use, make sure that it matches the specific needs of your document, tone, and style. And read our AI tool reviews before buying, including reviews of the best online paraphrasing tools , best online translators , and best AI grammar check tools .

J Korean Med Sci. 2020 Jul 13;35(27).

Similarity and Plagiarism in Scholarly Journal Submissions: Bringing Clarity to the Concept for Authors, Reviewers and Editors

Aamir Raoof Memon

Institute of Physiotherapy & Rehabilitation Sciences, Peoples University of Medical & Health Sciences for Women, Nawabshah (Shaheed Benazirabad), Sindh, Pakistan.

INTRODUCTION

What constitutes plagiarism? What methods exist to detect it? How do "plagiarism detection tools" assist in detecting it? What is the difference between plagiarism and the similarity index? These are probably the most common questions about plagiarism put to research experts in scientific writing, yet definitive answers to them are less widely known. According to a report published in 2018, retractions for plagiarism have increased sharply over the last two decades, with higher rates in developing and non-English speaking countries. 1 Several studies have reported similar findings, with Iran, China, India, Japan, Korea, Italy, Romania, Turkey, and France among the countries with the highest numbers of retractions due to plagiarism. 1 , 2 , 3 , 4 One study reported that duplication of text, figures, or tables without appropriate referencing accounted for 41.3% of post-2009 retractions of papers published from India. 5 In Pakistan, the Journal of Pakistan Medical Association started a special section titled "Learning Research" and published several papers on research writing skills, research integrity, and scientific misconduct. 6 , 7 However, the problem has not been adequately addressed, and specific issues remain unresolved and unclear. According to unpublished data on 1,679 students from four universities in Pakistan, 85.5% did not have a clear understanding of the difference between the similarity index and plagiarism. Smart et al. 8 in their global survey of editors reported that around 63% had encountered plagiarized submissions, with Asian editors experiencing the highest levels of plagiarized or duplicated content. Some journals from non-English speaking countries have specifically discussed cases of plagiarized submissions and have highlighted the drawbacks of relying on similarity checking programs. 9 , 10 , 11 These cases carry a strong message for honest researchers: improve your English writing skills and credit the sources you use by properly citing and referencing them. 12

Despite the aggregating literature on plagiarism from non-Anglophone countries, the answers to the aforementioned questions remain unclear. To answer them, it is important to have a thorough understanding of plagiarism and to bring clarity to its less-known aspects. Therefore, this paper aims to 1) define plagiarism and describe the growth in its prevalence and in the literature on it; 2) explain the difference between similarity and plagiarism; 3) discuss the role of similarity checking tools in detecting plagiarism and the flaws of relying completely on them; and 4) discuss the phenomenon called the Trojan citation. At the end, suggestions are provided for authors and editors from developing countries so that this issue may be collectively addressed.

Defining plagiarism and its prevalence in manuscripts

To begin with, plagiarism may be defined as "when somebody presents the published or unpublished work of others, including ideas, scholarly text, images, research design and data, as new and original rather than crediting the existing source of it." 13 The common types of plagiarism, including direct, mosaic, paraphrasing, intentional (covert) or unintentional (accidental) plagiarism, and self-plagiarism, have been discussed in previous reviews. 14 , 15 , 16

Evidence suggests that the first paper accused of plagiarism was published in 1979, and there has been substantial growth in cases of plagiarism over time. 1 , 2 , 3 , 4 , 5 , 8 , 17 Previous studies have pointed out that plagiarism is prevalent in developing and non-English speaking countries, but its occurrence in developed countries suggests that it is a global problem. 1 , 2 , 3 , 4 , 18 , 19 , 20 As of 1 April 2020, a search of the Retraction Database ( http://retractiondatabase.org/RetractionSearch.aspx ) for papers retracted for plagiarism found 2,280 documents, while a Scopus search for "plagiarism" in the titles of journal articles found 2,159 results. This suggests that more papers have been retracted for plagiarism than have been published on the issue. However, what we see now may not necessarily be the whole picture: the number of cases might be higher than we know. Database searches for papers tagged for plagiarism are limited to indexed journals, which keeps non-indexed journals (both low-quality and deceptive journals) out of focus. 5 , 21 Moreover, journal coverage may vary from one database to another, as reported in a recent paper on research dissemination in South Asia. 22 Therefore, both the prevalence of plagiarism and the literature published on it, as reported by database searches, are most likely "understated as of today." 5

Reasons for plagiarism: lack of understanding and poor citing practices

Although the reasons for plagiarism are complex, previous papers have suggested possible causes. 16 , 23 , 24 , 25 , 26 One major but lesser-known reason might be that students, naïve researchers, and even some faculty members either lack clarity about what constitutes plagiarism or are unable to differentiate the similarity index from plagiarism. 24 , 26 , 27 For example, a recent online survey of participants in the AuthorAID MOOC on Research Writing found that 84.4% were unaware of the difference between the similarity index and plagiarism, though almost all of them reported having an understanding of plagiarism. 24 The same paper reported that one in three participants admitted to having plagiarized at some point in their academic career. 24 Therefore, it is important to be clear about what constitutes plagiarism and how it differs from the similarity index, so that the increasing rate of plagiarism can be deterred.

The 'existing source' or 'original source' in the definition of plagiarism refers to the main (primary) source, not the (secondary) source from which the author extracts the information. For example, someone cites a paper for a passage on the mechanism by which exercise affects sleep, but the cited paper aims to determine the prevalence of sleep disorders and exercise levels rather than the mechanistic association. A thorough evaluation finds that the cited paper had used text from another review paper that discussed the mechanisms relating sleep to exercise behavior. This phenomenon of improper secondary (or indirect) citation may be common among students and novice researchers, particularly in developing countries, and should be discouraged. 27

SIMILARITY INDEX

Plagiarism vs. similarity index and the role of similarity checking tools

Plagiarism, as defined above, refers to the intentional (covert) or unintentional (accidental) theft of published or unpublished intellectual property (i.e., words or ideas), whereas the similarity index refers to "the extent of overlap or match between an author's work compared to other existing sources (books, websites, student thesis, and research articles) in the databases of similarity checking tools." 9 , 24 Advances in information technology have given researchers access to various freely available (e.g., Viper, eTBLAST/HelioBLAST, PlagScan, PlagiarismDetect, Antiplagiat, Plagiarisma, DupliChecker) and subscription-based (e.g., iThenticate, Turnitin, Similarity Check) similarity checking tools. 8 , 24 Many journal editors use iThenticate and/or Similarity Check (Crossref) to screen submitted manuscripts for similarity, whereas Turnitin is commonly used by universities and faculty to assess text similarity in students' work; however, there is a fairness issue in that not every journal or university, particularly in developing countries, can afford these subscription-based services. 28 For instance, an online survey found that only about 18% of participants could use Turnitin through their university subscription. 24 Another problem is what these tools are commonly called: plagiarism detection tools, plagiarism checking software, or plagiarism detection programs. Based on the function they actually perform, it would be more appropriate to call them similarity checking tools, similarity checkers, text-matching tools, or simply text-duplicity detection tools. 5 , 8 , 23 That is, these tools help locate matching or overlapping text (similarity) in submitted work without directly flagging plagiarism. 24
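The idea of a similarity index as "the extent of overlap or match" can be made concrete in a few lines of code. The sketch below is purely illustrative and is not the algorithm of iThenticate, Turnitin, or any other commercial tool: it breaks two texts into word trigrams and reports the percentage of the submission's trigrams that also occur in one source. The trigram size and the percentage formula are assumptions chosen for clarity.

```python
# Minimal sketch of a similarity index: the fraction of a submission's
# word trigrams that also appear in a source text. Real tools compare
# against large databases and use far more sophisticated matching.

def ngrams(text, n=3):
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_index(submission, source, n=3):
    """Percentage of the submission's n-grams found in the source."""
    sub, src = ngrams(submission, n), ngrams(source, n)
    if not sub:
        return 0.0
    return 100.0 * len(sub & src) / len(sub)

a = "the international physical activity questionnaire was used to measure exercise"
b = "in this study the international physical activity questionnaire was used to measure sleep"
print(round(similarity_index(a, b), 1))  # 7 of 8 trigrams overlap -> 87.5
```

Note how a single shared stock phrase drives the score high even though the two sentences report different studies, which is exactly why a raw percentage cannot establish plagiarism on its own.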

Taking Turnitin as an example, these tools reflect text similarity through color codes, each linked to the online source of the matching text; the details have been described elsewhere. 23 , 28 Journal editors, universities, and some organizations consider text above specific cutoff values for the percentage of similarity to be problematic. According to one paper, 5% or less text similarity (overlap of the text in the manuscript with text in the online literature) is acceptable to some journal editors, while others might put the manuscript under scrutiny if the text similarity is over 20%. 29 , 30 Another paper observed that journal editors tend to reject a manuscript if text similarity is above 10%. 31 The study of participants completing the AuthorAID MOOC on Research Writing also found that some participants' institutions consider text similarity of less than 20% acceptable. 24 As an example, the guidelines of the University Grants Commission of India treat similarity up to 10% as acceptable or minor (Level 0), but anything above is categorized into different levels (based on the percentages), each with a separate list of repercussions for students and researchers. 32 This approach might miss cases where the acceptable 10% similarity comes from a single source, especially if editors rely on the numbers alone. In addition, it has the potential to punish authors who have not committed plagiarism at all: the randomly written text presented in Fig. 1 would be considered plagiarism under the rule of cutoff values. Some authors opine that text with over four consecutive matching words, or a number of matching word strings, should be treated as plagiarized. 28 , 33 This again is not a good idea, as the text "the International Physical Activity Questionnaire was used to measure …" would be the same in several papers, yet this is definitely not plagiarism, because the methodology of different papers on the same topic can legitimately be similar; the decision should therefore not be based on the numbers reported by similarity detection tools. 28 It would thus be prudent not to set any cutoff values for text similarity, as doing so leads to a slippery slope ("a course of action that seems to lead inevitably from one action or result to another with unintended consequences," as defined by the Merriam-Webster Dictionary) and gives "a sense of impunity to the perpetrators." 32
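The "consecutive words" rule of thumb discussed above can also be sketched in code. The function below flags any run of k or more words shared by two texts; k=5 ("over four consecutive words") comes from the rule of thumb in the text, while the matching itself is a deliberately simplified assumption. Run on two hypothetical methods sentences, it flags exactly the kind of stock phrase the paragraph uses to show why such matches are not necessarily plagiarism.

```python
# Sketch of the "over four consecutive words" rule: report every
# maximal run of at least k words that two texts share.

def shared_word_runs(text_a, text_b, k=5):
    a, b = text_a.lower().split(), text_b.lower().split()
    # Index each word of text_b by position to find match starts quickly.
    positions = {}
    for j, word in enumerate(b):
        positions.setdefault(word, []).append(j)
    runs = set()
    for i, word in enumerate(a):
        for j in positions.get(word, []):
            # Skip starts that sit inside a longer run already counted.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length >= k:
                runs.add(" ".join(a[i:i + length]))
    return runs

m1 = "the international physical activity questionnaire was used to measure activity levels"
m2 = "activity was assessed and the international physical activity questionnaire was used to measure it"
print(shared_word_runs(m1, m2))
```

Both hypothetical sentences describe the same instrument, so the nine-word run is flagged even though neither author copied from the other, illustrating why word-string counts should inform rather than decide.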

[Fig. 1 (jkms-35-e217-g001): an example of randomly written text that would be flagged as plagiarism under cutoff-based similarity rules]

Drawbacks of similarity checking tools

There are a few drawbacks to relying completely on similarity checking tools. First, these tools are not foolproof and might miss instances of translational plagiarism and figure plagiarism. 24 Translational plagiarism is the most invisible type of copying in non-Anglophone countries, where an article published in a language other than English is copied (with or without minor modifications) and published in an English-language journal, or vice versa. 10 This is an extremely difficult type of plagiarism to detect, and different approaches to addressing it (e.g., the use of Google Translate) have recently been reported. 34 , 35 Nevertheless, there may be cases where this practice is acceptable, such as the publication of policy papers (for example, "Identifying predatory or pseudo-journals," published in the International Journal of Occupational and Environmental Medicine, the National Medical Journal of India, and Biochemia Medica in 2017 by authors affiliated with the World Association of Medical Editors (WAME); or "The revised guidelines of the Medical Council of India for academic promotions: Need for a rethink," published in over ten journals during 2016 by four journal editors and endorsed by some, though not all, members of the Indian Association of Medical Journal Editors). Second, text similarity in some parts of a manuscript (i.e., methods and results) should be weighed differently from that in other sections (i.e., introduction and discussion) and the conclusions. 31 In addition, based on the personal experience of the author of this paper, some individuals use sophisticated techniques to avoid detection of high similarity, such as inappropriate synonyms, jargon, and deliberate grammatical and structural errors in the text of the manuscript. Third, plagiarism of ideas may be missed by these tools, as they can only detect plagiarism of words. 23 , 32 Therefore, similarity checking tools tend to underestimate plagiarized text or sometimes overestimate non-plagiarized material as problematic ( Fig. 1 ). 24 , 36 It should be noted that these tools serve only as an aid in identifying suspected instances of plagiarism, and the text of the manuscript should always be evaluated by experts, for "a careful human cannot be replaced." 31 , 37 A few papers published in the Journal of Korean Medical Science have presented examples where plagiarized content was missed by similarity checking tools and noticed only after a careful examination of the text. 9 , 10 Finally, plagiarism of unpublished work cannot be detected by these tools, as they are limited to online sources. 23 This is particularly important in the context of developing countries, where students' research theses and dissertations are often not deposited in research repositories, and where commercial, predatory editing and brokering services exist. 10 , 38 For example, the research repository of the Higher Education Commission of Pakistan allows deposition of doctoral theses only, and fewer than five universities (out of over 150) across the country have a research repository allowing for deposition of scholarly content. 38 Recently, a strange trend of predatory editing and brokering services has emerged: they offer clones of previously published papers or unpublished work to non-Anglophone or lazy authors demanding a quick and easy route to publication for promotion and career advancement. 10 Although plagiarism of unpublished work is not easy for experts to detect, it may be possible through their previous experience and scholarly networks.
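The translate-then-compare idea mentioned above for translational plagiarism can be sketched as follows. Here `machine_translate` is a hypothetical stand-in for a real translation service (the approaches cited in the text used Google Translate); the comparison uses Python's standard `difflib`, and any threshold applied to the score would be an editorial judgment, not a rule.

```python
import difflib

def machine_translate(text, target_lang="en"):
    """Hypothetical hook: a real pipeline would call a translation
    service here (e.g., Google Translate, as in the approaches cited
    above). For this sketch it simply returns the text unchanged."""
    return text

def cross_language_similarity(suspect_english_text, source_text):
    # Translate the suspected source into English, then measure how
    # similar the two character sequences are (0.0 to 1.0).
    translated = machine_translate(source_text)
    matcher = difflib.SequenceMatcher(
        None, suspect_english_text.lower(), translated.lower())
    return matcher.ratio()

# With the identity "translator", identical text scores 1.0.
print(cross_language_similarity(
    "Plagiarism is a global problem.",
    "Plagiarism is a global problem."))
```

Even with a real translator plugged in, a high score would only mark the pair for expert review; translation noise makes automated verdicts even less reliable here than in the monolingual case.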

TROJAN CITATION: PERSONAL EXPERIENCE

A recent experience worth discussing in the context of plagiarism comes in the shape of the Trojan citation, where someone "makes reference to a source one time in order to evade detection (by editors and readers) of bad intentions and provide cover for a deeper, more pervasive plagiarism." 39 This practice is particularly common among those intent on deceiving readers and gaming the system. A few months ago, the author of this paper was invited by a journal to review a manuscript on predatory publishing. The content of the manuscript appeared suspicious but was not labelled "plagiarized" during the first round of review. During the second round, however, it was noticed that this was a case of Trojan citation: the author(s) cited the main source for a minor point while copying the major part of the manuscript, with slight modifications, from a paper published in Biochemia Medica (a Croatian journal). 40 The editor of the journal was informed, and the manuscript was rejected without further processing. This example suggests that careful human intervention by experts is required to uncover cases of plagiarism.

In conclusion, what we know about the growth in the prevalence of plagiarism may be just the tip of the iceberg. Therefore, a collective contribution from authors, reviewers, and editors, particularly from the Asia-Pacific region, is required. Authors from the Asia-Pacific region and developing countries with expertise on this topic should play their role by supporting journal editors and through their mentorship skills. Furthermore, senior researchers should encourage and help their honors and master's students to publish their unpublished work before it gets stolen by commercial brokering agencies. They should also work in close collaboration with universities and higher-education organizations in countries where this issue is not properly addressed, and should facilitate education and training sessions on plagiarism, as previous evidence suggests that workshops and online training sessions may be helpful. 5 Journal editors from the Asia-Pacific region and developing countries, for their part, should not judge manuscripts solely on the percentage of similarity reported by similarity checking services. They should maintain a database of their own so that manuscripts about plagiarism in scientific writing, for example, can be sent for review to experts on the subject. As journal editors may not be experts in all fields, networking and seeking help from experts would help avoid cases of plagiarism in the future. It would be appropriate for journal editors and trainee editors, particularly from resource-limited countries, to be educated about the concept of scientific misconduct and the advancement of knowledge in this area. Moreover, journal editors should publish and publicly discuss cases of plagiarism as a learning experience for others. The Journal of Korean Medical Science has used this approach for cases of plagiarism, and other journals from the region are encouraged to adopt it. 9 , 10 Likewise, a paper discussing case scenarios of salami publication (i.e., "a distinct form of redundant publication which is usually characterized by similarity of hypothesis, methodology or results but not text similarity") serves as a good example of how journal editors may use their mentorship skills and support journals in educating researchers. 41 There should be strict penalties for cases of plagiarism, and safety measures for the protection of whistleblowers should be in place and enforced. By doing so, dishonest and lazy authors who bypass the system would be punished and honest authors would be served. Thus, the take-home message for editors from the Asia-Pacific region is that a collective effort and commitment from authors, reviewers, editors, and policy-makers is required to address the problem of plagiarism, especially in developing and non-English speaking countries.

Disclosure: The author has no potential conflicts of interest to disclose.


How to Write a Research Paper [Steps & Examples]

As a student, you are often required to complete numerous academic tasks that demand a lot of extra effort, and writing a research paper is one of them. If researching the topic isn't challenging enough, writing it down in a specific format adds another layer of difficulty. Having gone through this myself, I want to help you have a smoother journey in writing your research paper. I'll guide you through everything you need to know, including how to write a research paper and all the factors you need to consider along the way.

Order for Preparation of your research paper

Before beginning your research paper, start planning how you will organize your paper. Follow the specific order I have laid out to ensure you assemble everything correctly, cover all necessary components, and write more effectively. This method will help you avoid missing important elements and improve the overall quality of your paper.

Figures and Tables

Assemble all necessary visual aids to support your data and findings. Ensure they are labeled correctly and referenced appropriately in your text.

Methods

Detail the procedures and techniques used in your research. This section should be thorough enough to allow others to replicate your study.

Results

Summarize the findings of your research without interpretation. Use figures and tables to illustrate your data clearly.

Discussion

Interpret the results, discussing their implications and how they relate to your research question. Address any limitations and suggest areas for future research.

Conclusion

Summarize the key points of your research, restating the significance of your findings and their broader impact.

Introduction

Introduce the topic, provide background information, and state the research problem or hypothesis. Explain the purpose and scope of your study.

Abstract

Write a concise summary of your research, including the objective, methods, results, and conclusion. Keep it brief and to the point.

Title

Create a clear and informative title that accurately reflects the content and focus of your research paper.

Keywords

Identify key terms related to your research that will help others find your paper in searches.

Acknowledgements

Thank those who contributed to your research, including funding sources, advisors, and any other significant supporters.

References

Compile a complete list of all sources cited in your paper, formatted according to the required citation style. Ensure every reference is accurate and complete.

Types of Research Papers

There are multiple types of research papers, each with distinct characteristics, purposes, and structures. Knowing which type of research paper is required for your assignment is crucial, as each demands different preparation and writing strategies. Here, we will delve into three prominent types: argumentative, analytical, and compare and contrast papers. We will discuss their characteristics, suitability, and provide detailed examples to illustrate their application.

A. Argumentative Papers

Characteristics:

An argumentative or persuasive paper is designed to present a balanced view of a controversial issue, but ultimately aims to persuade the reader to adopt the writer's perspective. The key characteristics of this type of paper include:

Purpose: The primary goal is to convince the reader to support a particular stance on an issue. This is achieved by presenting arguments, evidence, and refuting opposing viewpoints.

Structure: Typically structured into an introduction, a presentation of both sides of the issue, a refutation of the opposing arguments, and a conclusion that reinforces the writer’s position.

Tone: While the tone should be logical and factual, it should not be overly emotional. Arguments must be supported with solid evidence, such as statistics, expert opinions, and factual data.

Suitability:

Argumentative papers are suitable for topics that have clear, opposing viewpoints. They are often used in debates, policy discussions, and essays aimed at influencing public opinion or academic discourse.

Topic: "Should governments implement universal basic income?"

Pro Side: Universal basic income provides financial security, reduces poverty, and can lead to a more equitable society.

Con Side: It could discourage work, lead to higher government expenditure, and might not be a sustainable long-term solution.

Argument: After presenting both sides, the paper would argue that the benefits of reducing poverty and financial insecurity outweigh the potential drawbacks, using evidence from various studies and real-world examples.

Writing Tips:

Clearly articulate your position on the issue from the beginning.

Present balanced arguments by including credible sources that support both sides.

Refute counterarguments effectively with logical reasoning and evidence.

Maintain a factual and logical tone, avoiding excessive emotional appeals.

B. Analytical Papers

Characteristics:

An analytical research paper focuses on breaking down a topic into its core components, examining various perspectives, and drawing conclusions based on this analysis. The main characteristics include:

Purpose: To pose a research question, collect data from various sources, analyze different viewpoints, and synthesize the information to arrive at a personal conclusion.

Structure: Includes an introduction with a clear research question, a literature review that summarizes existing research, a detailed analysis, and a conclusion that summarizes findings.

Tone: Objective and neutral, avoiding personal bias or opinion. The focus is on data and logical analysis.

Suitability:

Analytical research papers are ideal for topics that require detailed examination and evaluation of various aspects. They are common in disciplines such as the social sciences, humanities, and natural sciences, where deep analysis of existing research is crucial.

Topic: "The impact of social media on mental health."

Research Question: How does social media usage affect mental well-being among teenagers?

Analysis: Examine studies that show both positive (e.g., social support) and negative (e.g., anxiety and depression) impacts of social media. Analyze the methodologies and findings of these studies.

Conclusion: Based on the analysis, conclude whether the overall impact is more beneficial or harmful, remaining neutral and presenting evidence without personal bias.

Writing Tips:

Maintain an objective and neutral tone throughout the paper.

Synthesize information from multiple sources, ensuring a comprehensive analysis.

Develop a clear thesis based on the findings from your analysis.

Avoid inserting personal opinions or biases.

C. Compare and Contrast Papers

Characteristics:

Compare and contrast papers are used to analyze the similarities and differences between two or more subjects. The key characteristics include:

Purpose: To identify and examine the similarities and differences between two or more subjects, providing a comprehensive understanding of their relationship.

Structure: Can be organized in two ways:

Point-by-Point: Each paragraph covers a specific point of comparison or contrast.

Subject-by-Subject: Each subject is discussed separately, followed by a comparison or contrast.

Tone: Informative and balanced, aiming to provide a thorough and unbiased comparison.

Suitability:

Compare and contrast papers are suitable for topics where it is important to understand the distinctions and similarities between elements. They are commonly used in literature, history, and various comparative studies.

Topic: "Compare and contrast the leadership styles of Martin Luther King Jr. and Malcolm X."

Comparison Points: Philosophies (non-violence vs. militant activism), methods (peaceful protests vs. more radical approaches), and impacts on the Civil Rights Movement.

Analysis: Describe each leader's philosophy and method, then analyze how these influenced their effectiveness and legacy.

Conclusion: Summarize the key similarities and differences, and discuss how both leaders contributed uniquely to the movement.

Writing Tips:

Provide equal and balanced coverage to each subject.

Use clear criteria for comparison, ensuring logical and coherent analysis.

Highlight both similarities and differences, ensuring a nuanced understanding of the subjects.

Maintain an informative tone, focusing on objective analysis rather than personal preference.

How to Write A Research Paper [Higher Efficiency & Better Results]

Conduct Preliminary Research

Before getting started, it's important to gather relevant information about your topic. This preliminary research helps you build background knowledge and identify research gaps. Whenever I begin researching a topic, I usually use Google and Google Scholar. Campus libraries are another excellent resource for preliminary research, as they provide a wealth of great articles that can assist you.

Now, let's see how WPS Office and AIPal can be great research partners:

Let's say I have some PDFs gathered from different sources. With WPS Office, these PDFs can be uploaded directly, not just to extract key points but also to interact with their contents, with special help from WPS AI.

Step 1: Let's open the PDF article or research paper that we have downloaded on WPS Office.

Step 2: Now, click on the WPS AI widget at the top right corner of the screen.

Step 3: This will open the WPS PDF AI pane on the right side of the screen. Click on "Upload".

Step 4: Once the upload is complete, WPS PDF AI will return with the key points from the PDF article, which can then be copied to a fresh new document on WPS Writer.

Step 5: To interact further with the document, click on the "Inquiry" tab to talk with WPS AI and get more information on the contents of the PDF.

Research is incomplete without a Google search, but what exactly should you search for? AIPal can help you with these answers. AIPal is a Chrome extension that can help researchers make their Google searches and interactions with Chrome more effective and efficient. If you haven't installed AIPal on Chrome yet, go ahead and download the extension; it's completely free to use:

Step 1: Let's search for a term on Google related to our research.

Step 2: An AIPal widget will appear right next to the Google search bar, click on it.

Step 3: Upon clicking it, an AIPal window will pop up. In this window, you will find a more refined answer for your searched term, along with links most relevant to your search, providing a more refined search experience.

WPS AI can also be used to extract more information with the help of WPS Writer.

Step 1: We might have some information saved in a Word document, either from lectures or during preliminary research. We can use WPS AI within Writer to gain more insights.

Step 2: Select the entire text you want to summarize or understand better.

Step 3: Once the text is selected, a hover menu will appear. Click on the "WPS AI" icon in this menu.

Step 4: From the list of options, click on "Explain" to understand the content more deeply, or click on "Summarize" to shorten the paragraph.

Step 5: The results will be displayed in a small WPS AI window.

Develop the Thesis statement

To develop a strong thesis statement, start by formulating a central question your paper will address. For example, if your topic is about the impact of social media on mental health, your thesis statement might be:

"Social media use has a detrimental effect on mental health by increasing anxiety, depression, and loneliness among teenagers."

This statement is concise, contentious, and sets the stage for your research. With WPS AI, you can use the "Improve" feature to refine your thesis statement, ensuring it is clear, coherent, and impactful.

Write the First draft

Begin your first draft by focusing on maintaining forward momentum and clearly organizing your thoughts. Follow your outline as a guide, but be flexible if new ideas emerge. Here's a brief outline to get you started:

With WPS AI’s "Make Longer" feature, you can quickly expand the key ideas and points from your studies and articles into descriptive paragraphs for your draft, saving time and ensuring clarity.

Compose Introduction, Body, and Conclusion Paragraphs

When writing a research paper, it’s essential to transform your key points into detailed, descriptive paragraphs. WPS AI can help you streamline this process by enhancing your key points, ensuring each section of your paper is well-developed and coherent. Here’s how you can use WPS AI to compose your introduction, body, and conclusion paragraphs:

Let's return to the draft and start composing our introduction. The introduction should provide the background of the research paper and introduce readers to what the research paper will explore.

If your introduction feels too brief or lacks depth, use WPS AI’s "Make Longer" feature to expand on key points, adding necessary details and enhancing the overall narrative.

Once the introduction is completed, the next step is to start writing the body paragraphs and the conclusion of our research paper. Remember, the body paragraphs will incorporate everything about your research: methodologies, challenges, results, and takeaways.

If this paragraph is too lengthy or repetitive, WPS AI’s "Make Shorter" feature can help you condense it without losing essential information.

Write the Second Draft

In the second draft, refine your arguments, ensure logical flow, and check for clarity. Focus on eliminating any unnecessary information, ensuring each paragraph supports your thesis statement, and improving transitions between ideas. Incorporate feedback from peers or advisors, and ensure all citations are accurate and properly formatted. The second draft should be more polished and coherent, presenting your research in a clear and compelling manner.

WPS AI’s "Improve Writing" feature can be particularly useful here to enhance the overall quality and readability of your paper.

WPS Spellcheck can assist you in correcting spelling and grammatical errors, ensuring your paper is polished and professional. This tool helps you avoid common mistakes and enhances the readability of your paper, making a significant difference in the overall quality.

Bonus Tips: How to Get Inspiration for Your Research Paper with WPS AI

WPS Office is a phenomenal office suite that students find to be a major blessing. Not only is it a free office suite equipped with advanced features that make it competitive in the market, but it also includes a powerful AI that automates and enhances many tasks, including writing a research paper. In addition to improving readability with its AI Proofreader tool, WPS AI offers two features, "Insight" and "Inquiry", that can help you gather information and inspiration for your research paper:

Insight Feature:

The Insight feature provides deep insights and information on various topics and fields. It analyzes literature to extract key viewpoints, trends, and research directions. For instance, if you're writing a research paper on the impact of social media on mental health, you can use the Insight feature to gather a comprehensive overview of the latest studies, key arguments, and emerging trends in this field. This helps you build a solid foundation for your paper and ensure you are covering all relevant aspects.

Inquiry Feature:

The Inquiry feature allows you to ask specific questions related to your research topic. This helps you gather necessary background information and refine your research focus effectively. For example, if you need detailed information on how social media usage affects teenagers' self-esteem, you can use the Inquiry feature to ask targeted questions and receive relevant answers based on the latest research.

FAQs about writing a research paper

1. Can any source be used for academic research?

No, it's essential to use credible and relevant sources. Here is why:

Developing a Strong Argument: Your research paper relies on evidence to substantiate its claims. Using unreliable sources can undermine your argument and harm the credibility of your paper.

Avoiding Inaccurate Information: The internet abounds with data, but not all sources are reliable. Credible sources help guarantee accuracy.

2. How can I avoid plagiarism?

To avoid plagiarism, follow these steps:

Keep Records of Your Sources: Maintain a record of all the sources you use while researching. This helps you remember where you found specific ideas or phrases and ensures proper attribution.

Quote and Paraphrase Correctly: When writing a paper, use quotation marks for exact words from a source and cite them properly. When paraphrasing, restate the idea in your own words and include a citation to acknowledge the original source.

Utilize a Plagiarism Checker: Use a plagiarism detection tool before submitting your paper. This will help identify unintentional plagiarism, ensuring your paper is original and properly referenced.
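To make that last step concrete, here is a minimal sketch of the kind of matching a plagiarism checker performs. This is an illustration only, not how any specific tool such as Turnitin actually works; real checkers compare against huge databases and also catch paraphrasing. The sketch simply measures how many overlapping five-word phrases a submission shares with a known source:

```python
def word_ngrams(text, n=5):
    """Return the set of word n-grams in a text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(submission, source, n=5):
    """Fraction of the submission's n-grams that also appear in the source."""
    sub = word_ngrams(submission, n)
    src = word_ngrams(source, n)
    if not sub:
        return 0.0
    return len(sub & src) / len(sub)

source = "Plagiarism is the act of presenting the words or ideas of another as your own."
copied = "Plagiarism is the act of presenting the words or ideas of another writer."
original = "Always cite every source you quote or paraphrase in your paper."

print(round(overlap_score(copied, source), 2))    # high overlap -> flag for review
print(round(overlap_score(original, source), 2))  # no shared phrases -> 0.0
```

A high score means many exact phrase matches and flags the text for review; a score of 0.0 means the submission shares no five-word phrase with the source.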

3. How can I cite sources properly?

Adhere to the citation style guide (e.g., APA, MLA) specified by your instructor or journal. Properly citing all sources both within the text and in the bibliography or references section is essential for maintaining academic integrity and providing clear credit to the original authors. This practice also helps readers locate and verify the sources you've used in your research.
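As a concrete illustration of how the styles differ, here is a toy helper (hypothetical, not part of any citation tool) that formats a parenthetical in-text citation for a direct quotation in APA and MLA style. Always defer to the official style guides for the full rules:

```python
def in_text_citation(style, author, year=None, page=None):
    """Format a parenthetical in-text citation.

    APA uses author-date with a page number for quotations: (Author, Year, p. X).
    MLA uses author-page with no comma: (Author X).
    """
    if style == "APA":
        parts = [author]
        if year is not None:
            parts.append(str(year))
        if page is not None:
            parts.append(f"p. {page}")
        return "(" + ", ".join(parts) + ")"
    if style == "MLA":
        return f"({author} {page})" if page is not None else f"({author})"
    raise ValueError(f"unsupported style: {style}")

print(in_text_citation("APA", "Smith", year=2020, page=15))  # (Smith, 2020, p. 15)
print(in_text_citation("MLA", "Smith", page=15))             # (Smith 15)
```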

4. How long should a research paper be?

The length of a research paper depends on its topic and specific requirements. Generally, research papers range from 4,000 to 6,000 words, with shorter papers around 2,000 words and longer ones exceeding 10,000 words. Adhering to the length requirements provided for academic assignments is essential. More intricate subjects or extensive research often require more thorough explanations, which can affect the overall length of the paper.

Write Your Research Paper with the Comfort of Using WPS Office

Writing a research paper involves managing numerous complicated tasks, such as ensuring correct formatting, not missing any crucial information, and having all your data ready. The process is inherently challenging. However, if you are a student using WPS Office, the task becomes significantly simpler. WPS Office, especially with the introduction of WPS AI, provides all the resources you need to write the perfect research paper. Download WPS Office today and discover how it can transform your research paper writing experience for the better.



Scribbr Plagiarism Checker

Plagiarism checker software for students who value accuracy

Extensive research shows that Scribbr's plagiarism checker, in partnership with Turnitin, detects plagiarism more accurately than other tools, making it the no. 1 choice for students.


What you get with a premium plagiarism check

Plagiarism Checker

Catch accidental plagiarism with high accuracy with Scribbr’s Plagiarism Checker in partnership with Turnitin.

AI Detector

Detect AI-generated content, like ChatGPT3.5 and GPT4, with Scribbr’s AI Detector.

AI Proofreader

Find and fix spelling and grammar issues with Scribbr’s AI Proofreader.

* Only available when uploading an English .docx (Word) document

How Scribbr detects plagiarism better

Powered by leading plagiarism checking software

Scribbr is an authorized partner of Turnitin, a leader in plagiarism prevention. Its software detects everything from exact word matches to synonym swapping.

Access to exclusive content databases

Your submissions are compared to the world’s largest content database, covering 99 billion webpages, 8 million publications, and over 20 languages.

Comparison against unpublished works

You can upload your previous assignments, referenced works, or a classmate’s paper or essay to catch (self-)plagiarism that is otherwise difficult to detect.

The Scribbr Plagiarism Checker is perfect for you if you:

  • Are a student writing an essay or paper
  • Value the confidentiality of your submissions
  • Prefer an accurate plagiarism report
  • Want to compare your work against publications

This tool is not for you if you:

  • Prefer a free plagiarism checker despite a less accurate result
  • Are a copywriter, SEO, or business owner

Trusted by students and academics worldwide

University applicants

Ace your admissions essay to your dream college.

Compare your admissions essay to billions of web pages, including other essays.

  • Avoid having your essay flagged or rejected for accidental plagiarism.
  • Make a great first impression on the admissions officer.

Students

Submit your assignments with confidence.

Detect plagiarism using software similar to what most universities use.

  • Spot missing citations and improperly quoted or paraphrased content.
  • Avoid grade penalties or academic probation resulting from accidental plagiarism.

Academics

Take your journal submission to the next level.

Compare your submission to millions of scholarly publications.

  • Protect your reputation as a scholar.
  • Get published by the journal of your choice.

Happiness guarantee

Scribbr’s services are rated 4.9 out of 5 based on 13,544 reviews. We aim to make you just as happy. If not, we’re happy to refund you!

Privacy guarantee

Your submissions will never be added to our content database, and you’ll never get a 100% match at your academic institution.

Price per document

Prices are per check, not a subscription

  • Turnitin-powered plagiarism checker
  • Access to 99.3B web pages & 8M publications
  • Comparison to private papers to avoid self-plagiarism
  • Downloadable plagiarism report
  • Live chat with plagiarism experts
  • Private and confidential

Volume pricing available for institutions. Get in touch.

Request volume pricing

Institutions interested in buying more than 50 plagiarism checks can request a discounted price. Please fill in the form below.

Avoiding accidental plagiarism

You don't need a plagiarism checker, right?

You would never copy-and-paste someone else’s work; you’re great at paraphrasing, and you always keep a tidy list of your sources handy.

But what about accidental plagiarism? It’s more common than you think! Maybe you paraphrased a little too closely, or forgot that last citation or set of quotation marks.

Even if you did it by accident, plagiarism is still a serious offense. You may fail your course, or be placed on academic probation. The risks just aren’t worth it.

Scribbr & academic integrity

Scribbr is committed to protecting academic integrity. Our plagiarism checker software, Citation Generator, proofreading services, and free Knowledge Base content are designed to help educate and guide students in avoiding unintentional plagiarism.

We make every effort to prevent our software from being used for fraudulent or manipulative purposes.

Frequently asked questions

No, the Self-Plagiarism Checker does not store your document in any public database.

In addition, you can delete all your personal information and documents from the Scribbr server as soon as you’ve received your plagiarism report.

Scribbr’s Plagiarism Checker is powered by elements of Turnitin’s Similarity Checker, namely the plagiarism detection software and the Internet Archive and Premium Scholarly Publications content databases.

The add-on AI detector is powered by Scribbr’s proprietary software.

Extensive testing proves that Scribbr’s plagiarism checker is one of the most accurate plagiarism checkers on the market in 2022.

The software detects everything from exact word matches to synonym swapping. It also has access to a full range of source types, including open- and restricted-access journal articles, theses and dissertations, websites, PDFs, and news articles.

At the moment we do not offer a monthly subscription for the Scribbr Plagiarism Checker. This means you won’t be charged on a recurring basis; you only pay for what you use. We believe this provides you with the flexibility to use our service as frequently or infrequently as you need, without being tied to a contract or recurring fee structure.

You can find an overview of the prices per document here:

  • Small document (up to 7,500 words): $19.95
  • Normal document (7,500-50,000 words): $29.95
  • Large document (50,000+ words): $39.95

Please note that we can’t give refunds if you bought the plagiarism check thinking it was a subscription service, as communication around this policy is clear throughout the order process.

Your document will be compared to the world’s largest and fastest-growing content database, containing over:

  • 99.3 billion current and historical webpages.
  • 8 million publications from more than 1,700 publishers such as Springer, IEEE, Elsevier, Wiley-Blackwell, and Taylor & Francis.

Note: Scribbr does not have access to Turnitin’s global database with student papers. Only your university can add and compare submissions to this database.

Scribbr’s plagiarism checker offers complete support for 20 languages, including English, Spanish, German, Arabic, and Dutch.

The add-on AI Detector and AI Proofreader are only available in English.

If your university uses Turnitin, the result will be very similar to what you see at Scribbr.

The only possible difference is that your university may compare your submission to a private database containing previously submitted student papers. Scribbr does not have access to these private databases (and neither do other plagiarism checkers).

To cater to this, we offer the Self-Plagiarism Checker at Scribbr. Just upload any document you used and start the check. You can repeat this as often as you like with all your sources. With your Plagiarism Check order, you get a free pass to use the Self-Plagiarism Checker. Simply upload your documents to your similarity report and let us do the rest!

Your writing stays private. Your submissions to Scribbr are not published in any public database, so no other plagiarism checker (including those used by universities) will see them.

List of Famous Plagiarism Lawsuit Cases that Reached Court

Table of contents

  • 1 What Is A Plagiarism Lawsuit Case?
  • 2 Consequences of Plagiarism Lawsuits
  • 3 List of the Most Famous Plagiarism Lawsuit Cases
  • 4 Let’s Make A Change!

The purpose of this text is to explain illegal copying of other people’s words and ideas and list the nine most famous plagiarism lawsuit cases.

What Is A Plagiarism Lawsuit Case?

When we speak of a case of plagiarism, we mean that a particular person used another individual’s intellectual property, such as research, text, or ideas, and presented it as their own. Naturally, the person who plagiarizes gives no credit to the rightful owner.

Such cases can happen in different spheres of education and culture, such as writing and poetry, music, cinematography, scientific research, etc. Sometimes, there are even plagiarism cases where a country plagiarizes a national flag or another country’s national anthem.

Consequences of Plagiarism Lawsuits

Numerous countries have strict policies and laws on illegal copying in their legislative systems, so some cases become plagiarism lawsuits. Once the court case ends, the person who plagiarized must bear a particular punishment.

These consequences will depend on the type of appropriation in question, the volume of plagiarized content, the country’s attitude towards borrowings and originality, and the status of the “author.” Copying among undergraduates can lead to a serious warning or a notice.

Sometimes a student may even get expelled, while plagiarism in doctoral dissertations or government officials’ speeches can lead to severe consequences such as dismissal from office or a damaged reputation.

In line with this, when a famous person gets accused of plagiarism, the case usually ends in a lawsuit and reaches court. Such cases can destroy careers and lives, especially when they involve famous writers, politicians, composers, artists, and filmmakers.

Many celebrities have been involved in appropriation that ended in scandal. Some cases are so high-profile that they interest anyone who digs a little deeper into plagiarism. Let’s look at the nine most famous examples of such cases of “borrowing.”

List of the Most Famous Plagiarism Lawsuit Cases

Here’s a list of nine famous copying cases, each of which reached court and was resolved there. You’ll also see what the consequences of plagiarism can be in such cases.

  • “Blurred Lines” uses blurred sources.

Marvin Gaye’s children accused the duo behind the 2013 song “Blurred Lines” of plagiarism, claiming that the duo copied parts of “Blurred Lines” from Gaye’s original “Got to Give It Up.” They were ultimately awarded an astounding $7.3 million.

The members of the duo, Robin Thicke and Pharrell Williams, admitted that there were certain similarities between the two songs but maintained that these were coincidental rather than plagiarism. Whatever the case, $7.3 million is a lot of money to pay for a coincidence.

  • Taylor Swift to “Shake it Off” plagiarism.

Taylor Swift’s case is one of the most ridiculous in the history of music. She was accused of plagiarizing the lyrics of Jessie Braham, an R&B singer, who claimed that Swift used parts of his song “Haters Gone Hate” in her hit “Shake It Off.”

For this coincidence, Braham wanted Swift to pay an enormous $42 million. However, after review, the court concluded that the phrases in question were common expressions used in rhyme rather than stolen intellectual property. The case was a ridiculous attempt to make money.

  • Joseph Biden’s plagiarism scandals.

Senator Joe Biden was accused of plagiarism and forced to remove himself from the 1988 presidential election campaign. Years earlier, in law school, he had admitted to plagiarizing text in a paper for an introductory methodology class by failing to provide proper citations.

That law school case reached the Delaware Supreme Court, which dismissed all allegations. However, Biden was accused of appropriation again later in his career, when copying was detected in a public speech in which he didn’t give credit to Senator Robert Kennedy. Because of that scandal, Biden withdrew from the 1988 presidential campaign.

  • Karl-Theodor Zu Guttenberg and his dissertation scam.

Karl-Theodor zu Guttenberg, the German Minister of Defense, was accused of copying the greater part of his doctoral dissertation. An analytical review found that he had plagiarized 63% of his work without giving credit. He was removed from his position instantly and lost his doctoral title.

  • Annette Schavan’s Ph.D. thesis plagiarism.

The University of Düsseldorf accused Annette Schavan, the German Minister of Education and Research, of plagiarizing her Ph.D. thesis. An investigation led by the university found that Schavan had used 60 secondary sources in her work without ever citing or crediting the original authors.

After this information came to light, the university revoked her doctorate, and Schavan lost her position as Minister of Education and Research. If the University of Düsseldorf hadn’t performed a plagiarism check, the issue would never have been discovered and resolved.

  • Victor Ponta and his copy-paste Ph.D. thesis.

Victor Ponta, the Romanian Prime Minister, was accused of plagiarism in his Ph.D. thesis; according to reviews, 27% of the thesis was copy-pasted. Ponta denied all accusations, claiming that his political enemies wanted to crush him.

Later, however, Ponta admitted that some parts of his doctoral dissertation were copied. The most surprising moment of the case was the Ministry of Education’s reaction: it rejected the allegations, withdrew the claim, and dissolved the committee working on the case.

  • Florida Times and plagiarized content.

The Florida Times conducted an investigation in 2021 into plagiarism in its editorial content and found numerous cases of close paraphrasing and copying. One of the paper’s most respected editors, Lloyd Brown, admitted that there was copying in the editorial content.

However, he maintained that the plagiarism was never intentional. Having come clean and feeling ashamed, he resigned from his position as content editor of the Florida Times.

  • The New York Times and plagiarized content.

Jayson Blair of The New York Times went a step further than the Florida Times’ Lloyd Brown. He engaged in intellectual fraud daily and, even worse, did so intentionally. When investigators reviewed his work, they discovered fabricated information, photos, and facts.

The New York Times’ board of editors attributed the issue to poor coordination among the paper’s staff. In any case, the fraudulent content that made it into the NYT did colossal damage to the company.

  • The Hungarian Deputy Prime Minister and his plagiarized Ph.D. thesis content.

According to a news portal in Hungary, Zsolt Semjen, a Hungarian Deputy Prime Minister, plagiarized around 40% of his Ph.D. thesis without citing any of the copied content. However, the university that had awarded his doctoral degree could not identify the original authors of the plagiarized content in order to attribute it to them.

For this reason, the board stopped short of revoking his Ph.D. degree, and Semjen continued to hold both his position as Deputy Prime Minister and his doctoral title.

Let’s Make A Change!

As academics, we can all do our part to help prevent plagiarism. First and foremost, it’s essential to keep notes and track all references, citing the original authors. When a sentence is taken word-for-word from another source, quotation marks are a must.

Most importantly, when we write essays, theses, or Ph.D. work, we must make sure we never copy-paste material from other authors and always check for plagiarism before submitting the final work. We should focus on making original arguments; cited statements from others should only support our claims.


IMAGES

  1. Examples of plagiarism: Types of Plagiarism in Academic Research

    plagiarism in research papers

  2. Lecture: Research Paper (Plagiarism)

    plagiarism in research papers

  3. What is Plagiarism?

    plagiarism in research papers

  4. Plagiarism in Research and How to Avoid It

    plagiarism in research papers

  5. ⭐ How to avoid plagiarism when writing a research paper. How to Avoid

    plagiarism in research papers

  6. Plagiarism in Research: Is It Worth the Hassle?

    plagiarism in research papers

COMMENTS

  1. Plagiarism detection and prevention: a primer for researchers

    Creative thinking and plagiarism. Plagiarism is often revealed in works of novice non-Anglophone authors who are exposed to a conservative educational environment that encourages copying and memorizing and rejects creative thinking [12, 13].The gaps in training on research methodology, ethical writing, and acceptable editing support are also viewed as barriers to targeting influential journals ...

  2. (PDF) Plagiarism in research

    Plagiarism is the act of using someone else's work or ideas as if they were your own, without giving proper credit to the original source. [9, 10] In the context of research, plagiarism can take ...

  3. How to Avoid Plagiarism

    To avoid plagiarism, you need to correctly incorporate these sources into your text. You can avoid plagiarism by: Keeping track of the sources you consult in your research. Paraphrasing or quoting from your sources (by using a paraphrasing tool and adding your own ideas) Crediting the original author in an in-text citation and in your reference ...

  4. What is plagiarism and how to avoid it?

    Self plagiarism: "Publication of one's own data that have already been published is not acceptable since it distorts scientific record." 1 Self-plagiarized publications do not contribute to scientific work; they just increase the number of papers published without justification in scientific research. 8 The authors get benefit in the form of increased number of published papers. 8 Self ...

  5. What Is Plagiarism?

    The accuracy depends on the plagiarism checker you use. Per our in-depth research, Scribbr is the most accurate plagiarism checker. Many free plagiarism checkers fail to detect all plagiarism or falsely flag text as plagiarism. Plagiarism checkers work by using advanced database software to scan for matches between your text and existing texts.

  6. Plagiarism in Research explained: The complete Guide

    These aspects help institutions and publishers define plagiarism types more accurately. The agreed-upon forms of plagiarism that occur in research writing include: 1. Global or Complete Plagiarism. Global or Complete plagiarism is inarguably the most severe form of plagiarism — It is as good as stealing.

  7. Plagiarism in Scientific Research and Publications and How to Prevent

    There are ways to avoid plagiarism, and should just be followed simple steps when writing a paper. There are several ways to avoid plagiarism ( 1, 6 ): Paraphrasing - When information is found that is great for research, it is read and written with own words. Quote - Very efficient way to avoid plagiarism.

  8. How to Avoid Plagiarism

    How to Avoid Plagiarism. It's not enough to know why plagiarism is taken so seriously in the academic world or to know how to recognize it. You also need to know how to avoid it. The simplest cases of plagiarism to avoid are the intentional ones: If you copy a paper from a classmate, buy a paper from the Internet, copy whole passages from a ...

  9. Research Guides: Citing Sources: How to Avoid Plagiarism

    The entire section below came from a research guide from Iowa State University. To avoid plagiarism, one must provide a reference to that source to indicate where the original information came from (see the "Source:" section below). ... As you prepare your paper or research, and as you begin drafting your paper. One good practice is to clearly ...

  10. Academic Plagiarism Detection: A Systematic Literature Review

    The papers we retrieved during our research fall into three broad categories: plagiarism detection methods, plagiarism detection systems, and plagiarism policies. Ordering these categories by the level of abstraction at which they address the problem of academic plagiarism yields the three-layered model shown in Figure 1 .

  11. PDF 7th Edition Avoiding Plagiarism Guide

    Avoiding Idea Plagiarism. To avoid idea plagiarism, use (a) signal phrases (e.g., "I believe that") to designate your own idea, or (b) include an in-text citation to a source to signal someone else's idea. Most important, always search the literature to find a source for any ideas, facts, or findings that you put in your paper.

  12. What Constitutes Plagiarism?

    In academic writing, it is considered plagiarism to draw any idea or any language from someone else without adequately crediting that source in your paper. It doesn't matter whether the source is a published author, another student, a website without clear authorship, a website that sells academic papers, or any other person: Taking credit for anyone else's work is stealing, and it is ...

  13. Plagiarism

    Plagiarism. Plagiarism is the act of presenting the words, ideas, or images of another as your own; it denies authors or creators of content the credit they are due. Whether deliberate or unintentional, plagiarism violates ethical standards in scholarship ( see APA Ethics Code Standard 8.11, Plagiarism ). Writers who plagiarize disrespect the ...

  14. The 5 Types of Plagiarism

    Table of contents. Global plagiarism: Plagiarizing an entire text. Verbatim plagiarism: Copying words directly. Paraphrasing plagiarism: Rephrasing ideas. Patchwork plagiarism: Stitching together sources. Self-plagiarism: Plagiarizing your own work. Frequently asked questions about plagiarism.

  15. How to Avoid Plagiarism in Research Papers (Part 1)

    The quotes should be exactly the way they appear in the paper you take them from. 3. Cite your Sources - Identify what does and does not need to be cited. The best way to avoid the misconduct of plagiarism is by self-checking your documents using plagiarism checkertools.

  16. How to avoid plagiarism in research papers

    Citing a source is a simple way to avoid plagiarism, but you must have the correct details of each source that you cite. Although tracing original papers is a lot easier now, it is also easier to make mistakes while copying or transcribing. Always cross-check all the citations and references. Conduct a plagiarism check on your manuscript.

  17. Recognizing & Avoiding Plagiarism in Your Research Paper

    Recognizing and Avoiding Plagiarism in Your Research Paper. Plagiarism in research is unfortunately still a serious problem today. Research papers with plagiarism contain unauthorized quoting from other authors; the writer may even try to pass off others' work as their own. This damages the individual's reputation, but also the entire class ...

  18. Knowing and Avoiding Plagiarism During Scientific Writing

    Committee on publication ethics definition. In 1999, the Committee on Publication Ethics (COPE) defined plagiarism as, "plagiarism ranges from the unreferenced use of others' published and unpublished ideas, including research grant applications to submission under "new" authorship of a complete paper, sometimes in a different language.

  19. Full article: The case for academic plagiarism education: A PESA

    Recent research testing tools for plagiarism detection 'show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic' (Foltýnek et al, Citation 2020). There are now more than twenty major PDS on the market.

  20. Free Plagiarism Checker Online for Students

    Luckily, a free plagiarism checker online can solve this issue quickly and easily. Many cheap essay writing services use a plagiarism checker for research paper. However, students sometimes forget that they should too.

  21. In-text citations

    APA Style provides guidelines to help writers determine the appropriate level of citation and how to avoid plagiarism and self-plagiarism. We also provide specific guidance for in-text citation, including formats for interviews, classroom and intranet sources, and personal communications; in-text citations in general; and paraphrases and direct quotations.

  22. 10 Best Free Plagiarism Checkers

    Research methodology. The plagiarism tools in this research are tested using 4 test documents, ... Scribbr offers a limited free version that's perfect for checking if your paper contains potential plagiarism. To view the full report, you need to buy the premium version, which costs between $19.95 and $39.95 depending on the word count. ...

  23. 7 Best Plagiarism Checkers in 2024 (Free & Paid)

    Students use plagiarism checkers to ensure their academic papers, assignments, and admissions essays are original. Teachers and professors use them to verify the authenticity of students' work. Researchers rely on a research paper plagiarism checker to check their papers before publication to avoid unintentional plagiarism, including self ...

  24. Similarity and Plagiarism in Scholarly Journal Submissions: Bringing

    Certainly, database searches for papers tagged for plagiarism are limited to indexed journals only, which keeps non-indexed journals (both low-quality and deceptive journals) out of focus [5,21]. Moreover, journal coverage may vary from one database to another, as reported in a recent paper on research dissemination in South Asia [22]. Therefore, both ...

  25. How to Write a Research Paper [Steps & Examples]

    The length of a research paper depends on its topic and specific requirements. Generally, research papers vary between 4,000 to 6,000 words, with shorter papers around 2,000 words and longer ones exceeding 10,000 words. Adhering to the length requirements provided for academic assignments is essential.

  26. Free Plagiarism Checker in Partnership with Turnitin

    Our plagiarism checker, AI Detector, Citation Generator, proofreading services, paraphrasing tool, grammar checker, summarizer, and free Knowledge Base content are designed to help students produce quality academic papers. We make every effort to prevent our software from being used for fraudulent or manipulative purposes.

  27. List of Famous Plagiarism Lawsuit Cases that Reached Court

    Annette Schavan's Ph.D. thesis plagiarism. While serving as the German Minister of Education and Research, Annette Schavan was accused by the University of Düsseldorf of plagiarizing her Ph.D. thesis. An investigation led by the University of Düsseldorf found that Schavan had used 60 secondary sources in her work without ever citing or ...