  • Written By Priya_Singh
  • Last Modified 24-01-2023

Data Representation: Definition, Types, Examples

Data Representation: Data representation is a technique for analysing numerical data. The relationship between facts, ideas, information, and concepts is depicted in a diagram via data representation. It is a fundamental learning strategy that is simple and easy to understand. It is always determined by the data type in a specific domain. Graphical representations are available in many different shapes and sizes.

In mathematics, a graph is a chart in which statistical data is represented by curves or lines drawn through the coordinate points marked on its surface. It helps in investigating the relationship between two variables by showing how one variable changes in relation to another, for example over time. It is also useful for analysing series and frequency distributions. On this page, we will go through several types of graphs that can be used to display data. Continue reading to learn more.


Data Representation in Maths

Definition: After collecting the data, the investigator has to condense them in tabular form to study their salient features. Such an arrangement is known as the presentation of data.

Any information gathered may be organised in a frequency distribution table, and then shown using pictographs or bar graphs. A bar graph is a representation of numbers made up of equally wide bars whose lengths are determined by the frequency and scale you choose.

The collected raw data can be placed in any one of the given ways:

  • Serial order or alphabetical order
  • Ascending order
  • Descending order

Data Representation Example

Example: Let the marks obtained by \(30\) students of class VIII in a class test, out of \(50\), according to their roll numbers, be:

\(39,\,25,\,5,\,33,\,19,\,21,\,12,41,\,12,\,21,\,19,\,1,\,10,\,8,\,12\)

\(17,\,19,\,17,\,17,\,41,\,40,\,12,41,\,33,\,19,\,21,\,33,\,5,\,1,\,21\)

The data in the given form is known as raw data or ungrouped data. The above-given data can be placed in the serial order as shown below:

Data Representation Example

Now, say you want to analyse the standard of achievement of the students. Arranging the marks in ascending or descending order will give you a better picture.

Ascending order:

\(1,\,1,\,5,\,5,\,8,\,10,\,12,12,\,12,\,12,\,17,\,17,\,17,\,19,\,19\)

\(19,\,19,\,21,\,21,\,21,\,21,\,25,\,33,\,33,\,33,\,39,\,40,\,41,\,41,\,41\)

Descending order:

\(41,\,41,\,41,\,40,\,39,\,33,\,33,33,\,25,\,21,\,21,\,21,\,21,\,19,\,19\)

\(19,\,19,\,17,\,17,\,17,\,12,\,12,12,\,12,\,10,\,8,\,5,\,5,1,\,1\)

When raw data is placed in ascending or descending order of magnitude, it is known as an array or arrayed data.
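The arrangement above can be reproduced with a short Python sketch. The marks are copied from the example; the variable names are only illustrative.

```python
# Marks of the 30 students from the example above (raw, ungrouped data).
raw_marks = [
    39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
    17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21,
]

# Arrayed data: the same values placed in order of magnitude.
ascending = sorted(raw_marks)                 # smallest to largest
descending = sorted(raw_marks, reverse=True)  # largest to smallest

print(ascending)
print(descending)
```

Sorting never adds or drops values, so both arrays still contain all 30 marks.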

Graph Representation in Data Structure

A few of the graphical representations of data are given below:

  • Bar chart
  • Frequency distribution table
  • Histogram
  • Pie chart
  • Line graph

Pictorial Representation of Data: Bar Chart

The bar graph represents qualitative data visually. The information is displayed horizontally or vertically and compares items like amounts, characteristics, times, and frequency.

The bars are arranged in order of frequency, so more critical categories are emphasised. By looking at all the bars, it is easy to tell which categories in a set of data dominate the others. Bar graphs come in several forms, such as single, stacked, or grouped.

Bar Chart

Graphical Representation of Data: Frequency Distribution Table

A frequency table or frequency distribution is a method to present raw data in which one can easily understand the information contained in the raw data.

The frequency distribution table is constructed by using tally marks. Tally marks are a numeral system in which vertical lines are used for counting. A cross line is placed over four vertical lines to make a group of \(5\).

Frequency Distribution Table

Consider a jar containing the different colours of pieces of bread as shown below:

Frequency Distribution Table Example

Construct a frequency distribution table for the data mentioned above.

Frequency Distribution Table Example

Graphical Representation of Data: Histogram

The histogram is another kind of graph that uses bars in its display. The histogram is used for quantitative data. Ranges of values, known as classes, are listed along the bottom, and the classes with greater frequencies have taller bars.

A histogram and the bar graph look very similar; however, they are different because of the data level. Bar graphs measure the frequency of the categorical data. A categorical variable has two or more categories, such as gender or hair colour.

Histogram
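As a rough illustration of how values are grouped into classes for a histogram, the sketch below bins the marks of the 30 students from the earlier example. The class width of 10 and the text-based bars are assumptions made for this illustration, not part of the original example.

```python
from collections import Counter

# Marks of the 30 students from the earlier example.
marks = [
    39, 25, 5, 33, 19, 21, 12, 41, 12, 21, 19, 1, 10, 8, 12,
    17, 19, 17, 17, 41, 40, 12, 41, 33, 19, 21, 33, 5, 1, 21,
]

CLASS_WIDTH = 10  # assumed class width, chosen only for illustration


def class_interval(value, width=CLASS_WIDTH):
    """Return the (lower, upper) class that contains value.

    The lower limit is included and the upper limit is excluded.
    """
    lower = (value // width) * width
    return (lower, lower + width)


# Count how many marks fall in each class.
frequencies = Counter(class_interval(m) for m in marks)

# One row per class; the row of '#' plays the role of the bar.
for (lower, upper), count in sorted(frequencies.items()):
    print(f"{lower:>2}-{upper:<2} | {'#' * count} ({count})")
```

Unlike the categories of a bar graph, the classes here are adjacent numerical ranges, which is why the bars of a histogram are drawn without gaps.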

Graphical Representation of Data: Pie Chart

The pie chart is used to represent the numerical proportions of a dataset. This graph involves dividing a circle into different sectors, where each sector represents the proportion of a particular element relative to the whole. Thus, it is also known as a circle chart or circle graph.

Pie Chart
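Because each sector is a share of the whole, its size can be worked out from the data alone. The sketch below converts counts into percentages and central angles; the fruit data and the function name `pie_sectors` are hypothetical and used only for illustration.

```python
def pie_sectors(values):
    """Map each label to (percentage of the whole, central angle in degrees)."""
    total = sum(values.values())
    return {
        label: (100 * value / total, 360 * value / total)
        for label, value in values.items()
    }


# Hypothetical survey data, invented for this illustration.
favourite_fruit = {"Apple": 10, "Banana": 6, "Mango": 4}

sectors = pie_sectors(favourite_fruit)
for label, (percent, angle) in sectors.items():
    print(f"{label}: {percent:.0f}% of the circle, {angle:.0f} degrees")
```

The percentages always add up to 100 and the angles to 360, which is a quick check on any pie chart.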

Graphical Representation of Data: Line Graph

A graph that uses points and lines to represent change over time is defined as a line graph. In other words, it is the chart that shows a line joining multiple points or a line that shows the link between the points.

The diagram illustrates the quantitative relationship between two changing variables with a straight line or curve that joins a series of successive data points. Line charts compare two variables on the vertical and horizontal axes.

Line Graph

General Rules for Visual Representation of Data

We have a few rules to present the information in the graphical representation effectively, and they are given below:

  • Suitable Title:  Ensure that the appropriate title is given to the graph, indicating the presentation’s subject.
  • Measurement Unit:  Introduce the measurement unit in the graph.
  • Proper Scale:  To represent the data accurately, choose an appropriate scale.
  • Index:  Provide an index (legend) that explains the colours, shades, lines, and designs used in the graph for better understanding.
  • Data Sources:  At the bottom of the graph, include the source of information wherever necessary.
  • Keep it Simple:  Build the graph in a way that everyone can understand easily.
  • Neat:  Choose the size, fonts, colours, etc., so that the graph is a clear model for the presentation of the information.

Solved Examples on Data Representation

Q.1. Construct the frequency distribution table for the data on heights (in \({\rm{cm}}\)) of \(20\) boys using the class intervals \(130 – 135,\;135 – 140\) and so on. The heights of the boys in \({\rm{cm}}\) are:

Data Representation Example 1

Ans: The frequency distribution for the above data can be constructed as follows:

Data Representation Example

Q.2. Write the steps for constructing a bar graph.
Ans: To construct a bar graph, follow the given steps:
1. Take a graph paper and draw two lines perpendicular to each other; call them the horizontal and vertical axes.
2. Mark the items given in the data, such as days, weeks, months, years, or places, at uniform gaps along the horizontal axis.
3. Choose a suitable scale to decide the heights of the rectangles or bars, and mark the heights on the vertical axis.
4. On the horizontal axis, draw bars or rectangles of equal width, with equal spacing between them, up to the heights marked in the previous step.
The figure so obtained is the bar graph representing the given numerical data.
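The same steps can be sketched in a few lines of Python that print a simple text bar graph. The data, the scale of 5 units per symbol, and the function name `text_bar_graph` are all hypothetical, and the bars are drawn horizontally because that is easier to print.

```python
def text_bar_graph(data, scale):
    """Return one text line per category; each '#' stands for `scale` units."""
    width = max(len(label) for label in data)  # align the category labels
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / scale)       # bar length follows the scale
        lines.append(f"{label:<{width}} | {bar} {value}")
    return lines


# Hypothetical data: books sold on four days (invented for illustration).
books_sold = {"Mon": 20, "Tue": 35, "Wed": 50, "Thu": 15}

lines = text_bar_graph(books_sold, scale=5)
for line in lines:
    print(line)
```

Every bar uses the same symbol and the same scale, which mirrors the requirement that bars have equal width and that heights are read off a single axis.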

Q.3. Read the bar graph and then answer the given questions:
I. Write the information provided by the given bar graph.
II. What is the order of change of the number of students over several years?
III. In which year is the increase in the number of students maximum?
IV. State whether true or false: the enrolment during \(1996 – 97\) is double that of \(1995 – 96\).

pictorial representation of data

Ans:
I. The bar graph represents the number of students in class \({\rm{VI}}\) of a school during the academic years \(1995 – 96\) to \(1999 – 2000\).
II. The number of students increases from year to year, as the heights of the bars keep growing.
III. The increase in the number of students is uniform, as the increase in the height of the bars is uniform. Hence, in this case, the growth is not maximum in any one of the years.
IV. The enrolment in \(1996 – 97\) is \(200\), and the enrolment in \(1995 – 96\) is \(150\). Since \(200\) is not double \(150\), the statement is false.

Q.4. Write the frequency distribution for the given information of ages of \(25\) students of class VIII in a school.
\(15,\,16,\,16,\,14,\,17,\,17,\,16,\,15,\,15,\,16,\,16,\,17,\,15\)
\(16,\,16,\,14,\,16,\,15,\,14,\,15,\,16,\,16,\,15,\,14,\,15\)
Ans: Frequency distribution of ages of \(25\) students:

Data Representation Example
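A frequency distribution like this one can also be tallied programmatically. The sketch below counts the ages listed in Q.4; writing a completed group of five as `||||/` is only an ASCII convention assumed here for the four strokes crossed by a fifth.

```python
from collections import Counter

# Ages of the 25 students, copied from Q.4.
ages = [
    15, 16, 16, 14, 17, 17, 16, 15, 15, 16, 16, 17, 15,
    16, 16, 14, 16, 15, 14, 15, 16, 16, 15, 14, 15,
]


def tally(count):
    """Render a count as tally marks, grouping the strokes in fives."""
    groups, rest = divmod(count, 5)
    parts = ["||||/"] * groups          # each completed group of five
    if rest:
        parts.append("|" * rest)        # the remaining single strokes
    return " ".join(parts)


frequency = Counter(ages)

print("Age | Tally marks  | Frequency")
for age in sorted(frequency):
    print(f"{age}  | {tally(frequency[age]):<12} | {frequency[age]}")
```

The frequencies must add up to the number of observations, here 25, which is a useful check on any frequency table.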

Q.5. There are \(20\) students in a classroom. The teacher asked the students to talk about their favourite subjects. The results are listed below:

Data Representation Example

By looking at the above data, which is the most liked subject? Ans: Representing the above data in the frequency distribution table by using tally marks as follows:

Data Representation Example

From the above table, we can see that the maximum number of students \((7)\) likes mathematics.


In this article, we have discussed data representation with an example. We then talked about graphical representations such as the bar graph, frequency table, and pie chart, and later discussed the general rules for graphical representation. Finally, you can find solved examples along with a few FAQs. These will help you gain further clarity on this topic.


FAQs on Data Representation

Q.1: How is data represented? A: The collected data can be expressed in various ways like bar graphs, pictographs, frequency tables, line graphs, pie charts and many more. It depends on the purpose of the data, and accordingly, the type of graph can be chosen.

Q.2: What are the different types of data representation? A: A few types of data representation are given below: 1. Frequency distribution table 2. Bar graph 3. Histogram 4. Line graph 5. Pie chart

Q.3: What is data representation, and why is it essential? A: After collecting the data, the investigator has to condense them in tabular form to study their salient features. Such an arrangement is known as the presentation of data. Importance: Data visualization gives us a clear understanding of what the information means by displaying it visually through maps or graphs. Visual data is easier for the mind to comprehend, and it makes it easier to identify trends, patterns, and outliers within large data sets.

Q.4: What is the difference between data and representation? A: Data is a collection of specific facts, often quantitative in nature, such as heights or the number of children. Data representation is the result of processing, arranging, and presenting that data in a form that gives it meaning.

Q.5: Why do we use data representation? A: Data visualization gives us a clear understanding of what the information means by displaying it visually through maps or graphs. Visual data is easier for the mind to comprehend, and it makes it easier to identify trends, patterns, and outliers within large data sets.


Graphical Representation of Data

Graphical representation of data is an attractive method of showcasing numerical data that helps in analyzing and representing quantitative data visually. A graph is a kind of chart where data are plotted as variables across the coordinate axes. It becomes easy to analyze the extent of change of one variable based on the change of other variables. Graphical representation of data is done through different mediums such as lines, plots, diagrams, etc. Let us learn more about this interesting concept of graphical representation of data, the different types, and solve a few examples.


Definition of Graphical Representation of Data

A graphical representation is a visual representation of data or statistics-based results using graphs, plots, and charts. This kind of representation is more effective for understanding and comparing data than a tabular form. Graphical representation helps to qualify, sort, and present data in a way that is simple to understand for a larger audience. Graphs make it possible to study the cause-and-effect relationship between two variables through both time series and frequency distributions. The data obtained from different surveys is turned into a graphical representation by the use of symbols, such as lines on a line graph, bars on a bar chart, or slices of a pie chart. This visual representation helps in clarity, comparison, and understanding of numerical data.

Representation of Data

The word data is from the Latin word Datum, which means something given. The numerical figures collected through a survey are called data and can be represented in two forms: tabular form and visual form through graphs. Once the data is collected through constant observations, it is arranged, summarized, and classified to finally be represented in the form of a graph. There are two kinds of data: quantitative and qualitative. Quantitative data is numerical and structured; it can be discrete or continuous and lends itself to statistical analysis. Qualitative data is descriptive and unstructured, so it cannot be analyzed numerically in the same way.

Principles of Graphical Representation of Data

The principles of graphical representation are algebraic. In a graph, there are two lines known as the axes or coordinate axes. These are the X-axis and the Y-axis. The horizontal axis is the X-axis and the vertical axis is the Y-axis. They are perpendicular to each other and intersect at O, the point of origin. On the right side of the origin, the X-axis has positive values, and on the left side, it has negative values. In the same way, the part of the Y-axis above the origin has positive values, whereas the part below has negative values. When the X-axis and Y-axis intersect each other at the origin, they divide the plane into four parts, which are called Quadrant I, Quadrant II, Quadrant III, and Quadrant IV. This form of representation is used for a frequency distribution, which can be represented in several ways, namely the histogram, smoothed frequency graph, pie diagram or pie chart, cumulative frequency graph or ogive, and frequency polygon.

Principle of Graphical Representation of Data
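The sign rules above determine the quadrant of any point. The sketch below uses the standard convention in which the quadrants are numbered counter-clockwise starting from the upper right; the function name `quadrant` is illustrative only.

```python
def quadrant(x, y):
    """Name the region of the coordinate plane that contains the point (x, y)."""
    if x == 0 and y == 0:
        return "Origin"
    if y == 0:
        return "On the X-axis"
    if x == 0:
        return "On the Y-axis"
    if x > 0:
        # Right of the origin: above is Quadrant I, below is Quadrant IV.
        return "Quadrant I" if y > 0 else "Quadrant IV"
    # Left of the origin: above is Quadrant II, below is Quadrant III.
    return "Quadrant II" if y > 0 else "Quadrant III"


for point in [(3, 2), (-3, 2), (-3, -2), (3, -2), (0, 0)]:
    print(point, "->", quadrant(*point))
```

Points that lie on an axis belong to no quadrant, so they are reported separately.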

Advantages and Disadvantages of Graphical Representation of Data

Listed below are some advantages and disadvantages of using a graphical representation of data:

  • It improves the way of analyzing and learning as the graphical representation makes the data easy to understand.
  • It can be used in almost all fields from mathematics to physics to psychology and so on.
  • It is easy to understand for its visual impacts.
  • It shows a large set of data as a whole at a glance.
  • It is mainly used in statistics to determine the mean, median, and mode for different data.

The main disadvantage of graphical representation of data is that it takes a lot of effort as well as resources to find the most appropriate data and then represent it graphically.

Rules of Graphical Representation of Data

While presenting data graphically, there are certain rules that need to be followed. They are listed below:

  • Suitable Title: The title of the graph should be appropriate and indicate the subject of the presentation.
  • Measurement Unit: The measurement unit in the graph should be mentioned.
  • Proper Scale: A proper scale needs to be chosen to represent the data accurately.
  • Index: For better understanding, provide an index of the colors, shades, lines, and designs used in the graph.
  • Data Sources: The source of the data should be included at the bottom of the graph wherever necessary.
  • Simple: The construction of a graph should be easily understood.
  • Neat: The graph should be visually neat in terms of size and font so that the data can be read accurately.

Uses of Graphical Representation of Data

The main use of a graphical representation of data is understanding and identifying the trends and patterns in the data. It helps in analyzing large quantities of data, comparing two or more data sets, making predictions, and making firm decisions. The visual display of data also helps in avoiding confusion and overlapping of information. Graphs like line graphs and bar graphs display two or more data sets clearly for easy comparison. This is important both for communicating our findings to others and for our own understanding and analysis of the data.

Types of Graphical Representation of Data

Data is represented in different types of graphs such as plots, pies, diagrams, etc. They are as follows:

  • Bar graph: A group of data represented with rectangular bars whose lengths are proportional to the values is a bar graph. The bars can be plotted either vertically or horizontally.
  • Pie chart: The pie chart is a type of graph in which a circle is divided into sectors, where each sector represents a proportion of the whole. A sector's share can be expressed either as a percentage of the whole or as a central angle out of 360º.
  • Line graph: The line graph represents the data as a series of points, called markers, that are connected by straight line segments.
  • Pictograph: Data shown in the form of pictures is a pictograph. Pictorial symbols for words, objects, or phrases can each stand for a different number of items.
  • Histogram: The histogram is a type of graph made up of rectangles, where the area of each rectangle is proportional to the frequency of a variable and the width is equal to the class interval.
  • Frequency distribution table: The frequency distribution table in statistics showcases the data in ascending order along with their corresponding frequencies. The frequency of the data is often represented by f.
  • Stem and leaf plot: The stem and leaf plot is a way to represent quantitative data according to frequency ranges or frequency distribution. It is a graph that shows numerical data arranged in order. Each data value is broken into a stem and a leaf.
  • Scatter plot: A scatter diagram or scatter plot is a way of graphical representation that uses the Cartesian coordinates of two variables. The plot shows the relationship between the two variables.

Related Topics

Listed below are a few interesting topics that are related to the graphical representation of data. Take a look.

  • x and y graph
  • Frequency Polygon
  • Cumulative Frequency

Examples on Graphical Representation of Data

Example 1: A pie chart is divided into 3 parts with the angles measuring 2x, 8x, and 10x respectively. Find the value of x in degrees.

Solution: We know that the sum of all angles in a pie chart is 360º.
⇒ 2x + 8x + 10x = 360º
⇒ 20x = 360º
⇒ x = 360º/20
⇒ x = 18º
Therefore, the value of x is 18º.
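The arithmetic in Example 1 can be checked with a few lines of Python. This is only a verification sketch; the variable names are illustrative.

```python
FULL_CIRCLE = 360  # degrees in a complete pie chart

# Coefficients of x for the three sectors: 2x, 8x and 10x.
coefficients = [2, 8, 10]

# 2x + 8x + 10x = 360, so x = 360 / (2 + 8 + 10).
x = FULL_CIRCLE / sum(coefficients)
angles = [c * x for c in coefficients]

print(f"x = {x} degrees")
print("Sector angles:", angles)
```

The three sector angles work out to 36º, 144º, and 180º, which together make up the full 360º.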

Example 2: Ben is trying to read the plot given below. His teacher has given him stem and leaf plot worksheets. Can you help him answer the questions? i) What is the mode of the plot? ii) What is the mean of the plot? iii) Find the range.

Stem | Leaf
1 | 2 4
2 | 1 5 8
3 | 2 4 6
5 | 0 3 4 4
6 | 2 5 7
8 | 3 8 9
9 | 1

Solution: i) The mode is the number that appears most often in the data. Leaf 4 occurs twice on the plot against stem 5.

Hence, mode = 54

ii) The sum of all data values is 12 + 14 + 21 + 25 + 28 + 32 + 34 + 36 + 50 + 53 + 54 + 54 + 62 + 65 + 67 + 83 + 88 + 89 + 91 = 958

To find the mean, we have to divide the sum by the total number of values.

Mean = Sum of all data values ÷ 19 = 958 ÷ 19 = 50.42

iii) Range = the highest value - the lowest value = 91 - 12 = 79
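The three answers can be cross-checked by expanding the stem and leaf plot back into its data values. The sketch below assumes, as the solution does, that each stem is the tens digit and each leaf is the units digit.

```python
from statistics import mean, mode

# The stem and leaf plot from Example 2: stem -> list of leaves.
stem_and_leaf = {
    1: [2, 4],
    2: [1, 5, 8],
    3: [2, 4, 6],
    5: [0, 3, 4, 4],
    6: [2, 5, 7],
    8: [3, 8, 9],
    9: [1],
}

# Rebuild each data value: the stem is the tens digit, the leaf the units digit.
values = [10 * stem + leaf
          for stem, leaves in stem_and_leaf.items()
          for leaf in leaves]

data_mode = mode(values)                # most frequent value
data_mean = round(mean(values), 2)      # sum of values / number of values
data_range = max(values) - min(values)  # highest value - lowest value

print("Mode:", data_mode)
print("Mean:", data_mean)
print("Range:", data_range)
```

This gives a mode of 54, a mean of 50.42 (rounded to two decimal places), and a range of 79, matching the worked solution.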


FAQs on Graphical Representation of Data

What is Graphical Representation?

Graphical representation is a form of visually displaying data through various methods like graphs, diagrams, charts, and plots. It helps in sorting, visualizing, and presenting data in a clear manner through different types of graphs. Statistics mainly use graphical representation to show data.

What are the Different Types of Graphical Representation?

The different types of graphical representation of data are:

  • Bar graph
  • Pie chart
  • Line graph
  • Pictograph
  • Histogram
  • Stem and leaf plot
  • Scatter diagrams
  • Frequency Distribution

Is Graphical Representation Used for Numerical Data?

Yes, graphical representations display numerical data that has been accumulated through various surveys and observations. The method of presenting this numerical data is called a chart. There are different kinds of charts, such as a pie chart, bar graph, line graph, etc., that help in clearly showcasing the data.

What is the Use of Graphical Representation of Data?

Graphical representation of data is useful in clarifying, interpreting, and analyzing data by plotting points and drawing line segments, surfaces, and other geometric forms or symbols.

What are the Ways to Represent Data?

Tables, charts, and graphs are all ways of representing data, and they can be used for two broad purposes. The first is to support the collection, organization, and analysis of data as part of the process of a scientific study.

What is the Objective of Graphical Representation of Data?

The main objective of representing data graphically is to display information visually that helps in understanding the information efficiently, clearly, and accurately. This is important to communicate the findings as well as analyze the data.

What Is Data Visualization: Brief Theory, Useful Tips and Awesome Examples


By Al Boicheva


Updated: June 23, 2022

Creating data visualizations to present your data is no longer just a nice-to-have skill. Now, the skill to effectively sort and communicate your data through charts is a must-have for any business in any field that deals with data. Data visualization helps businesses quickly make sense of complex data and start making decisions based on that data. This is why today we’ll talk about what data visualization is. We’ll discuss how and why it works, what types of charts to choose in which cases, how to create effective charts, and, of course, end with beautiful examples.

So let’s jump right in. As usual, don’t hesitate to fast-travel to a particular section of your interest.

Article overview:
1. What Does Data Visualization Mean?
2. How Does it Work?
3. When to Use it?
4. Why Use it?
5. Types of Data Visualization
6. Data Visualization VS Infographics: 5 Main Differences
7. How to Create Effective Data Visualization?: 5 Useful Tips
8. Examples of Data Visualization

1. What is Data Visualization?

Data Visualization is a graphic representation of data that aims to communicate large amounts of complex data efficiently, in a way that is easier to grasp and understand. In a way, data visualization is the mapping between the original data and graphic elements that determines how the attributes of these elements vary. The visualization is usually made by the use of charts, lines, points, bars, and maps.

  • Data Viz is a branch of Descriptive statistics but it requires design, computer, and statistical skills.
  • Aesthetics and functionality go hand in hand to communicate complex statistics in an intuitive way.
  • Data Viz tools and technologies are essential for making data-driven decisions.
  • It’s a fine balance between form and functionality.
  • Every STEM field benefits from understanding data.

2. How Does it Work?

If we can see it, our brains can internalize and reflect on it. This is why it’s much easier and more effective to make sense of a chart and see trends than to read a massive document that would take a lot of time and focus to rationalize. We wouldn’t want to repeat the cliche that humans are visual creatures, but it’s a fact that visualization is much more effective and comprehensive.

In a way, we can say that data Viz is a form of storytelling whose purpose is to help us make decisions based on data. Typical uses include:

  • Tracking sales
  • Identifying trends
  • Identifying changes
  • Monitoring goals
  • Monitoring results
  • Combining data

3. When to Use it?

Data visualization is useful for companies that deal with lots of data on a daily basis. It’s essential to have your data and trends instantly visible, which is far better than scrolling through colossal spreadsheets. When the trends stand out instantly, your clients or viewers can also understand them instead of getting lost in the clutter of numbers.

With that being said, Data Viz is suitable for:

  • Annual reports
  • Presentations
  • Social media micronarratives
  • Informational brochures
  • Trend tracking
  • Candlestick chart for financial analysis
  • Determining routes

Common cases when data visualization sees use are in sales, marketing, healthcare, science, finances, politics, and logistics.

4. Why Use it?

Short answer: decision making. Data visualization comes with the undeniable benefits of quickly recognizing patterns and interpreting data. More specifically, it is an invaluable tool for the following tasks.

  • Identifying correlations between variables.
  • Getting market insights about audience behavior.
  • Determining value vs risk metrics.
  • Monitoring trends over time.
  • Examining rates and potential through frequency.
  • Reacting to changes.

5. Types of Data Visualization

As you probably already guessed, Data Viz is much more than simple pie charts and graphs styled in a visually appealing way. The methods that this branch uses to visualize statistics include a series of effective types.

Map visualization is a great method for analyzing geographically related information and presenting it accurately on maps. This intuitive approach shows how data is distributed by region. Since maps can be 2D or 3D, static or dynamic, there are numerous combinations one can use to create a Data Viz map.

COVID-19 Spending Data Visualization POGO by George Railean

The most common ones, however, are:

  • Regional Maps: Classic maps that display countries, cities, or districts. They often represent data in different colors for different characteristics in each region.
  • Line Maps: They usually combine space and time and are ideal for routing, especially for driving or taxi routes in an area, since they support analysis of specific scenarios.
  • Point Maps: These maps plot data at specific geographic locations. They are ideal for businesses to pinpoint the exact locations of their buildings in a region.
  • Heat Maps: They indicate the weight of a geographical area based on a specific property. For example, a heat map may show the concentration of infected people by area.

Charts present data in the form of graphs, diagrams, and tables. They are often confused with graphs, since graphs are indeed a subcategory of charts. However, there is a small difference: a graph shows the mathematical relationship between groups of data, and it is only one of the chart methods for representing data.

Gluten in America - chart data visualization

Infographic Data Visualization by Madeline VanRemmen

With that out of the way, let’s talk about the most basic types of charts in data visualization.

Finance Statistics - Bar Graph visualization

Bar graphs use a series of bars to illustrate how data develops. They are ideal for lighter data and for following trends across no more than three variables; beyond that, the bars become cluttered and hard to comprehend. They work well for year-on-year comparisons and monthly breakdowns.

Pie chart visualization type

Pie charts are familiar circular graphs that divide data into portions. The bigger the slice, the bigger the portion. They are ideal for depicting sections of a whole, and the portions must always add up to 100%. Avoid pie charts when you need to show data development over time or when you lack a value for any of the portions. Doughnut charts have the same use as pie charts.

Line graph - common visualization type

Line graphs use one or more lines to show development over time, which makes it possible to track multiple variables at the same time. A great example is tracking a brand’s product sales over the years. Area charts have the same use as line charts.

Scatter Plot

Scatter Plot - data visualization idea

Scatter plots allow you to see patterns in data. They have an x-axis and a y-axis for two different values. For example, if your x-axis shows car prices and the y-axis shows salaries, a positive or negative relationship between the points tells you what a person’s car suggests about their salary.
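To put a number on the "positive or negative relationship" a scatter plot shows, one common option is Pearson's correlation coefficient. This measure is our addition, not something the article prescribes, and the car price and salary figures below are invented for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists.

    Assumes both lists contain at least two points and neither is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data (thousands), one pair per person
car_prices = [8, 15, 18, 30, 35]    # x-axis
salaries = [30, 45, 60, 80, 100]    # y-axis

r = pearson_r(car_prices, salaries)
print(f"r = {r:.2f}")  # close to +1 means a strong positive relationship
```

A value near +1 means the points rise together, a value near -1 means one falls as the other rises, and a value near 0 means the scatter plot shows no clear straight-line pattern.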

Unlike the charts we just discussed, tables show data in an almost raw format. They are ideal when your data is hard to present visually and you need to show specific numerical values that are meant to be read rather than visualized.

Creative data table visualization

Data Visualisation | To bee or not to bee by Aishwarya Anand Singh

For example, charts are perfect to display data about a particular illness over a time period in a particular area, but a table comes to better use when you also need to understand specifics such as causes, outcomes, relapses, a period of treatment, and so on.

6. Data Visualization VS Infographics

5 main differences.

They are not that different, as both visually represent data. You will often search for infographics and find images titled Data Visualization, and the other way around. In many cases, however, these titles aren’t misleading. Why is that?

  • A data visualization is made of just one element. It could be a map, a chart, or a table. Infographics, on the other hand, often include multiple Data Viz elements.
  • Unlike data visualizations, which can be simple or extremely complex and heavy, infographics are simple and target wider audiences. They are usually comprehensible even to people outside the field of research the infographic represents.
  • Interestingly enough, data Viz doesn’t offer narratives and conclusions; it’s a tool and a basis for reaching them. Infographics, in most cases, offer a story and a narrative. For example, a data visualization map may have the title “Air pollution saturation by region”, while an infographic with the same data would go “Areas A and B are the most polluted in Country C”.
  • Data visualizations can be made in Excel or with other tools that generate the design automatically, unless they are intended for presentation or publishing. The aesthetics of infographics, however, are of great importance, and the designs must appeal to wider audiences.
  • In terms of interaction, data visualizations often offer interactive charts, especially in an online form. Infographics, on the other hand, rarely have interaction and are usually static images.

While on topic, you could also be interested to check out these 50 engaging infographic examples that make complex data look great.

7. Tips to Create Effective Data Visualization

The process is naturally similar to creating Infographics and it revolves around understanding your data and audience. To be more precise, these are the main steps and best practices when it comes to preparing an effective visualization of data for your viewers to instantly understand.

1. Do Your Homework

Preparation is half the work already done. Before you even start visualizing data, you have to be sure you understand that data to the last detail.

Knowing your audience is undeniably another important part of the homework, as different audiences process information differently. Who are the people you’re visualizing data for? How do they process visual data? Is it enough to hand them a single pie chart, or will you need a more in-depth visual report?

The third part of preparing is to determine exactly what you want to communicate to the audience. What kind of information are you visualizing, and does it reflect your goal?

And last, think about how much data you’ll be working with and take it into account.

2. Choose the Right Type of Chart

In a previous section, we listed the basic chart types that find use in data visualization. To determine best which one suits your work, there are a few things to consider.

  • How many variables will you have in a chart?
  • How many items will you place for each of your variables?
  • What will be the relation between the values (time period, comparison, distributions, etc.)

With that being said, a pie chart would be ideal if you need to present what portion of a whole each item takes. For example, you can use it to showcase what percentage of the market a particular product holds. Pie charts, however, are unsuitable for distributions, comparisons, and following trends through time periods. Bar graphs, scatter plots, and line graphs are much more effective in those cases.

Another example is how to use time in your charts. Put time on the horizontal axis, because time should run left to right. It’s far more visually intuitive.
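The rules of thumb from this section and from the chart descriptions earlier can be collected into a tiny lookup, shown here as a sketch. The goal labels and the function name are hypothetical, picked just for this example, and real projects will often need more nuance than a one-to-one mapping.

```python
# Rules of thumb paraphrased from the chart descriptions in this article.
# The keys are hypothetical labels chosen for this sketch.
CHART_FOR_GOAL = {
    "part_of_whole": "pie chart",      # portions that add up to 100%
    "trend_over_time": "line graph",   # time runs left to right on the x-axis
    "comparison": "bar graph",         # e.g. year-on-year or monthly breakdowns
    "relationship": "scatter plot",    # two different values, one per axis
    "exact_values": "table",           # numbers meant to be read, not visualized
}


def suggest_chart(goal):
    """Return a suggested chart type for a goal label, or raise ValueError."""
    if goal not in CHART_FOR_GOAL:
        known = ", ".join(sorted(CHART_FOR_GOAL))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {known}")
    return CHART_FOR_GOAL[goal]


print(suggest_chart("part_of_whole"))    # pie chart
print(suggest_chart("trend_over_time"))  # line graph
```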

3. Sort your Data

Start by removing every piece of data that does not add value and is basically excess for the chart. Sometimes you have to work with a huge amount of data, which will inevitably make your chart complex and hard to read. Don’t hesitate to split your information into two or more charts. If that won’t work for you, you could use highlights or swap the chart type for one that fits better.

Tip: When you use bar charts and columns for comparison, sort the information in an ascending or a descending way by value instead of alphabetical order.
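As a quick sketch of this tip, here is how you might order categories by value before drawing the bars. The region names and sales figures are made up for illustration, and the text "bars" only stand in for a real chart.

```python
# Hypothetical sales figures per region, listed alphabetically
sales_by_region = {"East": 120, "North": 340, "South": 95, "West": 210}


def sort_for_bar_chart(values, descending=True):
    """Return (label, value) pairs ordered by value rather than by label."""
    return sorted(values.items(), key=lambda item: item[1], reverse=descending)


bars = sort_for_bar_chart(sales_by_region)

# Draw a rough text bar chart: one '#' per 10 units
for label, value in bars:
    print(f"{label:<6} {'#' * (value // 10)} {value}")
```

With the bars in descending order, the largest and smallest categories are visible at a glance, which is much harder to see in alphabetical order.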

4. Use Colors to Your Advantage

In every form of visualization, colors are your best friend and the most powerful tool. They create contrasts, accents, and emphasis and lead the eye intuitively. Even here, color theory is important.

When you design your chart, make sure you don’t use more than 5 or 6 colors. Anything more than that will make your graph overwhelming and hard to read for your viewers. However, color intensity is a different thing that you can use to your advantage. For example, when you compare the same concept in different periods of time, you could order your data from the lightest shade of your chosen color to the darkest. This creates a strong visual progression that matches your timeline.

Things to consider when you choose colors:

  • Different colors for different categories.
  • A consistent color palette for all charts in a series that you will later compare.
  • It’s appropriate to use color blind-friendly palettes.
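The light-to-dark progression described above can be generated rather than picked by hand. The sketch below uses Python's standard `colorsys` module; the hue, lightness range, and saturation values are arbitrary assumptions, so adjust them to your own palette.

```python
import colorsys


def shade_ramp(hue, steps, lightest=0.85, darkest=0.30, saturation=0.60):
    """Return `steps` hex colors of one hue, ordered from lightest to darkest.

    `hue`, `lightest`, `darkest`, and `saturation` are fractions between 0 and 1.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    shades = []
    for i in range(steps):
        # Position along the ramp: 0.0 for the first shade, 1.0 for the last
        t = i / (steps - 1) if steps > 1 else 0.0
        lightness = lightest + t * (darkest - lightest)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        shades.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return shades


# Five shades of a blue-ish hue, e.g. one per year in a five-year comparison
ramp = shade_ramp(hue=0.58, steps=5)
print(ramp)
```

Keeping a single hue and varying only the lightness stays within the 5 or 6 color limit while still giving each time period its own shade.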

5. Get Inspired

Always put your inspiration to work when you want to be at the top of your game. Look through examples, infographics, and other people’s work and see what works best for each type of data you need to implement.

The Data Visualization Society’s Twitter account is a great place to start. In the meantime, we’ve also handpicked some amazing examples that will get you in the mood to start creating the visuals for your data.

8. Examples for Data Visualization

As another art form, Data Viz is a fertile ground for some amazing well-designed graphs that prove that data is beautiful. Now let’s check out some.

Dark Souls III Experience Data

We start with Meng Hsiao Wei’s personal project presenting his experience with playing Dark Souls 3. It’s a perfect example that infographics and data visualization are tools for personal designs as well. The research is pretty massive yet very professionally sorted into different types of charts for the different concepts. All data visualizations are made with the same color palette and look great in infographics.

Data of My Dark Souls 3 example

My dark souls 3 playing data by Meng Hsiao Wei

Greatest Movies of all Time

Katie Silver has compiled a list of the 100 greatest movies of all time based on critics and crowd reviews. The visualization shows key data points for every movie such as year of release, Oscar nominations and wins, budget, gross, IMDb score, genre, filming location, setting of the film, and production studio. All movies are ordered by the release date.

Greatest Movies visualization chart

100 Greatest Movies Data Visualization by Katie Silver

The Most Violent Cities

Federica Fragapane shows data for the 50 most violent cities in the world in 2017. The items are arranged on a vertical axis based on population and ordered along the horizontal axis according to the homicide rate.

The Most Violent Cities example

The Most Violent Cities by Federica Fragapane

Family Businesses as Data

These data visualizations and illustrations were made by Valerio Pellegrini for Perspectives Magazine. They show a pie chart with a sector breakdown as well as a scatter plot of contribution to employment.

Family Businesses as Data Visual

PERSPECTIVES MAGAZINE – Family Businesses by Valerio Pellegrini

Orbit Map of the Solar System

The map shows data on the orbits of more than 18,000 asteroids in the solar system. Each asteroid is shown at its position on New Year’s Eve 1999, colored by type of asteroid.

Orbit Map of the Solar System graphic

An Orbit Map of the Solar System by Eleanor Lutz

The Semantics Of Headlines

Katja Flükiger has a take on how headlines tell the story. The data visualization aims to communicate how much the selling influences the telling. The project was completed at Maryland Institute College of Art; it visualizes references to immigration and color-codes the value judgments implied by word choice and context.

The Semantics Of Headlines graph

The Semantics of Headlines by Katja Flükiger

Moon and Earthquakes

This data visualization works on answering whether the moon is responsible for earthquakes. The chart features the time and intensity of earthquakes in response to the phase and orbit location of the moon.

Moon and Earthquakes statistics visual

Moon and Earthquakes by Aishwarya Anand Singh

Dawn of the Nanosats

The visualization shows the satellites launched from 2003 to 2015. The graph represents the types of institutions focused on the projects as well as the nations that financed them. The left side shows the number of launches per year and the satellite applications.

Dawn of the Nanosats visualization

WIRED UK – Dawn of the Nanosats by Valerio Pellegrini

Final Words

Data visualization is not only a form of science but also a form of art. Its purpose is to help businesses in any field quickly make sense of complex data and start making decisions based on that data. To make your graphs efficient and easy to read, it’s all about knowing your data and audience. This way you’ll be able to choose the right type of chart and use visual techniques to your advantage.


Al Boicheva

Al is an illustrator at GraphicMama with out-of-the-box thinking and a passion for anything creative. In her free time, you will see her drooling over tattoo art, Manga, and horror movies.


Representing data

Here you will learn about representing data, including how to create and interpret the different tables, charts, diagrams and graphs you can use to represent data.

Students first learn how to represent and interpret data in the first grade and expand their knowledge as they progress through elementary school, middle school and high school. Being data literate is essential for success in the real world.

What is representing data?

Representing data allows you to display and interpret collected data. Data literacy is essential to understanding the world around us.

There are different types of data that can be represented in different formats.

For example,

  • Stem and leaf plot
  • Frequency distribution (such as bar graphs, vertical line graphs & line plots)
  • Cumulative frequency

Let’s take a look in detail at some of the different ways to represent data.

A histogram is a graphical representation used to display quantitative continuous data (numeric data). The graphical display uses bars that are different heights and each bar groups numbers into ranges. The horizontal axis represents the numerical range, and the vertical axis represents the frequency, which is the number of times the data falls in the particular numerical range.

For example, the frequency table shows the salaries of 157 employees at a small company. Create a histogram from the data.

Representing Data 1 US

Step by step guide : Histograms


A stem and leaf plot is a method of organizing numerical data based on the place value of the numbers.

Each number is split into two parts.

The first digit(s) form the stem,

The last digit forms the leaf.

For example, the data below represents the ages of all the employees at Millstown Elementary School. Create a stem and leaf plot from the data.

Representing Data 3 US

Step by step guide: Stem and Leaf Plot

Frequency distribution

A frequency distribution is a way of representing data from a frequency distribution table. Frequency distributions can be represented by frequency graphs such as pie graphs, bar graphs, line plots, vertical line graphs, and/or frequency polygons where the frequency is displayed on the vertical axis (y-axis).

There are two types of data that can be represented using a frequency graph.

Categorical data – data that are words rather than numbers, for example, colors, makes of cars, types of music.

Numerical data – data that is in the form of numbers. There are two types of numerical data: discrete and continuous.

Here are some examples of frequency graphs:

Representing Data 4 US

Step by step guide: Frequency distribution

A cumulative frequency graph, also called an ogive, shows the frequencies of each category accumulated together. This allows you to analyze the distribution of the data in more detail than if you used a frequency polygon and calculate statistics.

Here is an example of a cumulative frequency graph along with the data set.

Representing Data 5 US

Similar to a frequency graph, the horizontal axis (x-axis) represents the numerical interval and the vertical axis (y-axis) represents the cumulative frequency.
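To see how the cumulative frequencies are built up, here is a minimal sketch. The intervals and frequencies are hypothetical (the data set above is shown only as an image), and plotting each running total at the upper bound of its interval is a common convention that we are assuming here.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower bound, upper bound) and frequency
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [4, 9, 12, 5]

# Running total: each entry adds the current frequency to all earlier ones
cumulative = list(accumulate(frequencies))
print(cumulative)  # [4, 13, 25, 30]

# Points for the graph: (upper bound of the interval, cumulative frequency)
points = [(upper, total) for (_, upper), total in zip(intervals, cumulative)]
print(points)  # [(10, 4), (20, 13), (30, 25), (40, 30)]
```

The last cumulative frequency always equals the total number of data values, which is a useful check on your arithmetic.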

A pie chart, also known as a circle chart or pie graph, is a visual representation of data that is made by a circle divided into sectors (pie slices). Each sector represents a part of the whole (whole pie). Pie charts are used to represent categorical data.

Here is an example of a pie chart that displays students’ favorite subjects in percentages at a particular school. Notice how each sector represents a percent of the whole circle.

Representing Data 6 US

The sectors of a circle graph can be labeled with the number of data points in each category or with percents.

Step by step guide: Pie chart

A box plot, also known as a box and whisker plot, is a graph that represents the five number summary of a set of data.

The five number summary includes the following:

  • Lowest value or smallest value
  • Lower quartile or first quartile (Q1)
  • Median, middle number, middle value, or second quartile (M)
  • Upper quartile or third quartile (Q3)
  • Highest value or largest value

Here is an example of a box plot for the given data set:

7, 4, 5, 6, 3, 4, 7, 10, 11, 8, 9, 2, 3, 8, 11, 12, 10

Like with a stem and leaf plot, it is helpful to put the data points in order from least to greatest.

2, 3, 3, 4, 4, 5, 6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12

Representing Data 7 US

Quartiles are values that divide the data set into four quarters. From the box plot, you can see that the first quartile (Q1) is the value that 25% of the data set falls under.

The median, or second quartile, is the value that 50% of the data falls under, and the third quartile (Q3) is the value that 75% of the data set falls under.

From the box plot, you can also determine the interquartile range (IQR), which is the difference between the third and first quartiles, Q3 − Q1.

Step by step guide: Box plot

Step by step guide: Quartile

Step by step guide: Interquartile range

Common Core State Standards

How does this relate to 6th and 7th grade math?

  • Grade 6 – Statistics and Probability (6.SP.B.4) Display numerical data in plots on a number line, including dot plots, histograms, and box plots.
  • Grade 6 – Statistics and Probability (6.SP.A.3) Recognize that a measure of center for a numerical data set summarizes all of its values with a single number, while a measure of variation describes how its values vary with a single number.
  • Grade 6 – Ratios and Proportional Relationships (6.RP.3.c) Find a percent of a quantity as a rate per 100 (for example, 30% of a quantity means 30/100 times the quantity); solve problems involving finding the whole, given a part and the percent.
  • Grade 7 – Statistics and Probability (7.SP.B.3) Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference between the centers by expressing it as a multiple of a measure of variability. For example, the mean height of players on the basketball team is 10 cm greater than the mean height of players on the soccer team, about twice the variability (mean absolute deviation) on either team; on a dot plot, the separation between the two distributions of heights is noticeable.
  • Grade 7 – Statistics and Probability (7.SP.B.4) Use measures of center and measures of variability for numerical data from random samples to draw informal comparative inferences about two populations. For example, decide whether the words in a chapter of a seventh-grade science book are generally longer than the words in a chapter of a fourth-grade science book.

How to represent data

For a more detailed step-by-step approach on how to represent data, go to the links highlighted in the “What is representing data” section above or follow the examples below.

Representing data examples

Example 1: stem and leaf diagram.

The data below represents the heights of trees at a tree farm in feet.

20 ft, 15 ft, 17 ft, 29 ft, 22 ft, 13 ft, 30 ft, 25 ft, 18 ft, 27 ft, 31 ft

1 Order the numbers from smallest to largest.

13, 15, 17, 18, 20, 22, 25, 27, 29, 30, 31

2 Split the numbers into two parts; the last part must be one digit only.

The numbers in the data will be split into tens and ones, so 13 will be 1 and 3 (1 represents 10, or 1 ten, and 3 is 3 ones).

3 Put the values into the diagram and create a key.

Representing Data 8 US
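The three steps in Example 1 can be sketched in code as well. This is one possible way to do it, assuming the data are non-negative whole numbers; the function name `stem_and_leaf` is our own.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (tens and above) and leaf (ones digit)."""
    plot = defaultdict(list)
    # Step 1: order the numbers from smallest to largest
    for value in sorted(values):
        # Step 2: split each number; the leaf is the last digit only
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Tree heights in feet from Example 1
heights = [20, 15, 17, 29, 22, 13, 30, 25, 18, 27, 31]
plot = stem_and_leaf(heights)

# Step 3: put the values into the diagram and add a key
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
print("Key: 1 | 3 means 13 ft")
```

Running this prints three rows: stem 1 with leaves 3 5 7 8, stem 2 with leaves 0 2 5 7 9, and stem 3 with leaves 0 1.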

Example 2: histogram

Create a histogram for the given test scores.

82, 78, 77, 89, 90, 99, 97, 65, 66, 74, 78, 80, 78,
92, 70, 85, 75, 85, 88, 79, 69, 88, 99, 84, 83, 91

Decide what bin size to use and how many bins are needed.

Bin size is the same as interval size.

The lowest test score is 65 and the highest test score is 99, so bins (intervals) of 5 running from 65 to 100 cover all the data. There are 7 bins altogether.

Group the data by the bin sizes to find the frequency.

Representing Data 9 US

Create bars based on the bin sizes and frequencies within the bins.

Representing Data 10 US

Label the x and y axes with units.

Representing Data 11 US
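The grouping step in Example 2 can be checked with a short sketch. We are assuming each bin includes its lower bound and excludes its upper bound (65 up to but not including 70, and so on); for whole-number scores this gives the same counts as bins labeled 65–69, 70–74, and so on.

```python
# Test scores from Example 2
scores = [82, 78, 77, 89, 90, 99, 97, 65, 66, 74, 78, 80, 78,
          92, 70, 85, 75, 85, 88, 79, 69, 88, 99, 84, 83, 91]


def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = (value - start) // width
        if not 0 <= index < n_bins:
            raise ValueError(f"{value} is outside the bins")
        counts[index] += 1
    return counts


counts = bin_counts(scores, start=65, width=5, n_bins=7)

# Print the frequency table, one row per bin
for k, frequency in enumerate(counts):
    lower = 65 + 5 * k
    print(f"{lower}-{lower + 5}: {frequency}")
```

The frequencies come out as 3, 2, 6, 4, 5, 3, 3, which add up to the 26 test scores. Each frequency is the height of one bar in the histogram.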

Example 3: pie chart

24 pupils were asked which subject was their favorite. Here is a pie chart to show the results. How many students said science was their favorite subject?

Representing Data 12 US

Identify the categories.

There are 5 categories: science 25%, English 13%, history 20%, art 10%, and other 32%.

Calculate and analyze the data.

There are 24 students that were surveyed, and 25% said that science was their favorite subject.

24 × 0.25 = 6

6 students say that science is their favorite subject.
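The calculation in Example 3 can be written as a small sketch. It also works out the angle of the science sector, which is handy if you need to draw the pie chart with a protractor; the function names are our own.

```python
# Percentages read from the pie chart in Example 3
percents = {"science": 25, "English": 13, "history": 20, "art": 10, "other": 32}
total_students = 24


def count_from_percent(total, percent):
    """Number of data points that a percentage of the whole represents."""
    return total * percent / 100


def sector_angle(percent):
    """Angle in degrees of the sector for a given percentage (full circle = 360)."""
    return percent * 360 / 100


# The sectors of a pie chart must make up the whole circle
assert sum(percents.values()) == 100

science_students = count_from_percent(total_students, percents["science"])
science_angle = sector_angle(percents["science"])
print(science_students)  # 6.0
print(science_angle)     # 90.0
```

So 6 students chose science, and the science sector is a quarter of the circle, or 90 degrees.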

Example 4: box plot

Create a box plot for the data below.

15, 11, 24, 13, 22, 17, 20, 25, 19, 10, 24

Determine the median and quartiles.

Place the data in order from least to greatest.

10, 11, 13, 15, 17, 19, 20, 22, 24, 24, 25

Representing Data 13 US

Draw a scale, and mark the five key values: minimum value, lower quartile (LQ), median, upper quartile (UQ), and maximum value.

Representing Data 14 US

Join the lower quartile and upper quartile to form the box, and draw horizontal lines to the minimum and maximum values.

Representing Data 15 US
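Here is a sketch of how the five key values in Example 4 can be computed. It assumes the method where the quartiles are the medians of the lower and upper halves of the ordered data, leaving the median itself out when the number of values is odd. Calculators and software sometimes use other quartile methods, so their answers can differ slightly.

```python
from statistics import median


def five_number_summary(values):
    """Return min, Q1, median, Q3, and max using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 values")
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value when forming the upper half
    upper = data[half + 1:] if n % 2 else data[half:]
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }


# Data from Example 4
data = [15, 11, 24, 13, 22, 17, 20, 25, 19, 10, 24]
summary = five_number_summary(data)
iqr = summary["q3"] - summary["q1"]

print(summary)  # {'min': 10, 'q1': 13, 'median': 19, 'q3': 24, 'max': 25}
print(iqr)      # 11
```

The box is drawn from 13 to 24 with a line at 19, and the whiskers reach out to 10 and 25.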

Example 5: frequency distribution

At the local zoo, the zoologist was taking a count of the animals.

Create a dot plot representing this data.

Representing Data 16 US

Read the question and determine what type of graph you need to create.

This question asks you to create a dot plot to represent the data.

Use the data to create the specific frequency graph.

Representing Data 17 US

Example 6: frequency polygon

A vet weighs all the dogs she sees in a week. Here are the results.

Representing Data 18 US

Draw a frequency polygon to show the results.

This question asks you to create a frequency polygon to represent the data.

Representing Data 19 US

Use the midpoints of the groups representing mass (5, 15, 25, and so on) to label the horizontal axis. The frequency is on the vertical axis.

Plot points with a sharp pencil and crosses to be accurate, and then connect the points to create the frequency polygon.

Representing Data 20 US
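Finding the midpoints is the step that is easiest to get wrong, so here is a short sketch of it. The groups below match the midpoints 5, 15, 25 mentioned in the example, but the frequencies are hypothetical because the original table is shown only as an image.

```python
# Groups of mass as (lower bound, upper bound); frequencies are hypothetical
groups = [(0, 10), (10, 20), (20, 30)]
frequencies = [3, 7, 5]


def midpoints(class_intervals):
    """Return the midpoint of each (lower, upper) group."""
    return [(lower + upper) / 2 for lower, upper in class_intervals]


# Each point of the frequency polygon is (midpoint, frequency)
points = list(zip(midpoints(groups), frequencies))
print(points)  # [(5.0, 3), (15.0, 7), (25.0, 5)]
```

Plot these points and join each one to the next with a straight line. Do not join the last point back to the first.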

Teaching tips for representing data

  • Utilize interactive programs like Excel to allow students to spend time exploring how changing the bin size for a set of data affects the distribution of the data, therefore affecting the conclusions that might be drawn.
  • Use project based learning activities where students can collect their own raw data and create and interpret tables, diagrams, and graphs.
  • Provide visual aids or display examples of data representation around the classroom for students to differentiate between graph types including stem and leaf plots, bar graphs, histograms, dot plots, and box plots.

Easy mistakes to make

  • Knowing when to label the horizontal axis as discrete groups or having it be a continuous scale. The horizontal axis of a bar chart is divided into discrete categories with gaps between the bars, whereas on a histogram, values are on a continuous scale, so there are no gaps between the bins.
  • Forgetting to order the data set before creating graphs. If you are given a data set to represent on a box plot, stem and leaf plot, histogram, line plot, and/or bar graph, make sure the list of values is in order before you start finding the key values. Listing the data in order is always a good practice to use when graphing and analyzing data.
  • Not being precise when labeling axes. When drawing graphs, diagrams, and charts, use a sharp pencil and a ruler so that you can be as accurate as possible. For pie charts, use a protractor to measure the angles accurately.

Representing Data 21 US

Practice representing data questions

1. Use the stem and leaf plot to determine the mode.

Representing Data 22 US

The value 157 cm occurs twice. Therefore, the mode is 157.

2. Use the box plot to determine the median.

Representing Data 23 US

On a box plot, the line in the box represents the median.

Representing Data 24 US

3. Which histogram represents the data?

The table shows the number of deer Karen sees in her yard over the course of a month.

Representing Data 25 US

Look to make sure the axes are numbered correctly. The horizontal axis should be labeled 0 to 20, counting by 5s. The vertical axis should be numbered 0 to 10.

Each bar height should be equal to the frequency in each interval.

4. Which pie chart represents the data in this frequency table?

Representing Data 30 US

The total of the frequencies is 40. The frequency of A is 10, which is a quarter of 40. So, section A needs to be a quarter of the pie chart.

Similarly section C needs to be a quarter too. The frequency of B is 20, which is half of 40. Section B needs to be half of the pie chart.

5. The table below shows the number of flowers in a garden.

Which dot plot represents the data in the table?

Representing Data 35 US

There are 4 types of flowers: tulip, lily, rose, and marigold. Label the horizontal axis with the flower types. For each flower, place the number of dots vertically that matches the frequency.

6. Which of these is the correct frequency polygon for the frequency table below?

Representing Data 40 US

The points should be plotted using the midpoints of the groups: 10, 30, 50, 70, and 90. They should be plotted using the correct frequencies: 1, 9, 8, 3, and 2. The points need joining up, but NOT the last and the first points.

Representing data FAQs

Continuous data can take any value within a range or interval. Discrete data can take only specific, separate values within a range.

When you study algebra 1, you will learn how to create scatter plots. A straight line going through the points on a scatter plot is known as a line of best fit. Step-by-step guide: Scatterplots


Data Representation

Literature on data representation.

Here’s the entire UX literature on Data Representation by the Interaction Design Foundation, collated in one place:


All open-source articles on Data Representation

Visual mapping – the elements of information visualization.


Rating Scales in UX Research: The Ultimate Guide


Your Article Library

Graphic representation of data: meaning, principles and methods.


Read this article to learn about the meaning, principles and methods of graphic representation of data.

Meaning of Graphic Representation of Data:

Graphic representation is another way of analysing numerical data. A graph is a sort of chart through which statistical data are represented in the form of lines or curves drawn through the coordinate points plotted on its surface.

Graphs enable us to study the cause-and-effect relationship between two variables. They help to measure the extent of change in one variable when another variable changes by a certain amount.

Graphs also enable us to study both time series and frequency distributions, as they give a clear account and precise picture of the problem. Graphs are also easy to understand and eye-catching.

General Principles of Graphic Representation:

There are some algebraic principles which apply to all types of graphic representation of data. In a graph there are two lines called coordinate axes. One is vertical, known as the Y axis, and the other is horizontal, called the X axis. These two lines are perpendicular to each other. The point where they intersect is called ‘0’ or the origin. On the X axis, distances to the right of the origin have positive values (see fig. 7.1) and distances to the left of the origin have negative values. On the Y axis, distances above the origin have positive values and distances below it have negative values.

General Principles of Graphic Representation

Methods to Represent a Frequency Distribution:

Generally, five methods are used to represent a frequency distribution graphically. These are the histogram, the frequency polygon, the smoothed frequency polygon, the ogive or cumulative frequency graph, and the pie diagram.

1. Histogram:

A histogram is a non-cumulative frequency graph drawn on a natural scale, in which the frequencies of the different classes of values are represented by vertical rectangles drawn adjacent to each other. The mode, a measure of central tendency, can be easily determined with the help of this graph.

How to draw a Histogram :

Represent the class intervals of the variable along the X axis and their frequencies along the Y axis on a natural scale.

Start the X axis with the lower limit of the lowest class interval. When the lower limit is a long way from the origin, put a break in the X axis to indicate that the vertical axis has been moved in for convenience.

Now draw rectangular bars parallel to the Y axis above each of the class intervals, with the class units as the base. The areas of the rectangles must be proportional to the frequencies of the corresponding classes.
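
The steps above can be sketched in a few lines. The classes and frequencies below are hypothetical, and the sketch assumes integer scores, so each class is widened by half a unit on either side to obtain its exact limits; `exact_limits` is a helper invented for this example.

```python
# Hypothetical score classes (inclusive integer limits) and their frequencies.
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [2, 5, 8, 3]

def exact_limits(lower, upper, unit=1):
    """Widen an inclusive class by half a measurement unit on each side."""
    return (lower - unit / 2, upper + unit / 2)

# Each bar is (exact class limits, frequency); equal widths make height ~ area.
bars = [(exact_limits(lo, hi), f) for (lo, hi), f in zip(classes, frequencies)]

for (lo, hi), f in bars:
    print(f"{lo:>5.1f} - {hi:<5.1f} | {'#' * f} ({f})")
```

With exact limits, the upper limit of one class equals the lower limit of the next, which is why the rectangles of a histogram touch.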

Plot the following Data by a Histogram

In this graph we shall take class intervals in the X axis and frequencies in the Y axis. Before plotting the graph we have to convert the class into their exact limits.

Histogram Plotted from the Data

Advantages of histogram :

1. It is easy to draw and simple to understand.

2. It helps us to understand the distribution easily and quickly.

3. It is more precise than the polygon.

Limitations of histogram :

1. It is not possible to plot more than one distribution on same axes as histogram.

2. Comparison of more than one frequency distribution on the same axes is not possible.

3. It is not possible to make it smooth.

Uses of histogram :

1. Represents the data in graphic form.

2. Provides the knowledge of how the scores in the group are distributed. Whether the scores are piled up at the lower or higher end of the distribution or are evenly and regularly distributed throughout the scale.

Frequency Polygon:

The frequency polygon is a frequency graph which is drawn by joining the coordinate points of the mid-values of the class intervals and their corresponding frequencies.

Let us discuss how to draw a frequency polygon:

Draw a horizontal line at the bottom of the graph paper and name it the ‘OX’ axis. Mark off the exact limits of the class intervals along this axis. It is better to start with the class interval (c.i.) of lowest value. When the lowest score in the distribution is a large number, we cannot show it graphically if we start from the origin. Therefore put a break in the X axis to indicate that the vertical axis has been moved in for convenience. Two additional points may be added at the two extreme ends.

Draw a vertical line through the extreme end of the horizontal axis known as OY axis. Along this line mark off the units to represent the frequencies of the class intervals. The scale should be chosen in such a way that it will make the largest frequency (height) of the polygon approximately 75 percent of the width of the figure.

Plot the points at a height proportional to the frequencies directly above the point on the horizontal axis representing the mid-point of each class interval.

After plotting all the points on the graph join these points by a series of short straight lines to form the frequency polygon. In order to complete the figure two additional intervals at the high end and low end of the distribution should be included. The frequency of these two intervals will be zero.
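
A minimal sketch of the plotting points, assuming equal-width classes given by their exact limits. The data are hypothetical and `polygon_points` is a name made up for this illustration.

```python
def polygon_points(exact_classes, frequencies):
    """Return (midpoint, frequency) pairs, padded with one zero-frequency
    class at each end so the polygon closes on the X axis."""
    width = exact_classes[0][1] - exact_classes[0][0]  # assumes equal widths
    first_lo = exact_classes[0][0]
    last_hi = exact_classes[-1][1]
    padded = [(first_lo - width, first_lo)] + list(exact_classes) + [(last_hi, last_hi + width)]
    freqs = [0] + list(frequencies) + [0]
    return [((lo + hi) / 2, f) for (lo, hi), f in zip(padded, freqs)]

# Hypothetical classes, already converted to exact limits.
classes = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]
points = polygon_points(classes, [2, 5, 8, 3])
print(points)
```

Joining consecutive points with straight lines gives the polygon; the first and last points lie on the X axis because their frequency is zero.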

Illustration: No. 7.3 :

Draw a frequency polygon from the following data:

Frequency Polygon

In this graph we shall take the class intervals (marks in mathematics) on the X axis, and frequencies (number of students) on the Y axis. Before plotting the graph we have to convert the class intervals into their exact limits and extend one class interval at each end with a frequency of 0.

Class intervals with exact limits:

Advantages of frequency polygon :

2. It is possible to plot two distributions at a time on same axes.

3. Comparison of two distributions can be made through frequency polygon.

4. It is possible to make it smooth.

Limitations of frequency polygon :

1. It is less precise.

2. It is not accurate in terms of area: the area above each interval does not represent its frequency exactly.

Uses of frequency polygon :

1. When two or more distributions are to be compared the frequency polygon is used.

2. It represents the data in graphic form.

3. It provides knowledge of how the scores in one or more group are distributed. Whether the scores are piled up at the lower or higher end of the distribution or are evenly and regularly distributed throughout the scale.

2. Smoothed Frequency Polygon :

When the sample is very small and the frequency distribution is irregular, the polygon is very zig-zag. In order to wipe out the irregularities and “also get a better notion of how the figure might look if the data were more numerous, the frequency polygon may be smoothed.”

In this process to adjust the frequencies we take a series of ‘moving’ or ‘running’ averages. To get an adjusted or smoothed frequency we add the frequency of a class interval with the two adjacent intervals, just below and above the class interval. Then the sum is divided by 3. When these adjusted frequencies are plotted against the class intervals on a graph we get a smoothed frequency polygon.
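
The running average described above can be sketched as follows. The frequencies are hypothetical, and the sketch assumes that the intervals just beyond either end of the distribution have a frequency of 0; `smooth` is a name chosen for this example.

```python
def smooth(frequencies):
    """Three-point running average of class frequencies.
    Intervals beyond either end are assumed to have frequency 0."""
    padded = [0] + list(frequencies) + [0]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3
        for i in range(1, len(padded) - 1)
    ]

raw = [2, 5, 8, 3]  # hypothetical frequencies
print([round(v, 2) for v in smooth(raw)])
```

Each smoothed value replaces the original frequency of its class interval when the polygon is plotted.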

Illustration 7.4 :

Draw a smoothed frequency polygon, of the data given in the illustration No. 7.3:

Here we have to first convert the class intervals into their exact limits. Then we have to determine the adjusted or smoothed frequencies.

Determine the Adjusted or Smoothed Frequencies

3. Ogive or Cumulative Frequency Polygon:

An ogive is a cumulative frequency graph drawn on a natural scale to determine values such as the median, quartiles and percentiles. In this graph the exact limits of the class intervals are shown along the X-axis and the cumulative frequencies are shown along the Y-axis. The steps to draw an ogive are given below.

Get the cumulative frequency by adding the frequencies cumulatively, from the lower end (to get a less than ogive) or from the upper end (to get a more than ogive).

Mark off the class intervals in the X-axis.

Represent the cumulative frequencies along the Y-axis, beginning with zero at the base.

Put a dot at each coordinate point given by the upper limit of a class interval and the corresponding cumulative frequency.

Join all the dots with a smoothly drawn line. The resulting curve is called an ogive.
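
The first step, accumulating the frequencies, can be sketched as below. The limits and frequencies are hypothetical. Pairing the "more than" totals with the lower limits is a common convention assumed here, not something the steps above state.

```python
from itertools import accumulate

# Hypothetical exact class limits and frequencies.
lower_limits = [9.5, 19.5, 29.5, 39.5]
upper_limits = [19.5, 29.5, 39.5, 49.5]
frequencies = [2, 5, 8, 3]

# "Less than" ogive: add frequencies cumulatively from the lower end.
less_than = list(accumulate(frequencies))
# "More than" ogive: add frequencies cumulatively from the upper end.
more_than = list(accumulate(reversed(frequencies)))[::-1]

less_than_points = list(zip(upper_limits, less_than))
more_than_points = list(zip(lower_limits, more_than))
print(less_than_points)
print(more_than_points)
```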

Illustration No. 7.5 :

Draw an ogive from the data given below:

To plot this graph, we first have to convert the class intervals into their exact limits. Then we have to calculate the cumulative frequencies of the distribution.

Cumulative Frequencies of the Distribution

Now we have to plot the cumulative frequencies in respect to their corresponding class-intervals.

Ogive plotted from the data given above:

Uses of Ogive:

1. Ogive is useful to determine the number of students below and above a particular score.

2. When the median as a measure of central tendency is wanted.

3. When the quartiles, deciles and percentiles are wanted.

4. By plotting the scores of two groups on a same scale we can compare both the groups.

4. The Pie Diagram:

The figure given below shows the distribution of elementary pupils by their academic achievement in a school. Of the total, 60% are high achievers, 25% middle achievers and 15% low achievers. The construction of this pie diagram is quite simple. There are 360 degrees in a circle. Hence, 60% of 360°, or 216°, is counted off as shown in the diagram; this sector represents the proportion of high achievers.

Ninety degrees are counted off for the middle achievers (25%) and 54 degrees for the low achievers (15%). The pie diagram is useful when one wishes to picture proportions of the total in a striking way. The numbers of degrees may be measured off “by eye” or, more accurately, with a protractor.
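
The arithmetic above can be reproduced with a short sketch. The percentages come from the example; the function name `sector_angles` is made up for this illustration.

```python
def sector_angles(percentages):
    """Convert percentage shares into sector angles of a 360-degree circle."""
    return {label: pct * 360 / 100 for label, pct in percentages.items()}

shares = {"high achievers": 60, "middle achievers": 25, "low achievers": 15}
angles = sector_angles(shares)
for label, angle in angles.items():
    print(f"{label}: {angle:.0f} degrees")
```

The angles add up to 360 degrees only when the percentages add up to 100, which is a useful check before drawing.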

Distribution by Academic Achievement of Pupils in Class VI of a School

Uses of Pie diagram :

1. Pie diagram is useful when one wants to picture proportions of the total in a striking way.

2. When a population is stratified and each stratum is to be presented as a percentage, the pie diagram is used.


Graphical Representation of Data


Graphical Representation of Data: Graphical representation turns numbers and facts into pictures and diagrams. Instead of reading long lists of numbers, we use charts, graphs, and other visuals to understand information better. In this article on data visualization, we’ll learn about the different kinds of graphs and charts that help us see the patterns hidden in data.

Statistics is the branch of mathematics that deals with collecting, analyzing, interpreting, and presenting numerical data so that it is easy to understand and compare.

Statistics has many real-life applications, such as business analytics, demography, and astrostatistics. This article covers the graphical representation of data, including its types, rules, and advantages.

Table of Content

  • What is Graphical Representation
  • Types of Graphical Representations (line graphs, histograms, stem and leaf plot, box and whisker plot)
  • Graphical Representations used in Maths
  • Value-Based or Time Series Graphs
  • Frequency Based
  • Principles of Graphical Representations
  • Advantages and Disadvantages of Using Graphical System
  • General Rules for Graphical Representation of Data
  • Frequency Polygon
  • Solved Examples on Graphical Representation of Data

Graphical representation is a way of presenting data in pictorial form. It helps a reader understand a large set of data easily, because it shows the patterns in the data visually.

There are two ways of representing data,

  • Pictorial Representation through graphs.

They say, “A picture is worth a thousand words”. It is often better to represent data in a graphical format. Surveys and practical evidence suggest that people retain and understand information better when it is presented visually than in any other form.

How much better? A widely quoted figure is that visuals improve understanding 60,000 times, but this claim is poorly sourced and should be treated with caution.

Comparisons between different items are best shown with graphs, which make it easier to compare the key facts about each item. Let’s look briefly at the different types of graphical representations:

A line graph is used to show how the value of a particular variable changes with time. We plot this graph by connecting the points at different values of the variable. It can be useful for analyzing the trends in the data and predicting further trends. 

A bar graph is a type of graphical representation of the data in which bars of uniform width are drawn with equal spacing between them on one axis (x-axis usually), depicting the variable. The values of the variables are represented by the height of the bars. 

A histogram is similar to a bar graph, but it is based on the frequency of numerical values rather than their actual values. The data is organized into intervals, and each bar represents the number of values that fall in that interval.

A line plot (also called a dot plot) displays data as points or check marks above a number line, showing the frequency of each value.

A stem and leaf plot splits each value into a “leaf” (in most cases, the last digit) and a “stem” (the remaining digits). For example, the number 42 is split into leaf (2) and stem (4).
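
The split can be sketched as follows, assuming non-negative whole numbers and the last digit as the leaf. The data set is hypothetical and `stem_and_leaf` is a name invented for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all but the last digit),
    listing the leaves (last digits) in ascending order."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [42, 45, 51, 38, 42, 57, 33]  # hypothetical data
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```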

A box and whisker plot divides the data into four parts to summarize it. It is mainly concerned with the spread and median of the data.
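
The values that such a plot marks can be sketched as a five-number summary. Several conventions exist for quartiles; this sketch assumes one common convention, taking the quartiles as the medians of the lower and upper halves and leaving out the middle value when the count is odd. The data and the name `five_number_summary` are illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).
    Quartiles are medians of the lower and upper halves of the sorted data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, skip the middle value when forming the upper half.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([2, 4, 4, 5, 7, 8, 9, 11, 12]))
```

The box spans the two quartiles, the line inside it marks the median, and the whiskers reach to the minimum and maximum.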

A pie chart represents the data as a circular graph. The circle is divided so that each sector represents a proportion of the whole.

Graphical Representations used in Maths

Graphs in maths are used to study the relationships between two or more changing variables. Statistical data can be summarized more clearly using graphs. There are basically two approaches to making graphs in maths:

  • Value-Based or Time Series Graphs
  • Frequency Based

These graphs allow us to study the change of a variable with respect to another variable within a given interval of time. The variables can be anything. Time Series graphs study the change of variable with time. They study the trends, periodic behavior, and patterns in the series. We are more concerned with the values of the variables here rather than the frequency of those values. 

Example: Line Graph

Frequency-based graphs are more concerned with the distribution of data: how many values lie within a particular range of the variable, and which range has the highest frequency. They are used to judge the spread, and sometimes the average or median, of the variable under study.

Principles of Graphical Representations

  • All types of graphical representations follow algebraic principles.
  • When plotting a graph, there’s an origin and two axes.
  • The x-axis is horizontal, and the y-axis is vertical.
  • The axes divide the plane into four quadrants.
  • The origin is where the axes intersect.
  • Positive x-values are to the right of the origin; negative x-values are to the left.
  • Positive y-values are above the x-axis; negative y-values are below.

Advantages

  • It gives us a summary of the data which is easier to look at and analyze.
  • It saves time.
  • We can compare and study more than one variable at a time.

Disadvantages

  • It usually shows only one aspect of the data and ignores the others. For example, a bar graph does not represent the mean, median, and other statistics of the data.
  • Interpretation of graphs can vary based on individual perspectives, leading to subjective conclusions.
  • Poorly constructed or misleading visuals can distort data interpretation and lead to incorrect conclusions.

General Rules for Graphical Representation of Data

We should keep in mind some things while plotting and designing these graphs. The goal should be a better and clear picture of the data. Following things should be kept in mind while plotting the above graphs: 

  • Whenever possible, the data source must be mentioned for the viewer.
  • Always choose suitable colors and font sizes so that the graph looks neat.
  • The unit of measurement should be mentioned in the top right corner of the graph.
  • Choose a proper scale so that the graph represents the data accurately.
  • Last but not least, choose a suitable title.

A frequency polygon is a graph constructed by joining the points plotted above the midpoint of each interval. The height of each point represents the frequency of the values that lie in that interval or bin.

Question 1: What are different types of frequency-based plots? 

Types of frequency-based plots:

  • Histogram
  • Frequency Polygon
  • Box Plots

Question 2: A company with an advertising budget of Rs 10,00,00,000 has planned the following expenditure in the different advertising channels such as TV Advertisement, Radio, Facebook, Instagram, and Printed media. The table represents the money spent on different channels. 

Draw a bar graph for the following data. 

  • Put each of the channels on the x-axis
  • The height of the bars is decided by the value of each channel.

Question 3: Draw a line plot for the following data 

  • Put each value from the x-axis row on the x-axis.
  • Join the points corresponding to each x-axis value.

Question 4: Make a frequency plot of the following data: 

  • Draw the class intervals on the x-axis and frequencies on the y-axis.
  • Calculate the midpoint of each class interval.

Class Interval | Mid Point | Frequency
0-3            | 1.5       | 3
3-6            | 4.5       | 4
6-9            | 7.5       | 2
9-12           | 10.5      | 6

Now join the mid points of the intervals and their corresponding frequencies on the graph. 
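
The midpoints in the table can be checked with a short sketch. The intervals and frequencies are those given above; the variable names are chosen for this illustration.

```python
# Class intervals and frequencies from the table in Question 4.
intervals = [(0, 3), (3, 6), (6, 9), (9, 12)]
frequencies = [3, 4, 2, 6]

# The midpoint of an interval is the average of its two limits.
points = [((lo + hi) / 2, f) for (lo, hi), f in zip(intervals, frequencies)]
print(points)
```

These (midpoint, frequency) pairs are the points joined to form the frequency polygon.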

This graph shows both the histogram and frequency polygon for the given distribution.

Conclusion of Graphical Representation

Graphical representation is a powerful tool for understanding data, but it’s essential to be aware of its limitations. While graphs and charts can make information easier to grasp, they can also be subjective, complex, and potentially misleading . By using graphical representations wisely and critically, we can extract valuable insights from data, empowering us to make informed decisions with confidence.

Graphical Representation of Data – FAQs

What are the advantages of using graphs to represent data?

Graphs offer visualization, clarity, and easy comparison of data, aiding in outlier identification and predictive analysis.

What are the common types of graphs used for data representation?

Common graph types include bar, line, pie, histogram, and scatter plots , each suited for different data representations and analysis purposes.

How do you choose the most appropriate type of graph for your data?

Select a graph type based on data type, analysis objective, and audience familiarity to effectively convey information and insights.

How do you create effective labels and titles for graphs?

Use descriptive titles, clear axis labels with units, and legends to ensure the graph communicates information clearly and concisely.

How do you interpret graphs to extract meaningful insights from data?

Interpret graphs by examining trends, identifying outliers, comparing data across categories, and considering the broader context to draw meaningful insights and conclusions.


DHDC | Digital Humanities Data Curation

Digital Humanities Data Curation

This is the first stop on your way to mastering the essentials of data curation for the humanities. The Guide offers concise, expert introductions to key topics, including annotated links to important standards, articles, projects, and other resources.

The best place to start is the Table of Contents grid. To find out more about the project, visit the About This Site page. Please browse, read, and contribute. We’re still expanding the site, but take a look around. Happy browsing!

— The Editors


More about the DH Curation Guide

Data curation is an emerging problem for the humanities as both data and analytical practices become increasingly digital. Research groups working with cultural content as well as libraries, museums, archives, and other institutions are all in need of new expertise. This Guide is a first step to understanding the essentials of data curation for the humanities. The expert-written introductions to key topics include links to important standards, documentation, articles, and projects in the field, annotated with enough context from expert editors and the research community to indicate to newcomers how these resources might help them with data curation challenges.

A Community Resource

Intended to help students and those new to the field, the DH Curation Guide also provides a quick reference for teachers, administrators, and anyone seeking an orientation in the issues and practicalities of data curation.

As indicated by the name, this community resource guide is intended to be a living, participatory document. Readers are encouraged to review and comment on every part of this guide, to suggest additional resources, and to contribute to stub articles. Contributions from readers are incorporated at intervals to keep the Guide at the cutting edge.

Browse, comment, contribute! The table of contents provides a road map to the Guide’s current topics and those to be added soon.


Integrating Multi-Omics Using Bayesian Ridge Regression with Iterative Similarity Bagging


1. Introduction

  • Bayesian Ridge Regression (BRR) is introduced as a supervised domain-oriented feature selection method to reduce omics complexity and dimensionality. Features are selected based on domain contexts, such as drug response and cancer classification.
  • A new method named Iterative Similarity Bagging (ISB) is presented to perform a dynamic reduction of dimensionality and complexity without losing the biological measurements of omics data, which is a common issue with some transformation-based integration approaches.

2. Related Work

3. Materials and Methods

3.1. Datasets

3.2. Bayesian Ridge Regression

3.3. Iterative Similarity Bagging

Algorithm: Iterative Similarity Bagging method
Input:
   • Iterations count: i
   • Bag size (columns in the bag): k
   • Single-omics dataset: data
Output:
   • Selected features list after all iterations: selected_features

     List selected_features = []
     // Increment value after each iteration
     integer increment value: c
     index = 0 // Starting column to select genes in the bag.
     c = k
     for index = 0 : i
       for j = 0 : range(0, len(data.columns))
         if index < len(data.columns):
           df_bag = SELECT_COLUMNS(data, index : k)
           df = TRANSPOSE(df_bag)
           df_Sim = compute_similarity(df) // Euclidean distance
           threshold = get_threshold(df_Sim) // Half-mean threshold
           iteration_selected_features = df_Sim[col] > threshold
           selected_features += list(unique(iteration_selected_features))
           index = index + c
           k = k + c
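The bagging idea above can be made concrete with a small, dependency-free Python sketch. It is not the authors' implementation. The pseudocode does not fully specify how `df_Sim[col] > threshold` picks features, so this sketch substitutes a simple greedy near-duplicate filter: inside each bag, a feature is kept only if its Euclidean distance to every feature already kept in that bag exceeds half of the bag's mean pairwise distance. Feeding the survivors of one iteration into the next is also an assumption, and all function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_from_bag(bag):
    """Greedy filter (assumed rule): keep a feature only if it is farther
    than the half-mean threshold from every feature already kept."""
    names = list(bag)
    if len(names) < 2:
        return names  # nothing to compare against
    pair_dists = [euclidean(bag[a], bag[b]) for a, b in combinations(names, 2)]
    threshold = 0.5 * sum(pair_dists) / len(pair_dists)  # half-mean threshold
    kept = []
    for name in names:
        if all(euclidean(bag[name], bag[k]) > threshold for k in kept):
            kept.append(name)
    return kept


def iterative_similarity_bagging(data, bag_size, iterations):
    """data maps feature name -> list of sample values (one column each)."""
    selected = list(data)
    for _ in range(iterations):
        kept = []
        # Slide a window of bag_size columns across the current feature list.
        for start in range(0, len(selected), bag_size):
            bag_names = selected[start:start + bag_size]
            kept.extend(select_from_bag({n: data[n] for n in bag_names}))
        if len(kept) == len(selected):
            break  # nothing was removed, so further passes change nothing
        selected = kept
    return selected


# Toy single-omics matrix: three near-identical features and one distinct one.
toy = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.0, 2.0, 3.1],
    "g3": [1.1, 2.0, 3.0],
    "g4": [9.0, 0.0, 5.0],
}
print(iterative_similarity_bagging(toy, bag_size=4, iterations=1))
```

With one bag of four features, the near-duplicates `g2` and `g3` are dropped and `g1` and `g4` survive. With `bag_size=2` nothing is removed, because each bag then holds only two features that are always farther apart than half their own distance; this illustrates why the bag size affects how many features remain.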
       
     
    

3.4. Drug Response Prediction Using Graph Convolutional Network and Convolutional Neural Network

3.5. Evaluation Metrics

3.6. Experimental Setup

4. Results and Discussion

4.1. Genomic Features Selected by BRR

4.2. Genomic Features Selected by ISB

4.3. Effectiveness of BRR-ISB in Drug Response Prediction

4.4. Comparison with Related Works

  • WGRMF [92]: Researchers utilized Weighted Graph Regularized Matrix Factorization to predict the responses of cell lines to anti-cancer drugs. This model used the CCLE dataset, which has 491 cell lines and 23 drugs with 10,870 known responses. WGRMF employed gene expression and drug fingerprints as inputs for the model.
  • EBSRMF [81]: Researchers proposed Ensemble-based Similarity-Regularized Matrix Factorization, a bagging-based technique to enhance drug response prediction accuracy on the CCLE dataset. The dataset comprises 24 drugs and 363 cell lines. It utilized gene expression profiles and chemical structure.
  • DeepDSC [80]: Gene expression data were utilized to extract features of cell lines by a stacked deep autoencoder. Subsequently, the extracted features were combined with chemical structure information to forecast drug response. DeepDSC utilized the Cancer Cell Line Encyclopedia (CCLE), which has 491 cell lines and 23 drugs, along with 10,870 documented responses.
  • SRMF [93]: Drug response prediction was accomplished by combining gene expression data with chemical structures using a Similarity-Regularized Matrix Factorization model. The CCLE dataset has 10,870 known responses, encompassing 491 distinct cell lines and 23 drugs.

5. Conclusions

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

  1. Subramanian, I.; Verma, S.; Kumar, S.; Jere, A.; Anamika, K. Multi-Omics Data Integration, Interpretation, and Its Application. Bioinform. Biol. Insights 2020, 14, 117793221989905.
  2. Chen, C.; Wang, J.; Pan, D.; Wang, X.; Xu, Y.; Yan, J.; Wang, L.; Yang, X.; Yang, M.; Liu, G. Applications of Multi-omics Analysis in Human Diseases. MedComm 2023, 4, e315.
  3. Kreitmaier, P.; Katsoula, G.; Zeggini, E. Insights from Multi-Omics Integration in Complex Disease Primary Tissues. Trends Genet. 2023, 39, 46–58.
  4. Chong, J.; Soufan, O.; Li, C.; Caraus, I.; Li, S.; Bourque, G.; Wishart, D.S.; Xia, J. MetaboAnalyst 4.0: Towards More Transparent and Integrative Metabolomics Analysis. Nucleic Acids Res. 2018, 46, W486–W494.
  5. López de Maturana, E.; Alonso, L.; Alarcón, P.; Martín-Antoniano, I.A.; Pineda, S.; Piorno, L.; Calle, M.L.; Malats, N. Challenges in the Integration of Omics and Non-Omics Data. Genes 2019, 10, 238.
  6. Cai, Z.; Poulos, R.C.; Liu, J.; Zhong, Q. Machine Learning for Multi-Omics Data Integration in Cancer. iScience 2022, 25, 103798.
  7. Picard, M.; Scott-Boyer, M.-P.; Bodein, A.; Périn, O.; Droit, A. Integration Strategies of Multi-Omics Data for Machine Learning Analysis. Comput. Struct. Biotechnol. J. 2021, 19, 3735–3746.
  8. Hasin, Y.; Seldin, M.; Lusis, A. Multi-Omics Approaches to Disease. Genome Biol. 2017, 18, 83.
  9. Lander, E.S.; Linton, L.M.; Birren, B.; Nusbaum, C.; Zody, M.C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W.; et al. Initial Sequencing and Analysis of the Human Genome. Nature 2001, 409, 860–921.
  10. Reel, P.S.; Reel, S.; Pearson, E.; Trucco, E.; Jefferson, E. Using Machine Learning Approaches for Multi-Omics Data Analysis: A Review. Biotechnol. Adv. 2021, 49, 107739.
  11. Almutiri, T.; Alomar, K.; Alganmi, N. Predicting Drug Response on Multi-Omics Data Using a Hybrid of Bayesian Ridge Regression with Deep Forest. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 470–482.
  12. Nicora, G.; Vitali, F.; Dagliati, A.; Geifman, N.; Bellazzi, R. Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools. Front. Oncol. 2020, 10, 1030.
  13. Xuan, P.; Sun, C.; Zhang, T.; Ye, Y.; Shen, T.; Dong, Y. Gradient Boosting Decision Tree-Based Method for Predicting Interactions Between Target Genes and Drugs. Front. Genet. 2019, 10, 459.
  14. Yue, X.; Wang, Z.; Huang, J.; Parthasarathy, S.; Moosavinasab, S.; Huang, Y.; Lin, S.M.; Zhang, W.; Zhang, P.; Sun, H. Graph Embedding on Biomedical Networks: Methods, Applications and Evaluations. Bioinformatics 2020, 36, 1241–1251.
  15. Ma, T.; Zhang, A. Affinity Network Fusion and Semi-Supervised Learning for Cancer Patient Clustering. Methods 2018, 145, 16–24.
  16. Gligorijević, V.; Barot, M.; Bonneau, R. DeepNF: Deep Network Fusion for Protein Function Prediction. Bioinformatics 2018, 34, 3873–3881.
  17. Wen, Y.; Song, X.; Yan, B.; Yang, X.; Wu, L.; Leng, D.; He, S.; Bo, X. Multi-Dimensional Data Integration Algorithm Based on Random Walk with Restart. BMC Bioinform. 2021, 22, 97.
  18. Zhang, Y.; Li, A.; Peng, C.; Wang, M. Improve Glioblastoma Multiforme Prognosis Prediction by Using Feature Selection and Multiple Kernel Learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016, 13, 825–835.
  19. He, Z.; Zhang, J.; Yuan, X.; Zhang, Y. Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods. Front. Genet. 2021, 11, 632901.
  20. Ammad-ud-din, M.; Khan, S.A.; Malani, D.; Murumägi, A.; Kallioniemi, O.; Aittokallio, T.; Kaski, S. Drug Response Prediction by Inferring Pathway-Response Associations with Kernelized Bayesian Matrix Factorization. Bioinformatics 2016, 32, i455–i463.
  21. Costello, J.C.; Heiser, L.M.; Georgii, E.; Gönen, M.; Menden, M.P.; Wang, N.J.; Bansal, M.; Ammad-ud-din, M.; Hintsanen, P.; Khan, S.A.; et al. A Community Effort to Assess and Improve Drug Sensitivity Prediction Algorithms. Nat. Biotechnol. 2014, 32, 1202–1212.
  22. Vahabi, N.; Michailidis, G. Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. Front. Genet. 2022, 13, 854752.
  23. Gligorijević, V.; Pržulj, N. Methods for Biological Data Integration: Perspectives and Challenges. J. R. Soc. Interface 2015, 12, 20150571.
  24. Wang, B.; Mezlini, A.M.; Demir, F.; Fiume, M.; Tu, Z.; Brudno, M.; Haibe-Kains, B.; Goldenberg, A. Similarity Network Fusion for Aggregating Data Types on a Genomic Scale. Nat. Methods 2014, 11, 333–337.
  25. Efendi, A.; Effrihan, E. A Simulation Study on Bayesian Ridge Regression Models for Several Collinearity Levels. AIP Conf. Proc. 2017, 1913, 020031.
  26. Yassen, M.F.; Al-Duais, F.S.; Almazah, M. Ridge Regression Method and Bayesian Estimators under Composite LINEX Loss Function to Estimate the Shape Parameter in Lomax Distribution. Comput. Intell. Neurosci. 2022, 2022, 1200611.
  27. Flavin, T.; Steiner, T.; Mitra, B.; Nagaraju, V. Bayesian Ridge Regression Based Model to Predict Fault Location in HVdc Network. In Proceedings of the 2022 IEEE Power & Energy Society General Meeting (PESGM), Denver, CO, USA, 17–21 July 2022; pp. 1–5.
  28. Ngo, G.; Beard, R.; Chandra, R. Evolutionary Bagging for Ensemble Learning. Neurocomputing 2022, 510, 1–14.
  29. Toloşi, L.; Lengauer, T. Classification with Correlated Features: Unreliability of Feature Ranking and Solutions. Bioinformatics 2011, 27, 1986–1994.
  30. Jain, I.; Jain, V.K.; Jain, R. Correlation Feature Selection Based Improved-Binary Particle Swarm Optimization for Gene Selection and Cancer Classification. Appl. Soft Comput. 2018, 62, 203–215.
  31. Darst, B.F.; Malecki, K.C.; Engelman, C.D. Using Recursive Feature Elimination in Random Forest to Account for Correlated Variables in High Dimensional Data. BMC Genet. 2018, 19, 65.
  32. Misra, B.B.; Langefeld, C.; Olivier, M.; Cox, L.A. Integrated Omics: Tools, Advances and Future Approaches. J. Mol. Endocrinol. 2019, 62, R21–R45.
  33. Wörheide, M.A.; Krumsiek, J.; Kastenmüller, G.; Arnold, M. Multi-Omics Integration in Biomedical Research—A Metabolomics-Centric Review. Anal. Chim. Acta 2021, 1141, 144–162.
  34. Park, M.; Kim, D.; Moon, K.; Park, T. Integrative Analysis of Multi-Omics Data Based on Blockwise Sparse Principal Components. Int. J. Mol. Sci. 2020, 21, 8202.
  35. Xie, G.; Dong, C.; Kong, Y.; Zhong, J.; Li, M.; Wang, K. Group Lasso Regularized Deep Learning for Cancer Prognosis from Multi-Omics and Clinical Features. Genes 2019, 10, 240.
  36. Xie, M.; Lei, X.; Zhong, J.; Ouyang, J.; Li, G. Drug Response Prediction Using Graph Representation Learning and Laplacian Feature Selection. BMC Bioinform. 2022, 23, 532.
  37. Chu, T.; Nguyen, T.T.; Hai, B.D.; Nguyen, Q.H.; Nguyen, T. Graph Transformer for Drug Response Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 1065–1072.
  38. Malik, V.; Kalakoti, Y.; Sundar, D. Deep Learning Assisted Multi-Omics Integration for Survival and Drug-Response Prediction in Breast Cancer. BMC Genom. 2021, 22, 214.
  39. Wang, Z.; Li, H.; Carpenter, C.; Guan, Y. Challenge-Enabled Machine Learning to Drug-Response Prediction. AAPS J. 2020, 22, 106.
  40. Bühlmann, P.; Van De Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011; ISBN 364220192X.
  41. Bøvelstad, H.M.; Nygård, S.; Størvold, H.L.; Aldrin, M.; Borgan, Ø.; Frigessi, A.; Lingjærde, O.C. Predicting Survival from Microarray Data—A Comparative Study. Bioinformatics 2007, 23, 2080–2087.
  42. Natekin, A.; Knoll, A. Gradient Boosting Machines, a Tutorial. Front. Neurorobot. 2013, 7, 21.
  43. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  44. Partin, A.; Brettin, T.; Evrard, Y.A.; Zhu, Y.; Yoo, H.; Xia, F.; Jiang, S.; Clyde, A.; Shukla, M.; Fonstein, M. Learning Curves for Drug Response Prediction in Cancer Cell Lines. BMC Bioinform. 2021, 22, 252.
  45. Chang, Y.; Park, H.; Yang, H.-J.; Lee, S.; Lee, K.-Y.; Kim, T.S.; Jung, J.; Shin, J.-M. Cancer Drug Response Profile Scan (CDRscan): A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature. Sci. Rep. 2018, 8, 8857.
  46. Zhu, Y.; Brettin, T.; Evrard, Y.A.; Partin, A.; Xia, F.; Shukla, M.; Yoo, H.; Doroshow, J.H.; Stevens, R.L. Ensemble Transfer Learning for the Prediction of Anti-Cancer Drug Response. Sci. Rep. 2020, 10, 18040.
  47. Sotudian, S.; Paschalidis, I.C. Machine Learning for Pharmacogenomics and Personalized Medicine: A Ranking Model for Drug Sensitivity Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 19, 2324–2333.
  48. Roder, J.; Oliveira, C.; Net, L.; Tsypin, M.; Linstid, B.; Roder, H. A Dropout-Regularized Classifier Development Approach Optimized for Precision Medicine Test Discovery from Omics Data. BMC Bioinform. 2019, 20, 325.
  49. Xiaolin, X.; Xiaozhi, L.; Guoping, H.; Hongwei, L.; Jinkuo, G.; Xiyun, B.; Zhen, T.; Xiaofang, M.; Yanxia, L.; Na, X. Overfit Deep Neural Network for Predicting Drug-Target Interactions. iScience 2023, 26, 107646.
  50. Iorio, F.; Knijnenburg, T.A.; Vis, D.J.; Bignell, G.R.; Menden, M.P.; Schubert, M.; Aben, N.; Gonçalves, E.; Barthorpe, S.; Lightfoot, H.; et al. A Landscape of Pharmacogenomic Interactions in Cancer. Cell 2016, 166, 740–754.
  51. Kurilov, R.; Haibe-Kains, B.; Brors, B. Assessment of Modelling Strategies for Drug Response Prediction in Cell Lines and Xenografts. Sci. Rep. 2020, 10, 2849.
  52. Barretina, J.; Caponigro, G.; Stransky, N.; Venkatesan, K.; Margolin, A.A.; Kim, S.; Wilson, C.J.; Lehár, J.; Kryukov, G.V.; Sonkin, D.; et al. The Cancer Cell Line Encyclopedia Enables Predictive Modelling of Anticancer Drug Sensitivity. Nature 2012, 483, 603–607.
  53. Yang, W.; Soares, J.; Greninger, P.; Edelman, E.J.; Lightfoot, H.; Forbes, S.; Bindal, N.; Beare, D.; Smith, J.A.; Thompson, I.R. Genomics of Drug Sensitivity in Cancer (GDSC): A Resource for Therapeutic Biomarker Discovery in Cancer Cells. Nucleic Acids Res. 2012, 41, D955–D961.
  54. Xu, X.; Gu, H.; Wang, Y.; Wang, J.; Qin, P. Autoencoder Based Feature Selection Method for Classification of Anticancer Drug Response. Front. Genet. 2019, 10, 233.
  55. Kim, S.; Thiessen, P.A.; Bolton, E.E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L.; He, J.; He, S.; Shoemaker, B.A. PubChem Substance and Compound Databases. Nucleic Acids Res. 2016, 44, D1202–D1213.
  56. O’Boyle, N.M. Towards a Universal SMILES Representation—A Standard Method to Generate Canonical SMILES Based on the InChI. J. Cheminform. 2012, 4, 22.
  57. Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
  58. Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular Graph Convolutions: Moving beyond Fingerprints. J. Comput. Aided Mol. Des. 2016, 30, 595–608.
  59. Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N. Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 302–310.
  60. Landrum, G. Rdkit: Open-Source Cheminformatics Software. 2016. Volume 149. p. 650. Available online: http://www.rdkit.org/ (accessed on 24 June 2024).
  61. Ramsundar, B.; Eastman, P.; Walters, P.; Pande, V. Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2019; ISBN 1492039780.
  62. Nguyen, T.; Nguyen, G.T.T.; Nguyen, T.; Le, D.-H. Graph Convolutional Networks for Drug Response Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 19, 146–154.
  63. Wu, Z.; Ramsundar, B.; Feinberg, E.N.; Gomes, J.; Geniesse, C.; Pappu, A.S.; Leswing, K.; Pande, V. MoleculeNet: A Benchmark for Molecular Machine Learning. Chem. Sci. 2018, 9, 513–530.
  64. Fernández, I.; Frenking, G.; Merino, G. Aromaticity of Metallabenzenes and Related Compounds. Chem. Soc. Rev. 2015, 44, 6452–6463.
  65. Tipping, M.E. Sparse Bayesian Learning and the Relevance Vector Machine. J. Mach. Learn. Res. 2001, 1, 211–244.
  66. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4.
  67. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67.
  68. Neal, R.M. Bayesian Learning for Neural Networks; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 118, ISBN 1461207452.
  69. MacKay, D.J.C. Bayesian Interpolation. Neural Comput. 1992, 4, 415–447.
  70. Ozdemir, S.; Susarla, D. Feature Engineering Made Easy: Identify Unique Features from Your Dataset in Order to Build Powerful Machine Learning Systems; Packt Publishing Ltd.: Birmingham, UK, 2018; ISBN 1787286479.
  71. Tancredi, A.; Anderson, C.; O’Hagan, A. Accounting for Threshold Uncertainty in Extreme Value Estimation. Extremes 2006, 9, 87–106.
  72. Goodspeed, A.; Heiser, L.M.; Gray, J.W.; Costello, J.C. Tumor-Derived Cell Lines as Molecular Models of Cancer Pharmacogenomics. Mol. Cancer Res. 2016, 14, 3–13.
  73. Gambardella, V.; Tarazona, N.; Cejalvo, J.M.; Lombardi, P.; Huerta, M.; Roselló, S.; Fleitas, T.; Roda, D.; Cervantes, A. Personalized Medicine: Recent Progress in Cancer Therapy. Cancers 2020, 12, 1009.
  74. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
  75. Joseph, V.R. Optimal Ratio for Data Splitting. Stat. Anal. Data Min. ASA Data Sci. J. 2022, 15, 531–538.
  76. Dunford, R.; Su, Q.; Tamang, E. The Pareto Principle. 2014. Available online: https://core.ac.uk/download/pdf/200202097.pdf (accessed on 24 June 2024).
  77. Nti, I.K.; Nyarko-Boateng, O.; Aning, J. Performance of Machine Learning Algorithms with Different K Values in K-Fold Cross-Validation. Int. J. Inf. Technol. Comput. Sci. 2021, 13, 61–71.
  78. Wong, T.-T.; Yeh, P.-Y. Reliable Accuracy Estimates from K-Fold Cross Validation. IEEE Trans. Knowl. Data Eng. 2019, 32, 1586–1594.
  79. Liu, Q.; Hu, Z.; Jiang, R.; Zhou, M. DeepCDR: A Hybrid Graph Convolutional Network for Predicting Cancer Drug Response. Bioinformatics 2020, 36, i911–i918.
  80. Li, M.; Wang, Y.; Zheng, R.; Shi, X.; Li, Y.; Wu, F.-X.; Wang, J. DeepDSC: A Deep Learning Method to Predict Drug Sensitivity of Cancer Cell Lines. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 575–582.
  81. Shahzad, M.; Tahir, M.A.; Khan, M.A.; Jiang, R.; Malick, R.A.S. EBSRMF: Ensemble Based Similarity-Regularized Matrix Factorization to Predict Anticancer Drug Responses. J. Intell. Fuzzy Syst. 2022, 43, 3443–3452.
  82. Golbraikh, A.; Tropsha, A. Beware of Q2! J. Mol. Graph. Model. 2002, 20, 269–276.
  83. Zhao, L.; Qiu, T.; Jiang, D.; Xu, H.; Zou, L.; Yang, Q.; Chen, C.; Jiao, B. SGCE Promotes Breast Cancer Stem Cells by Stabilizing EGFR. Adv. Sci. 2020, 7, 1903700.
  84. Zhang, S.; Wang, H.; Liu, A. Identification of ATP1B1, a Key Copy Number Driver Gene in Diffuse Large B-Cell Lymphoma and Potential Target for Drugs. Ann. Transl. Med. 2022, 10, 1136.
  85. Katuwal, N.B.; Kang, M.S.; Ghosh, M.; Hong, S.D.; Jeong, Y.G.; Park, S.M.; Kim, S.-G.; Sohn, J.; Kim, T.H.; Moon, Y.W. Targeting PEG10 as a Novel Therapeutic Approach to Overcome CDK4/6 Inhibitor Resistance in Breast Cancer. J. Exp. Clin. Cancer Res. 2023, 42, 325.
  86. Xu, Z.; Xiang, L.; Peng, L.; Gu, H.; Wang, Y. Comprehensive Analysis of the Immune Implication of AKAP12 in Stomach Adenocarcinoma. Comput. Math. Methods Med. 2022, 2022, 3445230.
  87. Lodi, M.; Voilquin, L.; Alpy, F.; Molière, S.; Reix, N.; Mathelin, C.; Chenard, M.-P.; Tomasetto, C.-L. STARD3: A New Biomarker in HER2-Positive Breast Cancer. Cancers 2023, 15, 362.
  88. Shen, R.; Mo, Q.; Schultz, N.; Seshan, V.E.; Olshen, A.B.; Huse, J.; Ladanyi, M.; Sander, C. Integrative Subtype Discovery in Glioblastoma Using ICluster. PLoS ONE 2012, 7, e35236.
  89. Bishop, C.M.; Tipping, M.E. Bayesian Regression and Classification. Nato Sci. Ser. Sub Ser. III Comput. Syst. Sci. 2003, 190, 267–288.
  90. Ying, X. An Overview of Overfitting and Its Solutions. J. Phys. Conf. Ser. 2019, 1168, 022022.
  91. Zhang, Z.; Zhang, Y.; Li, Z. Removing the Feature Correlation Effect of Multiplicative Noise. Adv. Neural Inf. Process. Syst. 2018, 31. Available online: https://papers.nips.cc/paper_files/paper/2018/hash/e7b24b112a44fdd9ee93bdf998c6ca0e-Abstract.html (accessed on 24 June 2024).
  92. Guan, N.-N.; Zhao, Y.; Wang, C.-C.; Li, J.-Q.; Chen, X.; Piao, X. Anticancer Drug Response Prediction in Cell Lines Using Weighted Graph Regularized Matrix Factorization. Mol. Ther. Nucleic Acids 2019, 17, 164–174.
  93. Wang, L.; Li, X.; Zhang, L.; Gao, Q. Improved Anticancer Drug Response Prediction in Cell Lines Using Matrix Factorization with Similarity Regularization. BMC Cancer 2017, 17, 513.
  94. Kohavi, R.; John, G.H. Wrappers for Feature Subset Selection. Artif. Intell. 1997, 97, 273–324.
  95. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.


| Type | Raw Data | Processed |
|---|---|---|
| Drugs | 24 | 24 |
| Cell lines | 1061 | 363 |
| Gene expression | 20,049 | 19,389 |
| Copy number alteration | 24,960 | 24,960 |
| Single-nucleotide mutation | 1667 | 1667 |

| Type | Raw Data | Processed |
|---|---|---|
| Drugs | 98 | 98 |
| Cell lines | 1124 | 555 |
| Gene expression | 11,833 | 11,712 |
| Copy number alteration | 24,960 | 24,959 |
| Single-nucleotide mutation | 70 | 54 |

| Type | Genes |
|---|---|
| Gene expression | TFPI2, SGCE, PPIC, ATP1B1, DSP, PEG10, MAGEA4, C1S, CPVL, GATA6 |
| Copy number alteration | RASSF8AS1, MIR4302, CCNE1, RASSF8, LMNTD1, LOC102724958, STARD3, LINC00906, KRAS, LYRM5 |
| Single-nucleotide mutation | AKAP12, TP53, NLRP3, ATRX, OBSCN, CARD10, KRAS, ATR, FZD1, GPR112 |

| Type | Genes |
|---|---|
| Gene expression | TFPI2, SGCE, ATP1B1, DSP, PEG10, MAGEA4, C1S, CPVL, GATA6, RP11-490M8.1 |
| Copy number alteration | RASSF8-AS1, STARD3, PPP1R1B, SLC35E3, ZNF536, SOX5, TRIT1, TMEM75, ZNF879, ST8SIA1 |
| Single-nucleotide mutation | AKAP12, TP53, NLRP3, ATRX, OBSCN, CARD10, KRAS, ATR, FZD1, GPR112 |

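| Method | Input Features | Model Features | Training RMSE | Training PCC | Training R² | Validation RMSE | Validation PCC | Validation R² | Testing RMSE | Testing PCC | Testing R² | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 46,016 | 46,016 | 0.088 | 0.935 | 0.873 | 0.12 | 0.864 | 0.744 | 0.13 | 0.13 | 0.737 | 4:03:11 |
| Iterative Similarity Bagging (ISB) | | | | | | | | | | | | |
| ISB bag size = 50, iterations = 5 | 46,016 | 13,844 | 0.087 | 0.935 | 0.875 | 0.118 | 0.871 | 0.755 | 0.126 | 0.87 | 0.754 | 10:17 |
| ISB bag size = 50, iterations = 10 | 46,016 | 12,270 | 0.103 | 0.909 | 0.824 | 0.127 | 0.847 | 0.716 | 0.13 | 0.858 | 0.736 | 9:16 |
| ISB bag size = 100, iterations = 10 | 46,016 | 5926 | 0.08 | 0.946 | 0.894 | 0.117 | 0.87 | 0.755 | 0.129 | 0.862 | 0.74 | 5:50 |
| ISB bag size = 200, iterations = 10 | 46,016 | 2390 | 0.091 | 0.929 | 0.863 | 0.116 | 0.875 | 0.764 | 0.124 | 0.873 | 0.76 | 3:51 |
| ISB bag size = 300, iterations = 10 | 46,016 | 2261 | 0.091 | 0.929 | 0.863 | 0.119 | 0.866 | 0.747 | 0.126 | 0.868 | 0.75 | 3:48 |
| ISB bag size = 400, iterations = 10 | 46,016 | 2119 | 0.1 | 0.917 | 0.837 | 0.119 | 0.866 | 0.75 | 0.127 | 0.865 | 0.747 | 3:42 |
| Bayesian Ridge Regression with Iterative Similarity Bagging (BRR-ISB) | | | | | | | | | | | | |
| BRR | 23,683 | 23,683 | 0.087 | 0.937 | 0.877 | 0.117 | 0.872 | 0.758 | 0.125 | 0.87 | 0.754 | 15:43 |
| BRR-ISB bag size = 50, iterations = 5 | 23,683 | 5740 | 0.093 | 0.926 | 0.857 | 0.115 | 0.876 | 0.766 | 0.124 | 0.872 | 0.759 | 5:48 |
| BRR-ISB bag size = 50, iterations = 10 | 23,683 | 4822 | 0.094 | 0.924 | 0.854 | 0.121 | 0.86 | 0.739 | 0.127 | 0.865 | 0.748 | 5:20 |
| BRR-ISB bag size = 100, iterations = 10 | 23,683 | 2133 | 0.099 | 0.917 | 0.84 | 0.117 | 0.871 | 0.758 | 0.125 | 0.869 | 0.754 | 3:57 |
| BRR-ISB bag size = 200, iterations = 10 | 23,683 | 1273 | 0.097 | 0.919 | 0.845 | 0.114 | 0.879 | 0.771 | 0.121 | 0.879 | 0.77 | 3:50 |
| BRR-ISB bag size = 300, iterations = 10 | 23,683 | 1245 | 0.099 | 0.918 | 0.84 | 0.116 | 0.872 | 0.76 | 0.122 | 0.877 | 0.768 | 3:47 |
| BRR-ISB bag size = 400, iterations = 10 | 23,683 | 1260 | 0.097 | 0.921 | 0.846 | 0.116 | 0.874 | 0.763 | 0.127 | 0.867 | 0.749 | 3:50 |
| Similarity Network Fusion | | | | | | | | | | | | |
| SNF | 46,016 | 363 | 0.13 | 0.854 | 0.721 | 0.13 | 0.841 | 0.699 | 0.134 | 0.848 | 0.716 | 3:02 |
| BRR-SNF | 23,683 | 363 | 0.126 | 0.859 | 0.738 | 0.127 | 0.844 | 0.713 | 0.138 | 0.847 | 0.7 | 3:02 |
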
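| Method | Input Features | Model Features | Training RMSE | Training PCC | Training R² | Validation RMSE | Validation PCC | Validation R² | Testing RMSE | Testing PCC | Testing R² | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 36,725 | 36,725 | 0.023 | 0.934 | 0.872 | 0.032 | 0.879 | 0.771 | 0.03 | 0.89 | 0.791 | 19:38:19 |
| Iterative Similarity Bagging (ISB) | | | | | | | | | | | | |
| ISB bag size = 50, iterations = 5 | 36,725 | 11,987 | 0.022 | 0.943 | 0.889 | 0.031 | 0.881 | 0.776 | 0.03 | 0.895 | 0.8 | 59:48 |
| ISB bag size = 50, iterations = 10 | 36,725 | 10,367 | 0.023 | 0.935 | 0.874 | 0.031 | 0.88 | 0.774 | 0.03 | 0.895 | 0.799 | 51:44 |
| ISB bag size = 100, iterations = 10 | 36,725 | 4211 | 0.024 | 0.933 | 0.867 | 0.031 | 0.887 | 0.783 | 0.029 | 0.9 | 0.808 | 32:52 |
| ISB bag size = 200, iterations = 10 | 36,725 | 1174 | 0.026 | 0.919 | 0.843 | 0.031 | 0.881 | 0.775 | 0.03 | 0.894 | 0.798 | 26:07 |
| ISB bag size = 300, iterations = 10 | 36,725 | 974 | 0.025 | 0.92 | 0.846 | 0.031 | 0.883 | 0.78 | 0.029 | 0.896 | 0.803 | 24:31 |
| ISB bag size = 400, iterations = 10 | 36,725 | 956 | 0.025 | 0.92 | 0.846 | 0.031 | 0.882 | 0.777 | 0.029 | 0.896 | 0.801 | 24:43 |
| Bayesian Ridge Regression with Iterative Similarity Bagging (BRR-ISB) | | | | | | | | | | | | |
| BRR | 36,725 | 18,392 | 0.024 | 0.93 | 0.866 | 0.032 | 0.878 | 0.771 | 0.03 | 0.894 | 0.798 | 1:23:11 |
| BRR-ISB bag size = 50, iterations = 5 | 18,392 | 5369 | 0.023 | 0.938 | 0.879 | 0.031 | 0.885 | 0.783 | 0.029 | 0.899 | 0.807 | 36:19 |
| BRR-ISB bag size = 50, iterations = 10 | 18,392 | 4509 | 0.024 | 0.928 | 0.859 | 0.031 | 0.882 | 0.775 | 0.03 | 0.894 | 0.797 | 32:07 |
| BRR-ISB bag size = 100, iterations = 10 | 18,392 | 1681 | 0.026 | 0.915 | 0.835 | 0.031 | 0.881 | 0.774 | 0.03 | 0.892 | 0.794 | 23:28 |
| BRR-ISB bag size = 200, iterations = 10 | 18,392 | 606 | 0.026 | 0.916 | 0.838 | 0.031 | 0.883 | 0.777 | 0.029 | 0.896 | 0.801 | 19:58 |
| BRR-ISB bag size = 300, iterations = 10 | 18,392 | 549 | 0.028 | 0.904 | 0.817 | 0.032 | 0.878 | 0.771 | 0.03 | 0.892 | 0.796 | 21:08 |
| BRR-ISB bag size = 400, iterations = 10 | 18,392 | 566 | 0.028 | 0.903 | 0.815 | 0.032 | 0.877 | 0.769 | 0.03 | 0.892 | 0.796 | 21:33 |
| Similarity Network Fusion | | | | | | | | | | | | |
| SNF | 36,725 | 555 | 0.029 | 0.896 | 0.802 | 0.033 | 0.87 | 0.756 | 0.031 | 0.884 | 0.782 | 20:23 |
| BRR-SNF | 18,392 | 555 | 0.029 | 0.894 | 0.799 | 0.033 | 0.869 | 0.756 | 0.031 | 0.884 | 0.781 | 20:19 |
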
| Model | RMSE | PCC | R² |
|---|---|---|---|
| WGRMF | 0.56 | 0.72 | - |
| EBSRMF | 0.21 | 0.86 | |
| DeepDSC | 0.23 | - | 78 |
| SRMF | 0.57 | 0.71 | |
| BRR-ISB (Proposed) | 0.12 | 0.879 | 77 |


Almutiri, T.M.; Alomar, K.H.; Alganmi, N.A. Integrating Multi-Omics Using Bayesian Ridge Regression with Iterative Similarity Bagging. Appl. Sci. 2024 , 14 , 5660. https://doi.org/10.3390/app14135660


  • Open access
  • Published: 28 June 2024

Genes divided according to the relative position of the longest intron show increased representation in different KEGG pathways

  • Pavel Dvorak   ORCID: orcid.org/0000-0002-2124-7219 1 , 2 , 3 ,
  • Viktor Hlavac   ORCID: orcid.org/0000-0003-0695-0552 2 , 4 ,
  • Vojtech Hanicinec   ORCID: orcid.org/0000-0002-0515-4848 2 ,
  • Bhavana Hemantha Rao   ORCID: orcid.org/0000-0001-9559-5606 2 &
  • Pavel Soucek   ORCID: orcid.org/0000-0002-4294-6799 2 , 4  

BMC Genomics volume 25, Article number: 649 (2024)

Despite the fact that introns mean an energy and time burden for eukaryotic cells, they play an irreplaceable role in the diversification and regulation of protein production. As a common feature of eukaryotic genomes, it has been reported that in protein-coding genes, the longest intron is usually one of the first introns. The goal of our work was to find a possible difference in the biological function of genes that fulfill this common feature compared to genes that do not. Data on the lengths of all introns in genes were extracted from the genomes of six vertebrates (human, mouse, koala, chicken, zebrafish and fugu) and two other model organisms (nematode worm and arabidopsis). We showed that more than 40% of protein-coding genes have the relative position of the longest intron located in the second or third tertile of all introns. Genes divided according to the relative position of the longest intron were found to be significantly increased in different KEGG pathways. Genes with the longest intron in the first tertile predominate in a range of pathways for amino acid and lipid metabolism, various signaling, cell junctions or ABC transporters. Genes with the longest intron in the second or third tertile show increased representation in pathways associated with the formation and function of the spliceosome and ribosomes. In the two groups of genes defined in this way, we further demonstrated the difference in the length of the longest introns and the distribution of their absolute positions. We also pointed out other characteristics, namely the positive correlation between the length of the longest intron and the sum of the lengths of all other introns in the gene and the preservation of the exact same absolute and relative position of the longest intron between orthologous genes.


Introduction

Four distinct types are recognized among introns, generally defined as sequences within genes that are subsequently excised from the corresponding RNA transcripts [1, 2]. These are the so-called group I and group II self-splicing introns, transfer RNA introns and spliceosomal introns. Spliceosomal introns, which are the focus of this work, are excised from precursor mRNAs by the spliceosome, a specialized ribonucleoprotein complex [3]. Introns of this type have been found in the nuclear genomes of all representatives of the Eukarya domain investigated so far, but have not been observed in any representative of the other two domains, Bacteria and Archaea [4]. One part of the research devoted to introns therefore focuses mainly on clarifying their origin and evolution, while another part is mostly concerned with unraveling the biological functions of introns in the genes of extant species.

As part of a long-standing scientific discussion, three main scenarios for the origin and evolution of introns have been formulated [4, 5, 6, 7, 8]. The introns-early theory assumes that introns were present in the common ancestor of all three domains of life, the last universal common ancestor (LUCA), and were subsequently eliminated from the genomes of all bacteria and archaea. Similarly, some groups of eukaryotic organisms predominantly lost introns, resulting in large differences in the number of introns present in the genomes of organisms living today.

In contrast, the introns-late theory places the origin of spliceosomal introns much later, among the common ancestors of all modern eukaryotic organisms. The number of introns then increased to different extents in diverse lineages, mainly through reverse splicing and the insertion of transposable elements.

The third scenario, the introns-first theory, infers that introns were present at the very beginning of the formation of genes, even before the emergence of DNA, in an environment in which all processes of transmission and implementation of hereditary information were mediated and controlled by RNA molecules, the so-called RNA world [9, 10].

The number and size of introns vary greatly between the genomes of present-day organisms. It has been calculated that genomes with a lower intron density (up to approximately 3 introns per kilobase pair, kbp) contain shorter introns (with an average length of about 75 bp), without a significant correlation between density and length. In organisms with a higher intron density, by contrast, the positive correlation between density and length is significant [8]. The so-called intron-poor organisms include a number of unicellular organisms, for example the yeast Saccharomyces cerevisiae (with a density of about 0.05 introns per gene and an average length of 256 bp) [11]. The genomes of all vertebrates belong to the intron-rich group, of which mammalian genomes show the highest density and length of introns (approximately 8 introns per gene with an average length of around 6 kbp) [6, 12]. In intron-rich organisms, the number and size of introns vary greatly even among individual genes.

As a general rule for most eukaryotic species studied, the first intron is the longest intron in a given gene. This rule is even more pronounced when the first intron is located in the 5' UTR of the gene [13]. One reason for the exceptional status of first introns may be their increased involvement in regulating gene expression, which has already been suggested in analyses of various eukaryotic species [14, 15]. This phenomenon is often referred to as intron-mediated enhancement (IME), and it has even been suggested that introns with such a function could be used to activate gene expression in gene therapy [16].

Although data from whole-genome sequencing of a number of organisms are already available, their interpretation is still ongoing and incomplete. In addition, some genomic characteristics have not yet been studied in detail in the literature. At the beginning of our work, we asked ourselves this question: What is the actual percentage of genes that fulfill the above-mentioned characteristic, i.e. that the longest intron in a gene is located among the first introns? In our initial work on a selection of protein-coding genes of the human genome, we showed that approximately 64% of the genes have the longest intron in the 1st tertile of all introns in the gene, while 19% have it in the 2nd and 17% in the 3rd. Notable peaks were seen at the position in the middle of the gene and at the last intron (5 and 6%, respectively) [17]. It was therefore clear that a non-negligible number of genes do not have the longest introns in the first positions. Consequently, we asked the second question: Do genes that have the longest intron in the 2nd or 3rd tertile of all introns show any specific functional characteristics compared to genes that have the longest intron in the 1st tertile? Such a relationship was implied in our aforementioned work on a subset of human genes, which showed, as an example, that DNA repair genes contain a significantly higher representation of genes with the longest intron in the 2nd or 3rd tertile.

In this follow-up work, we set ourselves the task of finding out whether the relationships between the position of the longest intron and the biological function of genes have a more general validity among other eukaryotic organisms. Our analyses were therefore performed on genome-wide data from six representatives of vertebrates, as the group with the largest volume of biological knowledge, and two representatives of more distant model organisms as outgroups. In this article, we present new and more broadly valid information about the longest introns in genes.

Materials and methods

Primary data and species

Available data on all protein-coding genes of a given organism were downloaded from the Ensembl database (https://www.ensembl.org/index.html; Release 109) [18], from which the lengths of all introns were calculated according to the algorithm described below. The analyzed vertebrates included: human (Homo sapiens), mouse (Mus musculus), koala (Phascolarctos cinereus), chicken (Gallus gallus), zebrafish (Danio rerio) and fugu (Takifugu rubripes). Intron data from a nematode worm (Caenorhabditis elegans) and arabidopsis (Arabidopsis thaliana) were taken as outgroups. The MANE Select and Ensembl Canonical flags, which we set as selection criteria, have not yet been established for all genomes available in the given database, and this was therefore the main limitation on the selection of vertebrate organisms we could test (http://www.ensembl.org/info/genome/genebuild/transcript_quality_tags.html). For the nematode worm, the main gene isoforms have not yet been identified, so we analyzed the set of all gene transcripts included in the APPRIS system.

Algorithm for calculating the lengths of introns

In the first step, we obtained the positions of the beginnings (hereinafter referred to as Exon Start) and ends (Exon End) of all exons (coding as well as non-coding exons in untranslated regions) in the desired transcripts of all protein-coding genes of the tested organism. For this, we queried the Ensembl database via the BioMart tool. In order to evaluate the most representative transcripts, we used the MANE Select flag as the filter criterion for the human genome [19]. Because MANE Select is not available for other genomes, we filtered the transcripts of the other organisms using the combined criteria Gene type: protein coding and Ensembl Canonical: only. The data (Attributes) we queried for each transcript were: Gene stable ID, Gene stable ID version, Transcript Stable ID, Gene name, Strand, Exon rank in transcript, Exon region start (bp) and Exon region end (bp).

In the second step, we used an in-house shell script to calculate the lengths of individual introns from the obtained data; the code is available in the Supplementary Information as Introns2.5 (SH Source File) or Supplementary Information S1 (Microsoft Excel File). Genes were sorted not by their names (symbols) but by Gene or Transcript Stable IDs. We calculated intron lengths in bp for genes on the Forward strand according to the formula {[Exon(n + 1)Start - Exon(n)End] - 1}, where n is a positive integer starting from 1. For genes on the Reverse strand, this formula was modified to {[Exon(n)Start - Exon(n + 1)End] - 1}. The script then created a table with the lengths of all introns for each protein-coding gene and searched for the position of the longest intron in the gene. In particular, we used the AWK language to perform the following steps: 1) Calculate the lengths of the introns; 2) Extract Gene names if they were available; 3) Indicate the longest intron; 4) Calculate the relative position of the longest intron (the ratio of the position of the longest intron to the total number of introns in the given gene). Then we used Bourne Again Shell (BASH) to create a matrix of Gene or Transcript Stable IDs versus the lengths of the introns in bp. The script can handle various delimiters and requires an input file exported from an Ensembl BioMart query with either Gene Stable ID or Transcript stable ID as the main unique identifier. The first seven columns in the file must be: (1) Gene stable ID, (2) Transcript stable ID, (3) Gene name, (4) Strand, (5) Exon rank in transcript, (6) Exon region start and (7) Exon region end. The primary data on intron lengths obtained by this algorithm are stored for the individual tested organisms in Supplementary Information Tables S2-9.
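The two strand-specific formulas can be illustrated with a minimal Python sketch. This is not the authors' AWK/BASH script; the function name `intron_lengths`, the tuple layout `(rank, start, end)` and the strand coding `1` / `-1` are assumptions made for illustration only.

```python
def intron_lengths(exons, strand):
    """Return intron lengths (bp) in transcript order.

    exons  -- list of (exon_rank, exon_start, exon_end) tuples (assumed layout)
    strand -- 1 for the Forward strand, -1 for the Reverse strand (assumed coding)
    """
    ordered = sorted(exons)  # order exons by their rank in the transcript
    lengths = []
    for (_, start_n, end_n), (_, start_next, end_next) in zip(ordered, ordered[1:]):
        if strand == 1:
            # Forward strand: [Exon(n + 1)Start - Exon(n)End] - 1
            lengths.append(start_next - end_n - 1)
        else:
            # Reverse strand: [Exon(n)Start - Exon(n + 1)End] - 1
            lengths.append(start_n - end_next - 1)
    return lengths


# Toy coordinates (hypothetical, not taken from any genome)
forward = [(1, 100, 200), (2, 301, 400), (3, 1000, 1100)]
reverse = [(1, 1000, 1100), (2, 301, 400), (3, 100, 200)]
print(intron_lengths(forward, 1))
print(intron_lengths(reverse, -1))
```

On the Reverse strand the exon with rank 1 has the highest genomic coordinates, which is why the roles of Start and End are swapped in the second formula.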

If a gene contained several introns sharing the same maximum length, the one with the highest absolute position number, i.e., the position furthest from the 5' end of the gene, was chosen as the longest intron. In the genomes of the tested vertebrates, this situation occurred in less than 0.1% of genes, and in nematode worm and arabidopsis in less than 1%. The list of genes for which this situation occurred is presented in Table S10.
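The tie-breaking rule and the relative position can be sketched as follows. This is only an illustration under the stated rule, not the published script; the function name `longest_intron_position` is hypothetical.

```python
def longest_intron_position(lengths):
    """Return (absolute_position, relative_position) of the longest intron.

    lengths -- intron lengths in transcript order (position 1 is nearest the 5' end).
    Ties are resolved in favour of the highest absolute position, as described above.
    """
    if not lengths:
        raise ValueError("gene has no introns")
    longest = max(lengths)
    # Among introns sharing the maximum length, keep the one furthest from the 5' end
    position = max(i for i, value in enumerate(lengths, start=1) if value == longest)
    return position, position / len(lengths)


# Toy example: introns 1 and 3 tie, so position 3 of 4 is reported
print(longest_intron_position([500, 120, 500, 80]))
```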

Gene set enrichment analysis (GSEA)

Based on our previous work with the human genome and the knowledge that the distribution of the positions of the longest introns showed three peaks (at the beginning, middle and end of the gene), we divided all protein-coding genes of the studied organism into those containing fewer than three introns and those containing three or more introns [17]. We further divided genes with three or more introns into three subgroups according to the relative position of the longest intron (defined above). Thus, the position of the longest intron in the 1st tertile of all introns means that the relative ratio is in the interval (0;0.33]. Similarly, for the position of the longest intron in the 2nd tertile, it is in the interval (0.33;0.66], and for the 3rd tertile in (0.66;1]. In GSEA analyses, the three subgroups of protein-coding genes were compared within individual organisms; these input data for GSEA analyses can be obtained from Table S11.
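A minimal sketch of the tertile assignment is given below. The text does not state how ratios such as 1/3 were handled numerically; rounding the ratio to two decimal places before comparison is an assumption of this sketch (it places 1/3 in the 1st tertile and 2/3 in the 3rd), and the function name `tertile` is hypothetical.

```python
def tertile(relative_position):
    """Map a relative position in (0, 1] to tertile 1, 2 or 3.

    Intervals follow the text: (0;0.33], (0.33;0.66], (0.66;1].
    ASSUMPTION: the ratio is rounded to two decimals first; the source
    does not specify the rounding rule.
    """
    if not 0 < relative_position <= 1:
        raise ValueError("relative position must lie in (0, 1]")
    ratio = round(relative_position, 2)
    if ratio <= 0.33:
        return 1
    if ratio <= 0.66:
        return 2
    return 3


# Longest intron at position 1 of 3, 2 of 4 and 5 of 5
print(tertile(1 / 3), tertile(2 / 4), tertile(5 / 5))
```

Under a different rounding or truncation rule, borderline ratios such as 2/3 could fall into a different tertile, so the rule should be checked against the supplementary script before reuse.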

We performed GSEA in parallel using two web platforms, g:Profiler (https://biit.cs.ut.ee/gprofiler/gost) [20] and ShinyGO (http://bioinformatics.sdstate.edu/go/) [21], taking into account the procedure recommended in the work of Reimand et al. [22]. With g:Profiler, we used the option to analyze multiple gene files simultaneously (Run as multiquery option); the other result parameters were left at their default settings. The default settings were also kept in ShinyGO. We used the AmiGO 2 project (http://amigo.geneontology.org/amigo/landing) [23] for better navigation in the hierarchical structure of GO terms (Gene Ontology; http://geneontology.org/) [24] in the resulting lists of terms. To find common terms between these lists, we used Multiple List Comparator (https://molbiotools.com/listcompare.php) [25], a tool for creating Venn diagrams.

Random gene lists

For each set of the GSEA input data, we created an equally large set of control data. To create these control data, all the protein-coding genes of the given organism were randomly divided into subgroups with the same number of genes as the subgroups of the GSEA input data, in which the genes were divided according to the relative position of the longest intron. The "shuf" command in the Linux operating system was used to randomize individual genes into these control sets, which are available in Table S12. A basic statistical comparison of the primary and control data sets confirmed that the genes in the control data were randomly selected.
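The construction of size-matched control sets can be sketched in Python as follows. The authors used the Linux `shuf` command; this sketch swaps in Python's standard `random` module, and the function name `random_control_sets` and the `seed` parameter are assumptions for illustration.

```python
import random


def random_control_sets(all_genes, group_sizes, seed=None):
    """Randomly split genes into disjoint groups of the requested sizes.

    all_genes   -- identifiers of all protein-coding genes of one organism
    group_sizes -- sizes of the real subgroups (e.g. 1st, 2nd, 3rd tertile)
    seed        -- optional seed for reproducibility (an addition of this sketch)
    """
    if sum(group_sizes) > len(all_genes):
        raise ValueError("group sizes exceed the number of available genes")
    rng = random.Random(seed)
    shuffled = list(all_genes)
    rng.shuffle(shuffled)  # plays the role of `shuf`
    groups, start = [], 0
    for size in group_sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups


# Toy identifiers (hypothetical)
genes = [f"GENE{i:03d}" for i in range(10)]
controls = random_control_sets(genes, [5, 3, 2], seed=1)
print([len(group) for group in controls])
```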

Analysis of orthologous genes

Our analysis of orthologous genes aimed primarily to demonstrate whether the same absolute or even relative position of the longest intron can be preserved between these genes. Furthermore, we were interested in an approximate estimate of the frequency of this phenomenon and in whether the localization of the longest intron within the gene, i.e. its relative position, can influence any change in its position. Using the data on protein-coding genes obtained for the above-mentioned analyses, we monitored the change in the absolute and relative position of the longest intron in three subgroups of orthologous genes. From the human genes with the longest intron in the 1st, 2nd and 3rd tertiles, we randomly selected 20 genes from each group and searched for the corresponding orthologous genes in the tested vertebrates. Information available in the Ensembl database was used to assess the orthologous relationship. When multiple orthologous genes were present in one species, only one of them was selected, according to the highest values for Gene Order Conservation (GOC) score and Whole Genome Alignment (WGA) coverage (https://www.ensembl.org/Help/View?id=135). Due to the random selection of genes, human genes that have no described orthologous genes in other vertebrate species were also selected for the analysis.

Statistics and visualization

The freely available PAST software package (PAleontological STatistics, Natural History Museum, University of Oslo; version 4.11; https://www.nhm.uio.no/english/research/resources/past/) was used to evaluate the data using basic statistical methods for comparison of several univariate groups (e.g. the Kruskal–Wallis and Mann–Whitney pairwise tests). The usual threshold for statistical significance (p < 0.05) was accepted. Bonferroni adjustment or False Discovery Rate corrections were used during multiple testing, depending on the availability of these calculations in the programs used. Hierarchical clustering with the average linkage method and subsequent heatmap creation was performed in the Morpheus web server (Broad Institute, Cambridge, USA; https://software.broadinstitute.org/morpheus). Tree-map visualizations of lists of GO terms were created with Revigo software (http://revigo.irb.hr/) [26]. KEGG diagrams (Kyoto Encyclopedia of Genes and Genomes; https://www.genome.jp/kegg/) [27] with genes highlighted were created with the help of Pathview (https://pathview.uncc.edu/) [28].
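For readers unfamiliar with the Bonferroni adjustment mentioned above, a minimal Python sketch is shown here. The analyses themselves were run in PAST and the web tools listed; the function name `bonferroni` and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply each by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three pairwise comparisons
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
# A comparison remains significant if its adjusted p-value is below 0.05
print([p < 0.05 for p in adjusted])
```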

More than 40% of protein-coding genes have the longest intron located in the 2nd or 3rd tertiles

Among the six analyzed vertebrate genomes, there is up to a two-fold difference in the number of annotated protein-coding genes, with the lowest number in chicken (16,711) and the highest in zebrafish (30,153). However, the intron density is comparable, i.e. 6 or 7 introns per gene (median). The representation of intronless genes ranges from 3% in chicken to 7% in mouse and koala. The percentage of genes with three or more introns ranges between 73 and 80% in these genomes. Among the genes with three or more introns, 44% (zebrafish) to 58% (human) had the longest intron in the 1st tertile. Compared to the tested vertebrates, the intron density of worm is 5 and the percentage of genes with the longest intron in the 1st tertile is only 39%. In arabidopsis, the intron density is even lower (3), and genes with three or more introns make up only 51% of all protein-coding genes, of which 49% have the longest intron in the 1st tertile. Table 1 provides an overview of the analyzed genomes.

The longest introns located in the 1st tertile of all introns in the gene are significantly longer than the longest introns in the 2nd or 3rd tertile

Dividing the protein-coding genes into a group with the longest intron located in the 1st tertile and a group with the longest intron in the 2nd or 3rd tertile yields two groups of approximately equal size in all tested species. For the two groups of genes defined in this way, we compared the lengths of the longest introns. In all analyzed genomes of vertebrates and the two other model organisms, the group with the longest intron in the 1st tertile showed significantly longer introns. The results of this analysis are shown for selected species in Fig. 1, and numerical values for all species are summarized in Table S13.

Fig. 1. Comparison of longest intron lengths for genes with the longest intron position in the 1st versus 2nd or 3rd tertile for mouse (A), fugu (B) and arabidopsis (C). The median is shown with a horizontal line inside the boxes. In all species, there is a significant statistical difference between the two tested groups (p < 0.0001, Mann–Whitney test)

The tested groups of genes according to the relative position of the longest intron show a different distribution of absolute positions

Groups of genes divided by the relative position of the longest intron in the 1st versus 2nd or 3rd tertile differ in the representation profile of the absolute positions of these introns. For all tested species, in the 1st tertile group, the percentage of the intron with the absolute position No. 1 ranged between 61 and 77%, intron No. 2 between 15 and 24%, and intron No. 3 between 4 and 9%. In the 2nd or 3rd tertile group, no introns with the absolute position No. 1 were present and the percentage representation of the longest introns with the absolute position No. 2, 3, 4, 5, 6 and 7 was in the range of 13–21%, 17–24%, 13–17%, 10–12%, 7–9% and 6–7%, respectively. Histograms of the described distributions are shown for selected species in Fig. 2; the calculated values for all species are given in Table S14.

Fig. 2. Distribution of the longest intron absolute positions in the two compared groups of genes defined by the relative positions of these introns (1st versus 2nd or 3rd tertile)

All tested species showed a positive correlation between the lengths of the longest introns and lengths of all other introns in genes

We performed a correlation analysis between two variables: the length of the longest intron and the sum of the lengths of all other introns in the gene. Firstly, all protein-coding genes in the genomes were evaluated together. Correlation coefficients (Spearman's rs) ranged from 0.72 to 0.8 (P values less than 0.0001) for all vertebrates and nematode worm. These values can be interpreted as a strong association. In arabidopsis, the same coefficient had a value of 0.33 (P < 0.0001), which expresses only a weak relationship. Secondly, the two subgroups, 1st versus non-1st tertile, were evaluated separately. Correlation patterns and coefficient values similar to those for the sets with all genes were detected for both subgroups. Specific differences can be observed in the individual scatter plots, which are related to different lengths of introns in evolutionarily more distant groups (Fig. 3). However, a positive correlation between the two variables can be considered a general trend, although it was stronger in animals than in the one plant species tested.
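The two variables and the rank correlation can be sketched in plain Python as below. The published coefficients were computed in PAST; this is only an illustrative re-implementation of Spearman's rs (Pearson correlation of average ranks), and the function names and toy intron lengths are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


def longest_and_rest(lengths):
    """Return (length of longest intron, summed length of all other introns)."""
    longest = max(lengths)
    return longest, sum(lengths) - longest


# Toy genes (hypothetical intron lengths in bp), one list per gene
genes = [[900, 100, 50], [5000, 700, 300], [200, 80], [12000, 4000, 1000]]
pairs = [longest_and_rest(g) for g in genes]
print(spearman_rs([p[0] for p in pairs], [p[1] for p in pairs]))
```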

Fig. 3. Scatter plots showing the correlation between the length of the longest intron and the sum of the lengths of all other introns in the gene. Protein-coding genes in human (A), mouse (B), koala (C), chicken (D), zebrafish (E), fugu (F), nematode worm (G) and arabidopsis (H) were tested, with overall Spearman rs coefficients 0.8, 0.77, 0.79, 0.79, 0.77, 0.72, 0.76 and 0.33, respectively; the corresponding P values were less than 0.0001 in all cases

GSEA analyses were performed separately for genes with one and two introns and for genes with three or more introns. We then sought intersections among the significant results for the individual tested organisms, with the intention of defining the most general trend. The koala was excluded from these analyses because this species still lacks sufficient data in the Gene Ontology and KEGG pathways databases.

For genes with one or two introns, only one KEGG pathway with significantly increased representation, Neuroactive ligand-receptor interaction, was common to all 5 tested vertebrates. The Cytokine-cytokine receptor interaction pathway was significant for 3 vertebrates (human, mouse and zebrafish).

For genes with three or more introns, the same two subgroups as in the previous analyses were tested: genes with the longest intron in the 1st versus 2nd or 3rd tertile. Intersections between the results in individual species showed that the 1st tertile group is most generally characterized by the following pathways: ABC transporters, Arginine and proline metabolism, Calcium signaling pathway, Endocytosis, Glycerolipid metabolism, Glycerophospholipid metabolism, Inositol phosphate metabolism, Purine metabolism, and Sphingolipid metabolism. All these pathways were found in the intersection of at least six tested organisms, at least one of which was a representative of the outgroups. Significantly, the pathways Spliceosome, Ribosome, Proteasome, and Ribosome biogenesis in eukaryotes are characteristic of the genes with the longest intron in the 2nd or 3rd tertile. For Spliceosome, we found a match in all seven tested organisms; for the other pathways, there was a match in at least five organisms, at least one of which was an outgroup species. The results of GSEA analyses and their visualization are presented in Table S15 and Fig. 4.
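The intersection criterion used here (a pathway significant in at least a given number of organisms, including at least one outgroup) can be sketched as follows. The authors used Multiple List Comparator for this step; the function name `shared_pathways` and all organism and pathway labels in the toy data are hypothetical placeholders, not results.

```python
def shared_pathways(significant, outgroups, min_organisms):
    """Return pathways significant in >= min_organisms organisms,
    at least one of which belongs to the outgroups.

    significant -- dict mapping organism name to an iterable of pathway names
    outgroups   -- set of organism names treated as outgroups
    """
    organisms_per_pathway = {}
    for organism, pathways in significant.items():
        for pathway in set(pathways):
            organisms_per_pathway.setdefault(pathway, set()).add(organism)
    return sorted(
        pathway
        for pathway, organisms in organisms_per_pathway.items()
        if len(organisms) >= min_organisms and organisms & outgroups
    )


# Toy input with placeholder names (not data from the study)
toy = {
    "vertebrate_1": ["pathway_A", "pathway_B"],
    "vertebrate_2": ["pathway_A", "pathway_B"],
    "outgroup_1": ["pathway_A", "pathway_C"],
}
print(shared_pathways(toy, {"outgroup_1"}, min_organisms=3))
```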

Fig. 4. KEGG pathways significantly overrepresented in genes with the longest intron in the 1st versus 2nd or 3rd tertile of introns

Genes with the longest intron in the 2nd or 3rd tertile are found among all components of the spliceosome

For the Spliceosome pathway, we took a closer look at which specific genes have the longest intron in the 2nd or 3rd tertile and in which of the tested species this characteristic is preserved. We can conclude that all Spliceosome components contain proteins whose genes share this characteristic. Seven out of 70 monitored components retained this characteristic in all 7 tested organisms and 15 components in 6 organisms. Among the most conserved components of the spliceosome are: Sm, Lsm, U1-70K, U1C, p68, U2B, SF3a, SF3b, Prp8BP, Sad1, Prp38, PRL1, Syf, G10, AQR, Y14, THOC, hnRNPs, and SR. Figure 5 provides a visualization of this analysis for zebrafish.

Fig. 5. The Spliceosome pathway in the genome of the model organism zebrafish. Genes that have the longest intron located in the 2nd or 3rd tertile of all introns are highlighted in red

Control data did not show the same repeatability of significant results across organisms as did the genomic data

Following the same procedure as for the primary data, GSEA analyses were also performed with the control gene sets. Due to the nature and amount of tested data, some GO terms, but not pathways, were found to be statistically significant for individual organisms, even for the control sets. However, these were always terms other than those that were significant in the primary genomic data, and above all none of these GO terms was replicated in any other of the tested organisms. The recurrence of the same GO terms and pathways among different species in the primary data is a strong argument for the biological significance of the results presented here.

A phenomenon where the same absolute and relative position of the longest intron is preserved among orthologous genes is rare

From 60 randomly selected human genes and their respective orthologous genes in 5 other vertebrate representatives, we demonstrated the preservation of the same absolute position of the longest intron in 3 genes (ACAP2, LMCD1 and NPAS4). In all these cases, it was the first intron in the gene. Regarding the relative position of the longest intron, we observed the preservation of exactly the same value among all found orthologous genes in only 1 gene (NPAS4). We observed preservation of the relative position within the same tertile group for 8 genes from the 1st tertile group, 2 genes from the 2nd tertile group and no genes from the 3rd tertile group. All genes included in this analysis and the monitored values are recorded in Table S16; a visualization of the results is provided in Figure S1.

Although longer introns represent an energy and time burden for the genetic apparatus of the cell as it transfers information from storage to concrete implementation, they perform important and apparently irreplaceable tasks in this process. The results of whole-genome sequencing of many organisms, which have recently increased significantly and become freely available, have deepened our understanding of genome organization. Owing to the complexity of this issue, however, many questions remain unanswered. Our work shows a relationship between genes with a certain biological function and the location of the longest intron within the exon–intron structure of the gene. According to the data presented, the differences are evident even when the genes are roughly divided into genes with the longest intron located in the 1st tertile of all introns and genes with the longest intron in the 2nd or 3rd tertile. In the first of these two groups, genes from a wide spectrum of biological functions, primarily associated with the development of more complex multicellular organisms, predominate. In the second group, a higher representation of genes associated with the biogenesis and function of the spliceosome and ribosomes can be demonstrated.

The presence of longer introns closer to the 5' end of eukaryotic genes is a well-known phenomenon [13], which is explained by a greater occurrence of regulatory elements in these introns and thus stronger selective constraint on them. There is evidence that longer introns in general contain higher densities of conserved sites [29] and various regulatory elements [15, 30]. However, introns of thousands to tens of thousands of base pairs already represent a significant burden for transcription and splicing, and therefore other functional advantages of this system are sought. Prolonged transcription time in the case of long introns has been proposed as one of the mechanisms influencing the resulting gene expression [31].

Splicing of long introns must overcome the problem of considerable physical distance between the sequences involved. In long introns in invertebrate organisms, combined donor–acceptor splicing sites (called RP-sites) were detected to an increased extent, and a process of gradual removal of smaller sections, named recursive splicing, was proposed [32]. However, in long introns of vertebrates, these RP-sites were not observed to an increased extent, and therefore modified solutions were suggested. Shepard et al. [33] recognized an increased amount of SINE and LINE repetitive sequences in long introns of vertebrates and proposed the formation of multiple hairpins with large loops. These hairpins can form compact spatial structures facilitating splicing. Kelly et al. [34] further studied other possibilities of recursive splicing in vertebrates and found usage of RP-sites with alternative sequences.

Different groups of genes use different exon–intron architectures depending on the biological context. Housekeeping genes are significantly shorter than tissue-specific genes [35]. Similarly, genes whose products are used in a rapid biological response have lower intron density than genes whose products are applied after a certain time delay [36, 37]. Our results correspond with those reported by Schonfeld and colleagues [38]. Using a computer model, they showed that the introns of essential genes have characteristics specific enough to distinguish essential from non-essential genes. Their work focused primarily on the first introns, where they demonstrated that essential genes have significantly shorter introns than non-essential genes.

The RNA world theory offers an explanation for the evolutionary development of the exon–intron organization of current genes [39]. It posits that RNA carried out the transfer and implementation of genetic information in the first cells on our planet, before DNA existed and before proteins functioned as catalysts. Among other things, this theory assigns an irreplaceable role to introns, which already existed at this initial stage [40, 41]. The later incorporation of additional introns in the form of transposable elements with no initial function in gene expression has created a very heterogeneous intron system that is difficult to disentangle from today's vantage point [5, 42]. As a consequence, a certain gene architecture most likely promotes or suppresses gene evolution [43].

The main limitation of our work is a certain degree of simplification: we followed only the main isoforms of protein-coding genes and neglected the influence of alternative splicing. Besides deliberately simplifying the situation, this approach also reflects the still limited knowledge about the biological function of isoforms other than the main ones. Therefore, expanding the data to include other gene isoforms and scaling the length of the longest introns more accurately could be the next direction for follow-up research. In addition, the analysis of the conservation of the lengths of the longest introns between orthologous genes, which was done in this work only on a limited sample of genes, could bring new and interesting findings if extended to whole-genome data.

Availability of data and materials

The datasets (code as well as all Supplementary Information) generated and/or analysed during the current study are available in the Zenodo repository, https://doi.org/10.5281/zenodo.12577986.

References

1. Roy SW, Gilbert W. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. 2006;7:211–21.

2. Hubé F, Francastel C. Mammalian introns: when the junk generates molecular diversity. IJMS. 2015;16:4429–52.

3. Gehring NH, Roignant J-Y. Anything but ordinary – emerging splicing mechanisms in eukaryotic gene regulation. Trends Genet. 2021;37:355–72.

4. Irimia M, Roy SW. Origin of spliceosomal introns and alternative splicing. Cold Spring Harb Perspect Biol. 2014;6:a016071.

5. Girardini KN, Olthof AM, Kanadia RN. Introns: the “dark matter” of the eukaryotic genome. Front Genet. 2023;14:1150212.

6. Jeffares DC, Mourier T, Penny D. The biology of intron gain and loss. Trends Genet. 2006;22:16–22.

7. Koonin EV. The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct. 2006;1:22.

8. Rogozin IB, Carmel L, Csuros M, Koonin EV. Origin and evolution of spliceosomal introns. Biol Direct. 2012;7:11.

9. Müller F, Escobar L, Xu F, Węgrzyn E, Nainytė M, Amatov T, et al. A prebiotically plausible scenario of an RNA–peptide world. Nature. 2022;605:279–84.

10. Robertson MP, Joyce GF. The origins of the RNA world. Cold Spring Harb Perspect Biol. 2012;4:a003608.

11. Kupfer DM, Drabenstot SD, Buchanan KL, Lai H, Zhu H, Dyer DW, et al. Introns and splicing elements of five diverse fungi. Eukaryot Cell. 2004;3:1088–100.

12. Francis WR, Wörheide G. Similar ratios of introns to intergenic sequence across animal genomes. Genome Biol Evol. 2017;9:1582–98.

13. Bradnam KR, Korf I. Longer first introns are a general property of eukaryotic gene structure. PLoS ONE. 2008;3:e3093.

14. Jo S-S, Choi SS. Analysis of the functional relevance of epigenetic chromatin marks in the first intron associated with specific gene expression patterns. Genome Biol Evol. 2019;11:786–97.

15. Park SG, Hannenhalli S, Choi SS. Conservation in first introns is positively associated with the number of exons within genes and the presence of regulatory epigenetic signals. BMC Genomics. 2014;15:526.

16. Rose AB. Introns as gene regulators: a brick on the accelerator. Front Genet. 2019;9:672.

17. Dvorak P, Hanicinec V, Soucek P. The position of the longest intron is related to biological functions in some human genes. Front Genet. 2023;13:1085139.

18. Cunningham F, Allen JE, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, et al. Ensembl 2022. Nucleic Acids Res. 2022;50:D988–95.

19. Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature. 2022;604:310–5.

20. Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H, et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019;47:W191–8.

21. Ge SX, Jung D, Yao R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics. 2020;36:2628–9.

22. Reimand J, Isserlin R, Voisin V, Kucera M, Tannus-Lopes C, Rostamianfar A, et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat Protoc. 2019;14:482–517.

23. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, et al. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009;25:288–9.

24. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25–9.

25. Jia A, Xu L, Wang Y. Venn diagrams in bioinformatics. Brief Bioinform. 2021;22:bbab108.

26. Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE. 2011;6:e21800.

27. Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49:D545–51.

28. Luo W, Pant G, Bhavnasi YK, Blanchard SG, Brouwer C. Pathview Web: user friendly pathway visualization and data integration. Nucleic Acids Res. 2017;45:W501–8.

29. Shin S-H, Choi SS. Lengths of coding and noncoding regions of a gene correlate with gene essentiality and rates of evolution. Genes Genom. 2015;37:365–74.

30. Majewski J, Ott J. Distribution and characterization of regulatory elements in the human genome. Genome Res. 2002;12:1827–36.

31. Swinburne IA, Miguez DG, Landgraf D, Silver PA. Intron length increases oscillatory periods of gene expression in animal cells. Genes Dev. 2008;22:2342–6.

32. Burnette JM, Miyamoto-Sato E, Schaub MA, Conklin J, Lopez AJ. Subdivision of large introns in Drosophila by recursive splicing at nonexonic elements. Genetics. 2005;170:661–74.

33. Shepard S, McCreary M, Fedorov A. The peculiarities of large intron splicing in animals. PLoS ONE. 2009;4:e7853.

34. Kelly S, Georgomanolis T, Zirkel A, Diermeier S, O’Reilly D, Murphy S, et al. Splicing of many human genes involves sites embedded within introns. Nucleic Acids Res. 2015;43:4721–32.

35. Vinogradov AE. Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet. 2004;20:248–53.

36. Jeffares DC, Penkett CJ, Bähler J. Rapidly regulated genes are intron poor. Trends Genet. 2008;24:375–8.

37. Heyn P, Kalinka AT, Tomancak P, Neugebauer KM. Introns and gene expression: cellular constraints, transcriptional regulation, and evolutionary consequences. Bioessays. 2015;37:148–54.

38. Schonfeld E, Vendrow E, Vendrow J, Schonfeld E. On the relation of gene essentiality to intron structure: a computational and deep learning approach. Life Sci Alliance. 2021;4.

39. Gilbert W. The RNA world. Nature. 1986;319:618.

40. Fedorov A, Fedorova L. Introns: mighty elements from the RNA world. J Mol Evol. 2004;59:718–21.

41. Penny D, Hoeppner MP, Poole AM, Jeffares DC. An overview of the introns-first theory. J Mol Evol. 2009;69:527–40.

42. Roy SW, Fedorov A, Gilbert W. The signal of ancient introns is obscured by intron density and homolog number. Proc Natl Acad Sci USA. 2002;99:15513–7.

43. Kandul NP, Noor MA. Large introns in relation to alternative splicing and gene evolution: a case study of Drosophila bruno-3. BMC Genet. 2009;10:67.

Acknowledgements

We would also like to thank other colleagues from the Biomedical Center, Faculty of Medicine in Pilsen, Charles University, for creating a stimulating environment.

This work was supported by the Czech Medical Council, project no. NU21-07-00247 (to V.Hl.); the Czech Science Foundation, project no. 21-27902S (to P.S.); and the Grant Agency of Charles University in Prague, program Cooperatio “Surgical Disciplines” no. 207043 (to P.S. and P.D.).

Author information

Authors and affiliations.

Department of Biology, Faculty of Medicine in Pilsen, Charles University, Alej Svobody 76, 32300, Pilsen, Czech Republic

Pavel Dvorak

Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Alej Svobody 76, 32300, Pilsen, Czech Republic

Pavel Dvorak, Viktor Hlavac, Vojtech Hanicinec, Bhavana Hemantha Rao & Pavel Soucek

Institute of Medical Genetics, University Hospital Pilsen, Dr. Edvarda Benese 13, 30599, Pilsen, Czech Republic

Toxicogenomics Unit, National Institute of Public Health, Srobarova 48, 10042, Prague, Czech Republic

Viktor Hlavac & Pavel Soucek

Contributions

P. D.: Conceptualization, Methodology, Validation, Formal analysis, Writing - Original Draft, Visualization, Funding acquisition; V. Hl.: Methodology, Programming, Data Analysis, Writing - Original Draft; V. Ha.: Software, Formal analysis, Data Curation, Writing - Original Draft; B. H. R.: Software, Formal analysis, Data Curation, Writing - Original Draft; P. S.: Writing - Review & Editing, Supervision, Funding acquisition.

Corresponding author

Correspondence to Pavel Dvorak.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

12864_2024_10558_moesm1_esm.zip.

Supplementary Material 1. Figure S1: Comparison of the absolute and relative positions of the longest introns among orthologous genes of 6 vertebrates. Twenty randomly selected human genes and their corresponding orthologs were monitored in each of the groups with the longest human intron in the 1st tertile (A), 2nd tertile (B) and 3rd tertile (C).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Dvorak, P., Hlavac, V., Hanicinec, V. et al. Genes divided according to the relative position of the longest intron show increased representation in different KEGG pathways. BMC Genomics 25, 649 (2024). https://doi.org/10.1186/s12864-024-10558-x

Received: 25 October 2023

Accepted: 24 June 2024

Published: 28 June 2024

DOI: https://doi.org/10.1186/s12864-024-10558-x

  • Gene structure
  • Longest intron
  • Gene function
  • Ribosome biogenesis
  • Spliceosome

BMC Genomics

ISSN: 1471-2164

meaning of representation data

VIDEO

  1. 84. Introduction to Data Analytics and Data Representation

  2. Visual Representation of Literal Meaning by Viraj 😂🫶🏻

  3. Graph Introduction and Representation

  4. Representation of Graph

  5. Lecture 34: Representation of Data and Inferences-I

  6. Lecture 35: Representation of Data and Inferences-II

COMMENTS

  1. Data representations

    Data representations are useful for interpreting data and identifying trends and relationships. When working with data representations, pay close attention to both the data values and the key words in the question. When matching data to a representation, check that the values are graphed accurately for all categories.

  2. Data Representation: Definition, Types, Examples

    Data Representation in Maths. Definition: After collecting the data, the investigator has to condense them in tabular form to study their salient features. Such an arrangement is known as the presentation of data. Any information gathered may be organised in a frequency distribution table, and then shown using pictographs or bar graphs.

  3. 2.1: Types of Data Representation

    Displaying Data. It is often easier for people to interpret relative sizes of data when that data is displayed graphically. Note that a categorical variable is a variable that can take on one of a limited number of values and a quantitative variable is a variable that takes on numerical values that represent a measurable quantity. Examples of categorical variables are TV stations, the state ...

  4. What are the different ways of Data Representation?

    Data Representation. The word data refers to symbols that represent people, things, events, or ideas. It can be a title, an integer, or any other value. After collecting data, the investigator has to condense them in tabular form to study their salient features. Such an arrangement is known as the presentation of data.

  5. Graphical Representation of Data

    Graphical representation is a form of visually displaying data through various methods like graphs, diagrams, charts, and plots. It helps in sorting, visualizing, and presenting data in a clear manner through different types of graphs. Statistics mainly use graphical representation to show data.

  6. What Is Data Visualization: Definition, Types, Tips, and Examples

    Data Visualization is a graphic representation of data that aims to communicate large volumes of data efficiently, in a form that is easier to grasp and understand. In a way, data visualization is the mapping between the original data and graphic elements that determine how the attributes of these elements vary. The visualization is usually made by ...

  7. 2: Graphical Representations of Data

    A histogram is a graphic version of a frequency distribution. The graph consists of bars of equal width drawn adjacent to each other. The horizontal scale represents classes of quantitative data values and the vertical scale represents frequencies. The heights of the bars correspond to frequency values. Histograms are typically used for large ...

  8. Representing Data

    Histogram. A histogram is a graphical representation used to display quantitative continuous data (numeric data). The graphical display uses bars that are different heights and each bar groups numbers into ranges. The horizontal axis represents the numerical range, and the vertical axis represents the frequency, which is the number of times the data falls in the particular numerical range.

  9. Data Representation: How to Represent Data Effectively

    Body: Data Presentation. Data representation refers to how data is presented, encoded, and structured for storage and processing. Effective data representation is crucial in various fields ...

  10. 2.1: Introduction

    Then patterns can more easily be discerned. Figure 2.1.1: When you have large amounts of data, you will need to organize it in a way that makes sense. These ballots from an election are rolled together with similar ballots to keep them organized. (credit: William Greeson) In this chapter, you will study graphical ways to describe and ...

  11. What is Data Representation?

    Learn more about Data Representation. Take a deep dive into Data Representation with our course AI for Designers . In an era where technology is rapidly reshaping the way we interact with the world, understanding the intricacies of AI is not just a skill, but a necessity for designers. The AI for Designers course delves into the heart of this ...

  12. Data representations

    Data representations problems ask us to interpret data representations or create data representations based on given information. Aside from tables, the two most common data representation types on the SAT are bar graphs and line graphs. In this lesson, we'll learn to: You can learn anything. Let's do this!

  13. Graphical Representation

    Graphical Representation is a way of analysing numerical data. It exhibits the relation between data, ideas, information and concepts in a diagram. It is easy to understand and it is one of the most important learning strategies. It always depends on the type of information in a particular domain. There are different types of graphical ...

  14. Data Visualization: Definition, Benefits, and Examples

    Data visualization is the representation of information and data using charts, graphs, maps, and other visual tools. These visualizations allow us to easily understand any patterns, trends, or outliers in a data set. Data visualization also presents data to the general public or specific audiences without technical knowledge in an accessible ...

  15. Graphic Representation of Data: Meaning, Principles and Methods

    Meaning of Graphic Representation of Data: Graphic representation is another way of analysing numerical data. A graph is a sort of chart through which statistical data are represented in the form of lines or curves drawn across the coordinated points plotted on its surface.

  16. What is data visualisation? A definition, examples and resources

    Data visualisation is the graphical representation of information and data. By using visual elements like charts, graphs and maps, data visualisation tools provide an accessible way to see and understand trends, outliers and patterns in data. In the world of big data, data visualisation tools and technologies are essential for analysing massive ...

  17. What Is Data Visualization? Definition & Examples

    Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. Additionally, it provides an excellent way for employees or business owners to present data to non ...

  18. PDF Data Representation

    Data Representation • Data refers to the symbols that represent people, events, things, and ideas. Data can be a name, a number, the colors in a photograph, or the notes in a musical composition. • Data Representation refers to the form in which data is stored, processed, and transmitted. • Devices such as smartphones, iPods, and

  19. Data Organization and Representation

    Data Organization and Representation. Explore different ways of representing, analyzing, and interpreting data, including line plots, frequency tables, cumulative and relative frequency tables, and bar graphs. Learn how to use intervals to describe variation in data. Learn how to determine and understand the median. View Transcript.

  20. Graphical Representation of Data

    Graphical Representation of Data: In the graphical representation of data, numbers and facts become lively pictures and colorful diagrams. Instead of staring at boring lists of numbers, we use fun charts, cool graphs, and interesting visuals to understand information better. In this exciting concept of data visualization, we'll learn about different kinds of graphs, charts, and pictures ...

  21. Data Representation

    Definition of Data Representation: Data representation is the way in which information or data is encoded and stored in a format that computers can understand and manipulate. Since computers work with binary systems (1s and 0s), data from the real world needs to be translated into these binary values for processing. Data representation ...

  22. Data representations

    And so halfway between one and three would be two. So in this case, the median would be two. Now if you had an odd number of numbers, let's say you had one, one, one, three, and four, then you have a very clear middle number here. You order the numbers like this, and then the middle number is this one over here, so that would be the median.

  23. Data Representation

    By data representation is meant, in general, any convention for the arrangement of things in the physical world in such a way as to enable information to be encoded and later decoded by suitable automatic systems. We specify conventions because information can be conveyed by other means as well.

  24. Applied Sciences

    Cancer research has increasingly utilized multi-omics analysis in recent decades to obtain biomolecular information from multiple layers, thereby gaining a better understanding of complex biological systems. However, the curse of dimensionality is one of the most significant challenges when handling omics or biological data. Additionally, integrating multi-omics by transforming different omics ...

  25. Genes divided according to the relative position of the longest intron

    Despite the fact that introns mean an energy and time burden for eukaryotic cells, they play an irreplaceable role in the diversification and regulation of protein production. As a common feature of eukaryotic genomes, it has been reported that in protein-coding genes, the longest intron is usually one of the first introns. The goal of our work was to find a possible difference in the ...
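The histogram and median descriptions in items 7, 8 and 22 of the list above can be tried out with a small sketch. The helper `histogram_counts` and the sample values are assumptions made for illustration. The median comes from Python's standard `statistics` module.

```python
from collections import Counter
from statistics import median


def histogram_counts(values, bin_width, start=0):
    """Count values in equal-width bins [start + k*w, start + (k+1)*w).

    Returns {(lower, upper): frequency} for non-empty bins, in ascending order.
    """
    counts = Counter((v - start) // bin_width for v in values)
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): counts[k]
        for k in sorted(counts)
    }


odd_sample = [1, 1, 1, 3, 4]   # odd count: the middle value is the median
even_sample = [1, 1, 3, 4]     # even count: halfway between the two middle values

print(histogram_counts(odd_sample, bin_width=2))
print(median(odd_sample))
print(median(even_sample))
```

Each bar of a histogram corresponds to one key of the returned dictionary, with the bar height given by the stored frequency.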