Tuesday, November 29, 2022

7 Visualizations with Python to Express Changes in Rank over Time | by Boriharn K | Nov, 2022


Using Python to visualize the changes in rank over time

Photo by Austris Augusts on Unsplash

Ranking data is data arranged by position in a numerically ordered sequence. It is a straightforward way to communicate information because it helps the reader effortlessly understand the sequence. Ranking is a good idea for handling multiple observations or categorical data.

However, things change all the time. As time passes, the position in a ranking can be constantly altered. Visualizing the positions of the ranks over a period helps reveal change and progress.

This article will guide you through some ideas for visualizing the changes in rank over time.

Examples of data visualization with Python in this article for presenting the changes in rank over time. Images by the author.

Let’s get started.

Get data

To show that the method mentioned here can be applied to real-world datasets, I’ll use the ‘Air Pollution in Seoul’ dataset from Kaggle (link). The data was provided by the Seoul Metropolitan Government (link). The data is used under the terms of the Creative Commons License CC-BY.

The dataset consists of air pollution data: SO2, NO2, CO, O3, PM10, and PM2.5 recorded between 2017 and 2019 from 25 districts in Seoul, South Korea.

In this article, we will work with carbon monoxide (CO), a common air pollutant that is harmful to humans. The measurement unit is parts per million (ppm).

Import data

After downloading the dataset, start by importing the libraries.

import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

Use Pandas to read ‘Measurement_summary.csv’

df = pd.read_csv('<file location>/Measurement_summary.csv')
df.head()

Explore data

Exploring the dataset as the first step is always a good idea. Fortunately, the result below shows that we do not have to deal with missing values.

df.info()
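
As a complementary check, missing values can also be counted per column. The snippet below is only a minimal sketch on a toy DataFrame (the values are made up for illustration and do not come from the Seoul dataset); the same two calls could be applied to df.

```python
import pandas as pd

# Toy frame standing in for the real dataset; values are illustrative only.
toy = pd.DataFrame({
    'Station code': [111, 112, 113],
    'CO': [0.5, None, 0.7],
})

# Count missing values per column; all zeros would mean nothing to clean.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# A single boolean answering "is anything missing at all?"
has_missing = bool(toy.isna().any().any())
print(has_missing)
```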

Let’s look at the number of unique values in the variable ‘Station code’.

df['Station code'].nunique()

## output
## 25

There are 25 districts in whole.

set(df['Station code'])

## output
## {101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114,
## 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125}

Select and prepare data

For example, I’ll select Station codes 111-118. If you want to plot other station numbers, feel free to modify the code below.

list_stations = [111, 112, 113, 114, 115, 116, 117, 118]
df_select = df[df['Station code'].isin(list_stations)].copy()
df_select.head()

The retrieved dataset is not ready to be plotted. Some columns need to be created or modified before use.

## create year_month, year and month columns
year_month = [i[0:7] for i in list(df_select['Measurement date'])]
df_select['year_month'] = year_month
df_select['year'] = [i[0:4] for i in year_month]
df_select['month'] = [i[-2:] for i in year_month]

## create district name column
district = [i.split(', ')[2] for i in df_select['Address']]
df_select['District'] = district

## change Station code column type
df_select = df_select.astype({'Station code': str})

## group by location and point in time; numeric_only skips the text columns
df_month = df_select.groupby(['Station code', 'District',
                              'year_month', 'year', 'month']).mean(numeric_only=True)
df_month.reset_index(inplace=True)
df_month.head()
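
The string slicing above relies on the date column always starting with 'YYYY-MM'. As an alternative, the month key could be derived by parsing the dates. This is a minimal sketch on toy rows, assuming timestamps shaped like '2017-01-01 00:00'; the toy values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Toy hourly readings for one station; values are illustrative only.
toy = pd.DataFrame({
    'Measurement date': ['2017-01-01 00:00', '2017-01-01 01:00',
                         '2017-02-01 00:00'],
    'Station code': [111, 111, 111],
    'CO': [0.4, 0.6, 0.8],
})

# Parse the text into datetimes, then format a 'YYYY-MM' key.
dates = pd.to_datetime(toy['Measurement date'])
toy['year_month'] = dates.dt.strftime('%Y-%m')

# Monthly average CO per station, keeping the keys as ordinary columns.
monthly = toy.groupby(['Station code', 'year_month'],
                      as_index=False)['CO'].mean()
print(monthly)
```

Parsing costs a little more time than slicing, but it raises an error on malformed dates instead of silently producing a wrong key.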

Here comes an important step. The main idea of this article is to create visualizations for ranking data. Next, we will create a column that ranks the districts’ CO amount (ppm) at each time point.

keep = []
for i in list(set(df_month['year_month'])):
    # work on a copy of one month so the new column is added safely
    df_t = df_month[df_month['year_month'] == i].copy()
    # rank 1 = highest CO within this month
    order = df_t['CO'].rank(ascending=False)
    df_t['rank'] = [int(r) for r in order]
    keep.append(df_t)

df_month = pd.concat(keep)
df_month.sort_values(['year_month', 'Station code'], ascending=True,
                     inplace=True, ignore_index=True)
df_month.head()
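
The same per-month ranking can probably be written without a loop by combining groupby with rank. The sketch below uses a toy DataFrame with hypothetical district names. Note one assumption that differs from the loop: method='min' gives tied values the same (best) rank, whereas the loop truncates the averaged rank, so results may differ when three or more districts tie.

```python
import pandas as pd

# Toy monthly averages; district names and values are illustrative only.
toy = pd.DataFrame({
    'year_month': ['2017-01'] * 3 + ['2017-02'] * 3,
    'District': ['A-gu', 'B-gu', 'C-gu'] * 2,
    'CO': [0.5, 0.7, 0.6, 0.9, 0.4, 0.9],
})

# Rank within each month; rank 1 is the highest CO amount.
toy['rank'] = (toy.groupby('year_month')['CO']
                  .rank(ascending=False, method='min')
                  .astype(int))
print(toy)
```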

Before continuing, we will define a dictionary of colors to facilitate the plotting process.

# extract color palette; the palette can be changed
list_dist = list(set(df_select['District']))
pal = list(sns.color_palette(palette='Spectral',
                             n_colors=len(list_dist)).as_hex())
dict_color = dict(zip(list_dist, pal))
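
One detail worth knowing: the iteration order of a set of strings can differ between Python sessions, so each district may get a different color on every run. If stable colors matter, sorting the names first is one option. This is a small sketch with hypothetical district names and stand-in hex colors, not the palette used above.

```python
# Stand-in district names and hex colors; both are illustrative assumptions.
districts_seen = ['C-gu', 'B-gu', 'C-gu', 'A-gu', 'B-gu']
palette_hex = ['#d53e4f', '#fee08b', '#3288bd']

# sorted() gives a stable order, unlike iterating a set of strings directly.
unique_districts = sorted(set(districts_seen))
color_by_district = dict(zip(unique_districts, palette_hex))
print(color_by_district)
```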

Data visualization

This article intends to guide you through some visualization ideas for ranking data over time. Thus, the obtained result should be easy to understand while allowing the reader to compare the data ranks between different points in time.

Something needs to be clarified before continuing. Each graph has its pros and cons. Of course, nothing is perfect. Some ideas presented here may be just for an eye-catching effect. But they all have the same purpose of showing the changes in data ranks over time.

The charts in this article can be categorized into two groups: animations and charts.

Animation

Besides being a good way to catch attention, animation can easily show the changes in rank over time.

1. Comparing bar height with an Animated bar chart

Plotly is a useful graphing library for making interactive and animated graphs. The concept of applying an animated bar chart is to fix each district’s position. Each bar will be annotated with the ranking number. By doing this, the amount of CO can be compared over time.

import plotly.express as px
fig = px.bar(df_month, x='District', y='CO',
             color='District', text='rank',
             color_discrete_map=dict_color,
             animation_frame='year_month',
             animation_group='Station code',
             range_y=[0, 1.2],
             labels={'CO': 'CO (ppm)'},
             )
fig.update_layout(width=1000, height=600, showlegend=False,
                  xaxis=dict(tickmode='linear', dtick=1))
fig.update_traces(textfont_size=16, textangle=0)
fig.show()

Voila!!

Animated bar chart shows districts’ monthly rank and CO (ppm) amount. Images by author.

The attached result above may look fast since this is just an example of the outcome. Don't worry; there is a pause button to pause and a slider to select a specific time point.

2. Racing with an Animated scatter plot

Now let’s change the point of view by moving each district according to its rank at different points in time. The sizes of the scatter dots can be used to show the CO amount.

To facilitate plotting with Plotly, we need to add two more columns to the DataFrame: the position on the X-axis and the text for annotation.

ym = list(set(year_month))
ym.sort()

df_month['posi'] = [ym.index(i) for i in df_month['year_month']]
df_month['CO_str'] = [str(round(i,2)) for i in df_month['CO']]
df_month['CO_text'] = [str(round(i,2))+' ppm' for i in df_month['CO']]
df_month.head()

Next, plot an animated scatter plot.

import plotly.express as px
fig = px.scatter(df_month, x='posi', y='rank',
                 size='CO',
                 color='District', text='CO_text',
                 color_discrete_map=dict_color,
                 animation_frame='year_month',
                 animation_group='District',
                 range_x=[-2, len(ym)],
                 range_y=[0.5, 6.5]
                 )
fig.update_xaxes(title='', visible=False)
fig.update_yaxes(autorange='reversed', title='Rank',
                 visible=True, showticklabels=True)
fig.update_layout(xaxis=dict(showgrid=False),
                  yaxis=dict(showgrid=True))
fig.update_traces(textposition='middle left')
fig.show()

Ta-da…

Animated scatter chart shows districts’ monthly rank and CO (ppm) amount. Images by author.

Charts

Animated charts are usually limited to showing one point in time at once. To show multiple time points together, other charts and methods can be applied.

3. Drawing lines with a Bump chart

Basically, a bump chart applies multiple lines to show the changes in ranking over time. Plotting a bump chart with Plotly allows users to filter the result and provides more information when hovering the cursor over each data point, as shown in the result below.

import plotly.express as px
fig = px.line(df_month, x='year_month', y='rank',
              color='District',
              color_discrete_map=dict_color,
              markers=True,
              hover_name='CO_text')
fig.update_traces(marker=dict(size=11))
fig.update_yaxes(autorange='reversed', title='Rank',
                 visible=True, showticklabels=True)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.update_layout(xaxis=dict(showgrid=False),
                  yaxis=dict(showgrid=False))
fig.show()

Bump chart shows districts’ rank and amount of CO (ppm) monthly. The result can be filtered and provides more information, as shown. Images by author.

4. Creating a photo collage of bar charts

A simple bar chart can express ranking at one time point. With many time points, we can create many bar charts and then combine them into a photo collage. Start by using the Seaborn library to create a bar chart.

df_select = df_month[df_month['year_month'] == '2017-01']
fig, ax = plt.subplots(figsize=(15, 6))

sns.set_style('darkgrid')
# bars are ordered from the highest to the lowest CO amount
sns.barplot(data=df_select,
            x='District', y='CO',
            order=df_select.sort_values('CO', ascending=False)['District'],
            palette=dict_color)
ax.bar_label(ax.containers[0],
             labels=df_select.sort_values('CO', ascending=False)['CO_str'],
             label_type='edge', size=11)
plt.ylabel('CO (ppm)')
plt.title('2017-01')
plt.show()

Bar chart shows districts’ amount of CO (ppm) by ranking. Images by author.

Use a for loop to create the bar charts at different time points. Please be aware that the code below will export the charts to your computer for importing later.

keep_save = []
for t in ym:
    df_ = df_month[df_month['year_month'] == t]
    fig, ax = plt.subplots(figsize=(8.5, 5))
    sns.set_style('darkgrid')
    sns.barplot(data=df_,
                x='District', y='CO',
                order=df_.sort_values('CO', ascending=False)['District'],
                palette=dict_color)
    ax.bar_label(ax.containers[0],
                 labels=df_.sort_values('CO', ascending=False)['CO_str'],
                 label_type='edge', size=11)
    plt.ylim([0, 1.2])
    plt.ylabel('CO (ppm)')
    plt.title(t)
    plt.tight_layout()
    # save each chart and remember its file name for the collage
    s_name = t + '_bar.png'
    keep_save.append(s_name)
    plt.savefig(s_name)
    plt.show()

Create a function to combine the charts. I found excellent code for combining many plots from this link on Stack Overflow.
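
The function itself is not reproduced in this copy of the article. As a rough idea of what such a helper might look like, here is a minimal sketch using the Pillow library. It is not the Stack Overflow code referred to above; the body, the parameter name image_paths, and the choice to resize every image to an equal grid cell are all assumptions. Only the argument order follows the get_collage calls used in this article.

```python
import os
import tempfile
from PIL import Image

def get_collage(n_col, n_row, width, height, image_paths, save_name):
    """Paste images left to right, top to bottom, onto one canvas."""
    cell_w, cell_h = width // n_col, height // n_row
    canvas = Image.new('RGB', (width, height), 'white')
    # ignore any images beyond the number of grid cells
    for idx, path in enumerate(image_paths[:n_col * n_row]):
        with Image.open(path) as img:
            tile = img.convert('RGB').resize((cell_w, cell_h))
        row, col = divmod(idx, n_col)
        canvas.paste(tile, (col * cell_w, row * cell_h))
    canvas.save(save_name)
    return canvas

# Small demonstration with solid-color tiles instead of saved charts.
tmp_dir = tempfile.mkdtemp()
tile_paths = []
for name in ['red', 'blue', 'green', 'black']:
    path = os.path.join(tmp_dir, name + '.png')
    Image.new('RGB', (20, 10), name).save(path)
    tile_paths.append(path)

demo_path = os.path.join(tmp_dir, 'demo_collage.png')
collage = get_collage(2, 2, 40, 20, tile_paths, demo_path)
print(collage.size)
```

Because each tile is resized to the cell size, the width and height arguments should keep the aspect ratio of the saved charts to avoid distortion.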

Apply the function.

## get_collage(n_col, n_row, width, height, save_name, 'output.png')
# width = n_col * figure width
# height = n_row * figure height

get_collage(12, 3, 12*850, 3*500, keep_save, 'order_bar.png')

Ta-da…

Part of a photo collage combining bar charts shows districts’ rank and amount of CO (ppm) monthly. Images by author.

The result shows each district’s monthly CO amount while presenting the ranking order over time. Thus, we can compare the district ranks and the amount of pollution at many time points at the same time.

5. Fancy the bar charts with a Circular bar chart

With the same concept as the previous idea, we can turn normal bar charts into circular bar charts (aka race track plots) and combine them into a photo collage.

As previously mentioned, everything has its pros and cons. The bars on a circular chart may be hard to compare because each ring has a different circumference, so the same value is drawn with a different length. However, this can be considered a good option for creating an eye-catching effect.

Start with an example of creating a circular bar chart.
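
The original plotting code is not included in this copy. One possible way to draw such a chart is a horizontal bar plot on Matplotlib polar axes. The following is only a sketch under assumptions: the district names and CO values are hypothetical, and the three-quarter sweep for the largest bar is an arbitrary styling choice.

```python
import math
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical districts and CO values (ppm), for illustration only.
districts = ['A-gu', 'B-gu', 'C-gu', 'D-gu']
co = [0.4, 0.6, 0.5, 0.8]

# Sort ascending so the largest value lands on the outermost ring.
pairs = sorted(zip(co, districts))
max_sweep = 1.5 * math.pi  # the largest bar sweeps three quarters of a circle
sweeps = [value / max(co) * max_sweep for value, _ in pairs]
rings = list(range(len(pairs)))

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
ax.set_theta_zero_location('N')   # start every bar at 12 o'clock
ax.set_theta_direction(-1)        # sweep clockwise
bars = ax.barh(rings, sweeps, height=0.8)

# Label each ring at the start of its bar with the district and value.
for ring, (value, name) in zip(rings, pairs):
    ax.text(0, ring, f'{name} {value:.2f} ', ha='right', va='center')

ax.set_ylim(-2, len(pairs))       # negative lower limit leaves a hole in the middle
ax.set_axis_off()
```

In a notebook, drop the backend line and finish with plt.show(); to build a collage, save the figure with fig.savefig() instead.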

Circular bar chart shows districts’ amount of CO (ppm) by ranking. Images by author.

Apply a for loop to get the other circular bar charts. The results will be exported to your computer for import later.

Use the function to obtain a photo collage.

get_collage(12, 3, 12*860, 3*810, keep_cir, 'order_cir.png')

Part of a photo collage combining circular bar charts shows districts’ rank and amount of CO (ppm) monthly. Images by author.

6. Another way to fancy the bar charts with a Radial bar chart

Changing the direction of the bars so that they start from the center gives a radial bar chart. This is another idea for catching attention. However, bars that are not located near each other are hard to compare.

Start with an example of a radial bar chart.
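
The original code is not included in this copy either. A radial bar chart can be approximated with an ordinary bar plot on Matplotlib polar axes, where each bar gets an angular slot and grows outward. Again, this is a sketch under assumptions: the district names, CO values, and the inner offset of 0.2 are all hypothetical choices.

```python
import math
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical districts and CO values (ppm), for illustration only.
districts = ['A-gu', 'B-gu', 'C-gu', 'D-gu']
co = [0.4, 0.6, 0.5, 0.8]

# Sort descending so the rank order runs clockwise from the top.
ranked = sorted(zip(co, districts), reverse=True)
heights = [value for value, _ in ranked]
slot = 2 * math.pi / len(ranked)          # angular space per bar
thetas = [slot * i for i in range(len(ranked))]
inner = 0.2                               # bars start slightly off the center

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
ax.set_theta_zero_location('N')
ax.set_theta_direction(-1)
bars = ax.bar(thetas, heights, width=slot * 0.9, bottom=inner)

# Put the district name and value just beyond the tip of each bar.
for theta, (value, name) in zip(thetas, ranked):
    ax.text(theta, inner + value + 0.1, f'{name}\n{value:.2f}',
            ha='center', va='center')

ax.set_axis_off()
```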

Radial bar chart shows districts’ amount of CO (ppm) by ranking. Images by author.

Apply a for loop to create the other radial bar charts. The results will also be exported to your computer for import later.

Apply the function to obtain a photo collage.

get_collage(12, 3, 12*800, 3*800, keep_rad, 'order_rad.png')

Part of a photo collage combining radial bar charts shows districts’ rank and amount of CO (ppm) monthly. Images by author.

7. Using color with a Heat Map

Generally, the heat map is a common chart for presenting data in two dimensions and showing values with colors. With our dataset, the color can be applied to show the rank numbers.

Start by creating a pivot table with pd.pivot().

df_pivot = pd.pivot(data=df_month, index='District',
                    columns='year_month', values='rank')
df_pivot
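
To make the reshaping step concrete, here is a minimal sketch of the same long-to-wide pivot on a toy table. The district names and ranks are hypothetical; the point is only to show that each district becomes a row and each month a column.

```python
import pandas as pd

# Toy long-format ranks; names and values are illustrative only.
toy = pd.DataFrame({
    'District': ['A-gu', 'A-gu', 'B-gu', 'B-gu'],
    'year_month': ['2017-01', '2017-02', '2017-01', '2017-02'],
    'rank': [1, 2, 2, 1],
})

# One row per district, one column per month, cells hold the rank.
wide = toy.pivot(index='District', columns='year_month', values='rank')
print(wide)
```

Note that pivot expects each District and year_month pair to appear only once; duplicated pairs raise an error.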

After getting the pivot table, we can easily create a heat map with just a few lines of code.

plt.figure(figsize=(20, 9.5))
sns.heatmap(df_pivot, cmap='viridis_r', annot=True, cbar=False)
plt.show()

Heat map applied to show the changes in districts’ rank over time. Images by author.

With the color and annotation, we can spot the district with the highest (yellow color) and lowest (dark blue color) amount of CO. The change in ranking can be seen over time.

Summary

This article has presented seven visualization ideas with Python code to express the changes in data ranks over time. As previously mentioned, everything has its pros and limits. An important thing is finding the right chart that suits the data.

I'm sure there are more graphs for ranking data over time than mentioned here. This article only offers some ideas. If you have any suggestions or recommendations, please feel free to leave a comment. I’d be happy to see it.

Thanks for reading

These are my data visualization articles that you may find interesting:

  • 8 Visualizations with Python to Handle Multiple Time-Series Data (link)
  • 6 Visualization with Python Tricks to Handle Ultra-Long Time-Series Data (link)
  • 9 Visualizations with Python to show Proportions instead of a Pie chart (link)
  • 9 Visualizations with Python that Catch More Attention than a Bar Chart (link)