A density plot visualises the distribution of data over a continuous interval (or time period). Unlike a histogram, a density plot does not depend on the number of bins (the bars of a typical histogram); its smoothness is instead controlled by the bandwidth of the kernel. As a result, it is often better at showing the shape of a distribution than a histogram, unless the bins in the histogram have a theoretical meaning.
- Notice that the variable on the x-axis should be continuous. Density plots are not designed for use with discrete variables.
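To make the idea concrete, here is a minimal sketch of what a kernel density estimate computes, using only the Python standard library. The sample values, the helper name `kde_at`, and the use of a Gaussian kernel with Scott's rule of thumb for the bandwidth are illustrative assumptions rather than the exact procedure of any plotting library. Note that no bins appear anywhere in the calculation.

```python
import math
import statistics

def kde_at(x, sample, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated at point `x`."""
    n = len(sample)
    total = 0.0
    for xi in sample:
        z = (x - xi) / bandwidth
        # Standard normal kernel centred on each observation
        total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / (n * bandwidth)

# Hypothetical continuous measurements (made up for illustration)
sample = [2.1, 2.4, 2.5, 3.0, 3.2, 3.3, 3.9, 4.8, 5.0, 6.7]

# Scott's rule of thumb in one dimension: standard deviation * n^(-1/5)
bandwidth = statistics.stdev(sample) * len(sample) ** (-1 / 5)

# Evaluate the estimate on a fine grid that extends well past the data
step = 0.01
lo = min(sample) - 5 * bandwidth
hi = max(sample) + 5 * bandwidth
grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
density = [kde_at(x, sample, bandwidth) for x in grid]

# A density should integrate to approximately 1
area = sum(density) * step
print(f"bandwidth = {bandwidth:.3f}, area under curve = {area:.3f}")
```

A larger bandwidth gives a smoother curve and a smaller one gives a bumpier curve, which plays a role loosely similar to bin width in a histogram.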
# You may need to install seaborn on the command line using 'pip install seaborn' or 'conda install seaborn'
import seaborn as sns
# Set a theme for seaborn
sns.set_theme(style="darkgrid")
# Load the example diamonds dataset
diamonds = sns.load_dataset("diamonds")
# Take a look at the data
print(diamonds.head())
sns.kdeplot(data=diamonds, x="price", cut=0);
This is basic, but there are lots of ways to adjust it: through keyword arguments (you can see these by running `help(sns.kdeplot)`), or by calling methods on the matplotlib `ax` object that `sns.kdeplot` returns. (The trailing `;` only stops a notebook from printing that returned object.) In this simple example, the `cut` keyword argument forces the density estimate to end at the end-points of the data, which makes sense for a variable like price, which has a hard cut-off at 0.
Let’s use further keyword arguments to enrich the plot, including different colours (‘hues’) for each cut of diamond. One keyword argument that may not be obvious is `hue_order`. The default function call would have arranged the `cut` types so that the ‘Fair’ cut obscured the other types, so the argument passed to `hue_order` below reverses the order of the unique list of diamond cuts via the slice `[::-1]`.
sns.kdeplot(data=diamonds, x="price", hue="cut", hue_order=diamonds['cut'].unique()[::-1], fill=True, alpha=.4, linewidth=0.5, cut=0.);
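The reversal used for `hue_order` can be checked on its own. This is a small sketch with an invented stand-in for the `cut` column (plain strings here, whereas the real column may be categorical); it shows that `unique()` keeps values in order of first appearance and that `[::-1]` reverses that order.

```python
import pandas as pd

# Tiny stand-in for the `cut` column (values invented for illustration)
cut = pd.Series(["Ideal", "Premium", "Good", "Ideal", "Fair", "Very Good"])

# unique() keeps the order of first appearance
order = cut.unique()

# [::-1] is a slice with step -1, which reverses the sequence
reversed_order = order[::-1]

print(list(order))
print(list(reversed_order))
```

Because the order comes from first appearance in the data rather than from the quality ranking of the cuts, it may be safer to pass an explicit list to `hue_order` when a particular stacking is required.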
For this R demonstration, we are going to use the ggplot2 package to create a density plot. We will also use the `diamonds` dataset, which ships with ggplot2.
To begin, make sure that all the packages we need are installed and loaded.
# load necessary packages
library(ggplot2)
library(viridis)
library(RColorBrewer)
library(tidyverse)
library(ggthemes)
library(ggpubr)
library(datasets)
Next, to make a density plot, we use the `geom_density()` function, with `price` on the x-axis.
ggplot(diamonds, aes(x = price)) + geom_density()
We can change the outline color of the density plot with the `col` argument and fill the area under the curve with the `fill` argument. We can also set the transparency of the filled area with the `alpha` argument, which ranges from 0 to 1.
ggplot(diamonds, aes(x = price))+ geom_density(fill = "lightblue", col = 'black', alpha = 0.6)
We can also change the line type of the density plot by adding the `linetype` argument to `geom_density()`:
ggplot(diamonds, aes(x = price)) + geom_density(fill = "lightblue", col = 'black', linetype = "dashed")
Furthermore, you can combine a histogram and a density plot in a single figure.
ggplot(diamonds, aes(x = price)) + geom_histogram(aes(y = ..density..), colour = "black", fill = "grey45") + geom_density(col = "red", size = 1,linetype = "dashed")
What if we want to plot multiple densities?
For example, to plot a separate density of price for each type of cut, all we need to do is add `fill = cut` inside `aes()`:
ggplot(data=diamonds, aes(x = price, fill = cut)) + geom_density(adjust = 1.5, alpha = .3)
For this demonstration, we will use the plottig scheme, a community-contributed color scheme for plots that greatly improves over Stata’s default plot color schemes. For more on using schemes in Stata, see here.
clear all
set more off
ssc install blindschemes // Install the blindschemes set of color schemes, which includes plottig
graph query, schemes // Show the available schemes you have installed, to confirm plottig was installed
*Pull in Stata's NHANES dataset
use http://www.stata-press.com/data/r16/nhanes2.dta, clear
*Plot the kernel density
kdensity height, scheme(plottig)