Class EmpiricalDistribution
public final class EmpiricalDistribution
extends java.lang.Object
The EmpiricalDistribution class offers a means to calculate the empirical cumulative distribution function (CDF) and probability density function (PDF), including percentiles.
After the distribution is created, a user will typically call the analyse()
method to estimate the various statistical quantities.
The distribution can contain at most Integer.MAX_VALUE samples.
Note that a valid I18NL10N
database must be available!
Note that this class cannot be subclassed!
- Version:
- 26/06/2018
- Author:
- Sven Maerivoet
-
Constructor Summary
Constructors
EmpiricalDistribution()
- Constructs an empty EmpiricalDistribution object.
EmpiricalDistribution(double[] x)
- Constructs an EmpiricalDistribution object for a given array of values.
EmpiricalDistribution(double[] x, double[] histogramBinRightEdges)
- Constructs an EmpiricalDistribution object for a given array of values and user-specified histogram bin right edges.
EmpiricalDistribution(double[] x, int nrOfHistogramBins)
- Constructs an EmpiricalDistribution object for a given array of values and a user-specified number of histogram bins.
-
Method Summary
Modifier and Type, Method, Description
void analyse()
- Estimates the empirical distribution and analyses its various statistical quantities.
double calculateKDEPDFBandwidth(MathTools.EKernelType kernelType)
- Calculates the bandwidth for kernel density estimation (KDE) based on Silverman's Rule-of-Thumb.
void clear()
- Clears the empirical distribution.
void estimateKDEPDF(MathTools.EKernelType kernelType, double bandwidth, int nrOfSupportPoints, double minSupport, double maxSupport)
- Estimates the probability density function (PDF) using a specified kernel function.
double getCDF(double x)
- Returns the value of the cumulative distribution function (CDF) evaluated at x.
static double getChiSquare(double alpha, int degreesOfFreedom)
- Returns the chi-square value corresponding to a specified alpha level and number of degrees of freedom.
double[] getData()
- Retrieves the raw data for this empirical distribution.
double getExpectedValue()
- Returns the expected value for the first moment (population mean), which in this case is approximated by the sample mean.
FunctionLookupTable getFullKDEPDF()
- Returns the previously calculated complete kernel density estimation (KDE) of the probability density function (PDF).
double getHistogramBinCentre(int histogramBin)
- Returns the centre of a specified histogram bin.
double[] getHistogramBinCentres()
- Returns the centres of all the histogram bins.
double getHistogramBinCount(int histogramBin)
- Returns the count associated with a specified histogram bin.
double[] getHistogramBinCounts()
- Returns the counts for all the histogram bins.
double[] getHistogramBinFrequencies()
- Returns the frequencies for all the histogram bins.
double getHistogramBinFrequency(int histogramBin)
- Returns the frequency associated with a specified histogram bin.
double getHistogramBinWidth()
- Returns the width of a histogram bin.
double getInterquartileRange()
- Returns the interquartile range (IQR) (i.e., the difference between the 75th and the 25th percentiles).
static java.lang.String getInterquartileRangeDescription()
- Returns a descriptive label of the interquartile range (IQR).
double getJarqueBeraTestStatistic()
- Calculates the Jarque-Bera test statistic.
double getKDEPDF(double x)
- Returns the value of the probability density function (PDF) evaluated at x (based on kernel density estimation, KDE).
FunctionLookupTable getKDEPDFModes()
- Returns all modes (i.e., local maxima) for the calculated kernel density estimation (KDE) of the probability density function (PDF).
double getKDEXMaximum()
- Returns the maximum of the values for a kernel density estimation (KDE) of the probability density function (PDF).
double getKDEXMinimum()
- Returns the minimum of the values for a kernel density estimation (KDE) of the probability density function (PDF).
double getKDEXRange()
- Returns the range of the values for a kernel density estimation (KDE) of the probability density function (PDF).
double getKurtosis()
- Returns the sample kurtosis (using an unbiased estimator).
static java.lang.String getKurtosisDescription()
- Returns a descriptive label of the kurtosis.
java.lang.String getKurtosisInterpretation()
- Returns a qualitative description of the kurtosis based on its test statistic.
double getKurtosisZStatistic()
- Returns a two-tailed test statistic Z of kurtosis (different from zero) with a 5% significance level.
double getMean()
- This is the sample mean, which in this case is an alias for the expected value.
static java.lang.String getMeanDescription()
- Returns a descriptive label of the mean (expected value).
double getMedian()
- Returns the median (i.e., the 50th percentile).
static java.lang.String getMedianDescription()
- Returns a descriptive label of the median.
int getN()
- Returns the sample size.
int getNrOfHistogramBins()
- Returns the number of histogram bins used for estimating the probability density function (PDF).
boolean[] getOutliers()
- Returns the outliers, which are defined as having z-scores greater than 3.
double getPDF(double x)
- Returns the value of the probability density function (PDF) evaluated at x (based on a histogram).
double getPercentile(double percentile)
- Returns the given percentile.
double getPercentile(int percentile)
- Returns the given percentile.
static java.lang.String getPercentileDescription()
- Returns a descriptive label of a percentile.
double[] getPercentiles()
- Returns all the percentiles for the range [0,100].
double getSkewness()
- Returns the sample skewness (using an unbiased estimator).
double getSkewnessConfidenceBounds()
- Returns the symmetrical confidence bounds of the skewness for a 95% confidence interval, defined as twice the standard error of skewness (SES).
static java.lang.String getSkewnessDescription()
- Returns a descriptive label of the skewness.
java.lang.String getSkewnessInterpretation()
- Returns a qualitative description of the skewness based on its test statistic.
double getSkewnessZStatistic()
- Returns a two-tailed test statistic Z of skewness (different from zero) with a 5% significance level.
double[] getSortedData()
- Retrieves the sorted data for this empirical distribution.
double getStandardDeviation()
- Returns the standard deviation (i.e., the positive square root of the variance).
static java.lang.String getStandardDeviationDescription()
- Returns a descriptive label of the standard deviation.
double getTrimmedMean(double percentageToTrim)
- This is the trimmed (or truncated) mean, which corresponds to the mean calculated after symmetrically discarding a certain percentage of data points at the high and low end (without interpolation).
double getVariance()
- Returns the sample variance (using an unbiased estimator of the population variance).
static java.lang.String getVarianceDescription()
- Returns a descriptive label of the variance.
double getXMaximum()
- Returns the maximum of the input values.
double getXMinimum()
- Returns the minimum of the input values.
double getXRange()
- Returns the range of the input values.
double[] getZScores()
- Returns the calculated z-scores, defined as (value - mean) / standard deviation.
boolean isJarqueBeraTestAccepted(double alpha)
- Compares the Jarque-Bera test statistic with the chi-square distribution with 2 degrees of freedom for a given alpha level.
void recalculatePDF()
- Recalculates the probability density function (PDF).
void recalculatePDF(int nrOfHistogramBins)
- Recalculates the probability density function (PDF) using a user-specified number of histogram bins.
void setData(double[] x)
- Sets the source data for the empirical distribution.
void setData(double[] x, int nrOfHistogramBins)
- Sets the source data for the empirical distribution, as well as a user-specified number of histogram bins.
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Constructor Details
-
EmpiricalDistribution
public EmpiricalDistribution()
Constructs an empty EmpiricalDistribution object.
-
EmpiricalDistribution
public EmpiricalDistribution(double[] x)
Constructs an EmpiricalDistribution object for a given array of values.
The Freedman-Diaconis rule is applied for finding the optimal histogram bin width, and consequently the optimal number of histogram bins:
bin width = 2 * IQR / n^(1/3)
- Parameters:
x
- the array of values to estimate the empirical distribution for
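As an illustration of the rule above, the following sketch derives a bin width and a bin count from an interquartile range and a sample size. It is not this class's implementation: the class name FreedmanDiaconisSketch, its method names, the rounding up of the bin count, and the single-bin fallback for a zero IQR are all assumptions made for the example.

```java
final class FreedmanDiaconisSketch {
    /** Freedman-Diaconis bin width: 2 * IQR / n^(1/3). */
    static double binWidth(double iqr, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 2.0 * iqr / Math.cbrt(n);
    }

    /** Number of bins needed to cover [min, max]; rounding up is an assumption. */
    static int nrOfBins(double min, double max, double iqr, int n) {
        double width = binWidth(iqr, n);
        if (!(width > 0.0)) {
            return 1; // degenerate case (e.g., an IQR of zero): fall back to one bin
        }
        return Math.max(1, (int) Math.ceil((max - min) / width));
    }

    public static void main(String[] args) {
        System.out.println(binWidth(3.5, 8));
        System.out.println(nrOfBins(0.0, 100.0, 3.5, 8));
    }
}
```

Because the width shrinks only with the cube root of n, the number of bins grows slowly as more samples are added.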
-
EmpiricalDistribution
public EmpiricalDistribution(double[] x, int nrOfHistogramBins)
Constructs an EmpiricalDistribution object for a given array of values and a user-specified number of histogram bins.
- Parameters:
x
- the array of values to estimate the empirical distribution for
nrOfHistogramBins
- the user-specified number of histogram bins
-
EmpiricalDistribution
public EmpiricalDistribution(double[] x, double[] histogramBinRightEdges)
Constructs an EmpiricalDistribution object for a given array of values and user-specified histogram bin right edges.
- Parameters:
x
- the array of values to estimate the empirical distribution for
histogramBinRightEdges
- the array of values containing the histogram bin right edges
-
-
Method Details
-
getData
public double[] getData()
Retrieves the raw data for this empirical distribution.
- Returns:
- the raw data for this empirical distribution
- See Also:
getSortedData()
-
getSortedData
public double[] getSortedData()
Retrieves the sorted data for this empirical distribution.
- Returns:
- the sorted data for this empirical distribution
- See Also:
getData()
-
setData
public void setData(double[] x)
Sets the source data for the empirical distribution.
The Freedman-Diaconis rule is applied for finding the optimal histogram bin width, and consequently the optimal number of histogram bins:
bin width = 2 * IQR / n^(1/3)
- Parameters:
x
- the array of values to estimate the empirical distribution for
-
setData
public void setData(double[] x, int nrOfHistogramBins)
Sets the source data for the empirical distribution, as well as a user-specified number of histogram bins.
- Parameters:
x
- the array of values to estimate the empirical distribution for
nrOfHistogramBins
- the user-specified number of histogram bins
-
clear
public void clear()
Clears the empirical distribution.
-
analyse
public void analyse()
Estimates the empirical distribution and analyses its various statistical quantities.
-
getCDF
public double getCDF(double x)
Returns the value of the cumulative distribution function (CDF) evaluated at x.
- Parameters:
x
- the value to evaluate the cumulative distribution function at
- Returns:
- the value of the cumulative distribution function evaluated at x
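For orientation, an empirical CDF is commonly defined as the fraction of samples less than or equal to x. The sketch below evaluates that definition on an already sorted array with a binary search. Whether this class uses the same step-function definition is not stated here, so treat the class EmpiricalCdfSketch and its method as hypothetical.

```java
final class EmpiricalCdfSketch {
    /** Fraction of samples <= x; the input array must be sorted in ascending order. */
    static double cdf(double[] sorted, double x) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int lo = 0;
        int hi = sorted.length;
        // Binary search for the first index whose value is strictly greater than x.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (double) lo / sorted.length;
    }

    public static void main(String[] args) {
        double[] sorted = {1.0, 2.0, 2.0, 3.0, 5.0};
        System.out.println(cdf(sorted, 2.0));
    }
}
```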
-
getPercentile
public double getPercentile(int percentile)
Returns the given percentile.
- Parameters:
percentile
- the requested percentile (in the interval [0,100])
- Returns:
- the requested percentile value
-
getPercentile
public double getPercentile(double percentile)
Returns the given percentile.
- Parameters:
percentile
- the requested percentile (in the interval [0.0,100.0])
- Returns:
- the requested percentile value
-
getPercentiles
public double[] getPercentiles()
Returns all the percentiles for the range [0,100].
- Returns:
- an array containing all the percentiles in the range [0,100]
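Several percentile conventions exist, and this page does not say which one the class applies. The sketch below uses linear interpolation between the two closest ranks, purely as an assumed convention; PercentileSketch and its methods are illustrative names, not part of this API.

```java
import java.util.Arrays;

final class PercentileSketch {
    /** Percentile p in [0, 100] of a sorted array, by linear interpolation (assumed convention). */
    static double percentileOfSorted(double[] sorted, double p) {
        if (sorted.length == 0 || p < 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need data and 0 <= p <= 100");
        }
        double position = (p / 100.0) * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = Math.min(lower + 1, sorted.length - 1);
        return sorted[lower] + (position - lower) * (sorted[upper] - sorted[lower]);
    }

    /** Convenience overload that sorts a copy of the raw data first. */
    static double percentile(double[] x, double p) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        return percentileOfSorted(sorted, p);
    }

    /** All integer percentiles 0..100, i.e., an array of 101 values. */
    static double[] percentiles(double[] x) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        double[] result = new double[101];
        for (int p = 0; p <= 100; p++) {
            result[p] = percentileOfSorted(sorted, p);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] x = {8, 1, 7, 2, 6, 3, 5, 4};
        double[] all = percentiles(x);
        System.out.println("median = " + all[50] + ", IQR = " + (all[75] - all[25]));
    }
}
```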
-
getXMinimum
public double getXMinimum()
Returns the minimum of the input values.
- Returns:
- the minimum of the input values
-
getXMaximum
public double getXMaximum()
Returns the maximum of the input values.
- Returns:
- the maximum of the input values
-
getXRange
public double getXRange()
Returns the range of the input values.
- Returns:
- the range of the input values
-
getKDEXMinimum
public double getKDEXMinimum()
Returns the minimum of the values for a kernel density estimation (KDE) of the probability density function (PDF).
- Returns:
- the minimum of the values for a kernel density estimation (KDE) of the probability density function (PDF)
-
getKDEXMaximum
public double getKDEXMaximum()
Returns the maximum of the values for a kernel density estimation (KDE) of the probability density function (PDF).
- Returns:
- the maximum of the values for a kernel density estimation (KDE) of the probability density function (PDF)
-
getKDEXRange
public double getKDEXRange()
Returns the range of the values for a kernel density estimation (KDE) of the probability density function (PDF).
- Returns:
- the range of the values for a kernel density estimation (KDE) of the probability density function (PDF)
-
getMedian
public double getMedian()
Returns the median (i.e., the 50th percentile).
- Returns:
- the median
-
getInterquartileRange
public double getInterquartileRange()
Returns the interquartile range (IQR) (i.e., the difference between the 75th and the 25th percentiles).
- Returns:
- the interquartile range (IQR)
-
recalculatePDF
public void recalculatePDF()
Recalculates the probability density function (PDF).
The Freedman-Diaconis rule is applied for finding the optimal histogram bin width, and consequently the optimal number of histogram bins:
bin width = 2 * IQR / n^(1/3)
-
recalculatePDF
public void recalculatePDF(int nrOfHistogramBins)
Recalculates the probability density function (PDF) using a user-specified number of histogram bins.
- Parameters:
nrOfHistogramBins
- the user-specified number of histogram bins
-
calculateKDEPDFBandwidth
public double calculateKDEPDFBandwidth(MathTools.EKernelType kernelType)
Calculates the bandwidth for kernel density estimation (KDE) based on Silverman's Rule-of-Thumb.
- Parameters:
kernelType
- the type of kernel function to use in the calculation
- Returns:
- an estimation of the bandwidth
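For reference, the widely quoted form of Silverman's rule of thumb for a Gaussian kernel is h = 0.9 * min(s, IQR / 1.34) * n^(-1/5), with s the sample standard deviation. The sketch below implements only that Gaussian form. How this class adapts the rule to other kernel types is not documented on this page, and SilvermanSketch is a hypothetical name.

```java
final class SilvermanSketch {
    /** Silverman's rule of thumb for a Gaussian kernel (assumed variant). */
    static double bandwidth(double stdDev, double iqr, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // Use the more robust (smaller) of the two spread estimates.
        double spread = Math.min(stdDev, iqr / 1.34);
        return 0.9 * spread * Math.pow(n, -0.2);
    }

    public static void main(String[] args) {
        System.out.println(bandwidth(1.0, 1.34, 32));
    }
}
```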
-
estimateKDEPDF
public void estimateKDEPDF(MathTools.EKernelType kernelType, double bandwidth, int nrOfSupportPoints, double minSupport, double maxSupport)
Estimates the probability density function (PDF) using a specified kernel function.
- Parameters:
kernelType
- the type of kernel function to use
bandwidth
- the bandwidth of the kernel function
nrOfSupportPoints
- the number of (X,Y) values to use for the smoothed 1D function
minSupport
- the minimum value for the support
maxSupport
- the maximum value for the support
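To make the roles of the bandwidth and the support parameters concrete, here is a minimal kernel density estimate on an evenly spaced grid. It assumes a Gaussian kernel and returns plain (x, y) pairs. The class GaussianKdeSketch is hypothetical and does not reproduce this library's kernel types or its FunctionLookupTable.

```java
final class GaussianKdeSketch {
    /** KDE value at x: (1 / (n * h)) * sum of K((x - x_i) / h) with a Gaussian kernel K. */
    static double density(double[] data, double bandwidth, double x) {
        if (data.length == 0 || !(bandwidth > 0.0)) {
            throw new IllegalArgumentException("need data and a positive bandwidth");
        }
        double sum = 0.0;
        for (double xi : data) {
            double u = (x - xi) / bandwidth;
            sum += Math.exp(-0.5 * u * u);
        }
        return sum / (data.length * bandwidth * Math.sqrt(2.0 * Math.PI));
    }

    /** Evaluates the KDE on nrOfSupportPoints evenly spaced points in [minSupport, maxSupport]. */
    static double[][] estimate(double[] data, double bandwidth, int nrOfSupportPoints,
                               double minSupport, double maxSupport) {
        if (nrOfSupportPoints < 1) {
            throw new IllegalArgumentException("need at least one support point");
        }
        double step = nrOfSupportPoints > 1
                ? (maxSupport - minSupport) / (nrOfSupportPoints - 1)
                : 0.0;
        double[][] xy = new double[nrOfSupportPoints][2];
        for (int i = 0; i < nrOfSupportPoints; i++) {
            xy[i][0] = minSupport + i * step;
            xy[i][1] = density(data, bandwidth, xy[i][0]);
        }
        return xy;
    }

    public static void main(String[] args) {
        double[][] xy = estimate(new double[] {0.0, 1.0, 4.0}, 0.8, 5, -2.0, 6.0);
        for (double[] point : xy) {
            System.out.println(point[0] + " -> " + point[1]);
        }
    }
}
```

A larger bandwidth gives a smoother curve with fewer modes; a smaller one follows the individual samples more closely.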
-
getNrOfHistogramBins
public int getNrOfHistogramBins()
Returns the number of histogram bins used for estimating the probability density function (PDF).
- Returns:
- the number of histogram bins used for estimating the probability density function (PDF)
-
getHistogramBinCount
public double getHistogramBinCount(int histogramBin)
Returns the count associated with a specified histogram bin.
- Parameters:
histogramBin
- the histogram bin to look up the count for
- Returns:
- the count associated with the specified histogram bin
-
getHistogramBinCounts
public double[] getHistogramBinCounts()
Returns the counts for all the histogram bins.
- Returns:
- an array containing the counts for all the histogram bins
-
getHistogramBinFrequency
public double getHistogramBinFrequency(int histogramBin)
Returns the frequency associated with a specified histogram bin.
- Parameters:
histogramBin
- the histogram bin to look up the frequency for
- Returns:
- the frequency associated with the specified histogram bin
-
getHistogramBinFrequencies
public double[] getHistogramBinFrequencies()
Returns the frequencies for all the histogram bins.
- Returns:
- an array containing the frequencies for all the histogram bins
-
getHistogramBinCentre
public double getHistogramBinCentre(int histogramBin)
Returns the centre of a specified histogram bin.
- Parameters:
histogramBin
- the histogram bin to look up the centre for
- Returns:
- the centre of the specified histogram bin
-
getHistogramBinCentres
public double[] getHistogramBinCentres()
Returns the centres of all the histogram bins.
- Returns:
- an array containing the centres of all the histogram bins
-
getHistogramBinWidth
public double getHistogramBinWidth()
Returns the width of a histogram bin.
- Returns:
- the width of a histogram bin
-
getPDF
public double getPDF(double x)
Returns the value of the probability density function (PDF) evaluated at x (based on a histogram).
- Parameters:
x
- the value to evaluate the probability density function at
- Returns:
- the value of the probability density function evaluated at x
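One common way to turn a histogram into a density is to divide each bin count by n times the bin width, so that the bars integrate to one. The sketch below does this for equal-width bins spanning the data range. The equal-width layout, the normalisation, and the name HistogramPdfSketch are assumptions for illustration; the class's actual binning (including user-specified right edges) may differ.

```java
final class HistogramPdfSketch {
    /** Index of the equal-width bin containing v; the maximum value falls in the last bin. */
    static int binIndex(double v, double min, double width, int nrOfBins) {
        return Math.min((int) ((v - min) / width), nrOfBins - 1);
    }

    /** Histogram-based density at x: count(bin of x) / (n * bin width). */
    static double pdf(double[] data, int nrOfBins, double x) {
        if (nrOfBins < 1) {
            throw new IllegalArgumentException("need at least one bin");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : data) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / nrOfBins;
        if (!(width > 0.0) || x < min || x > max) {
            return 0.0; // outside the data range, or no spread in the data
        }
        int target = binIndex(x, min, width, nrOfBins);
        int count = 0;
        for (double v : data) {
            if (binIndex(v, min, width, nrOfBins) == target) {
                count++;
            }
        }
        return count / (data.length * width);
    }

    public static void main(String[] args) {
        double[] data = {0.0, 1.0, 2.0, 3.0, 10.0};
        System.out.println(pdf(data, 2, 1.0));
    }
}
```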
-
getKDEPDF
public double getKDEPDF(double x)
Returns the value of the probability density function (PDF) evaluated at x (based on kernel density estimation, KDE).
- Parameters:
x
- the value to evaluate the probability density function at
- Returns:
- the value of the probability density function evaluated at x
-
getFullKDEPDF
public FunctionLookupTable getFullKDEPDF()
Returns the previously calculated complete kernel density estimation (KDE) of the probability density function (PDF).
- Returns:
- the lookup table with X and Y values for the KDE PDF
-
getN
public int getN()
Returns the sample size.
- Returns:
- the sample size
-
getExpectedValue
public double getExpectedValue()
Returns the expected value for the first moment (population mean), which in this case is approximated by the sample mean.
- Returns:
- the expected value for the first moment
- See Also:
getMean()
-
getMean
public double getMean()
This is the sample mean, which in this case is an alias for the expected value.
- Returns:
- the sample mean
- See Also:
getExpectedValue(), getTrimmedMean(double)
-
getTrimmedMean
public double getTrimmedMean(double percentageToTrim)
This is the trimmed (or truncated) mean, which corresponds to the mean calculated after symmetrically discarding a certain percentage of data points at the high and low end (without interpolation).
- Parameters:
percentageToTrim
- the percentage to trim (left and right combined)
- Returns:
- the trimmed mean
- See Also:
getMean()
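The description above can be sketched as follows: half of the combined percentage is dropped from each end of the sorted data, with the number of dropped points rounded down (no interpolation). The rounding direction and the name TrimmedMeanSketch are assumptions, not taken from the library.

```java
import java.util.Arrays;

final class TrimmedMeanSketch {
    /** Mean after dropping percentageToTrim percent of the points, split evenly over both ends. */
    static double trimmedMean(double[] x, double percentageToTrim) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // Number of points to drop at each end, rounded down (assumed convention).
        int k = (int) Math.floor(n * (percentageToTrim / 100.0) / 2.0);
        if (n == 0 || k < 0 || 2 * k >= n) {
            throw new IllegalArgumentException("nothing left after trimming");
        }
        double sum = 0.0;
        for (int i = k; i < n - k; i++) {
            sum += sorted[i];
        }
        return sum / (n - 2 * k);
    }

    public static void main(String[] args) {
        double[] x = {100, 1, 2, 3, 4, 5, 6, 7, 8, -50};
        System.out.println(trimmedMean(x, 20.0));
    }
}
```

With 10 samples and 20% trimmed in total, one value is removed at each end, so the two extreme values no longer influence the result.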
-
getKDEPDFModes
public FunctionLookupTable getKDEPDFModes()
Returns all modes (i.e., local maxima) for the calculated kernel density estimation (KDE) of the probability density function (PDF).
- Returns:
- all modes of the calculated KDE PDF
-
getVariance
public double getVariance()
Returns the sample variance (using an unbiased estimator of the population variance).
- Returns:
- the sample variance (using an unbiased estimator of the population variance)
-
getStandardDeviation
public double getStandardDeviation()
Returns the standard deviation (i.e., the positive square root of the variance).
- Returns:
- the standard deviation
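The unbiased sample variance divides the sum of squared deviations by n - 1 rather than n. A minimal sketch of that estimator and the resulting standard deviation follows; SampleVarianceSketch is an illustrative name only.

```java
final class SampleVarianceSketch {
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance: sum((x_i - mean)^2) / (n - 1). */
    static double variance(double[] x) {
        if (x.length < 2) {
            throw new IllegalArgumentException("at least two samples are required");
        }
        double m = mean(x);
        double sumOfSquares = 0.0;
        for (double v : x) {
            sumOfSquares += (v - m) * (v - m);
        }
        return sumOfSquares / (x.length - 1);
    }

    /** Positive square root of the sample variance. */
    static double standardDeviation(double[] x) {
        return Math.sqrt(variance(x));
    }

    public static void main(String[] args) {
        double[] x = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(variance(x) + " " + standardDeviation(x));
    }
}
```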
-
getSkewness
public double getSkewness()
Returns the sample skewness (using an unbiased estimator).
Skewness implies:
- Positive skew: longer right tail, density mass concentrated on the left.
- Negative skew: longer left tail, density mass concentrated on the right.
Note that the amount of skewness is determined as follows:
- -0.5 ≤ skewness ≤ +0.5: approximately symmetric distribution.
- -1 ≤ skewness < -0.5, or +0.5 < skewness ≤ +1: moderately skewed distribution.
- skewness < -1, or skewness > +1: highly skewed distribution.
- Returns:
- the sample skewness (using an unbiased estimator)
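This page does not give the exact estimator. One commonly used bias-adjusted form is the adjusted Fisher-Pearson coefficient G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^(3/2), where m2 and m3 are the second and third central moments divided by n. The sketch below implements that form as an assumption; SampleSkewnessSketch is a hypothetical name.

```java
final class SampleSkewnessSketch {
    /** Adjusted Fisher-Pearson skewness G1 (assumed estimator); needs at least 3 samples. */
    static double skewness(double[] x) {
        int n = x.length;
        if (n < 3) {
            throw new IllegalArgumentException("at least three samples are required");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;
        double m2 = 0.0;
        double m3 = 0.0;
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
        }
        m2 /= n;
        m3 /= n;
        double g1 = m3 / Math.pow(m2, 1.5); // NaN if all values are equal
        return Math.sqrt((double) n * (n - 1)) / (n - 2) * g1;
    }

    public static void main(String[] args) {
        System.out.println(skewness(new double[] {0, 0, 0, 4}));
        System.out.println(skewness(new double[] {1, 2, 3, 4, 5}));
    }
}
```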
-
getSkewnessConfidenceBounds
public double getSkewnessConfidenceBounds()
Returns the symmetrical confidence bounds of the skewness for a 95% confidence interval, defined as twice the standard error of skewness (SES).
- Returns:
- the symmetrical confidence bounds of the skewness for a 95% confidence interval
-
getSkewnessZStatistic
public double getSkewnessZStatistic()
Returns a two-tailed test statistic Z of skewness (different from zero) with a 5% significance level.
- Z > +2: population is very likely positively skewed.
- Z < -2: population is very likely negatively skewed.
- -2 ≤ Z ≤ +2: inconclusive (might be symmetric, might be skewed).
The larger the magnitude of Z, the higher the probability.
- Returns:
- the skewness Z-statistic
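A frequently cited expression for the standard error of skewness is SES = sqrt(6n(n-1) / ((n-2)(n+1)(n+3))), with the Z-statistic taken as skewness / SES. The sketch below uses these textbook formulas on the assumption that they match what is described above; the class and method names are illustrative.

```java
final class SkewnessTestSketch {
    /** Standard error of skewness for a sample of size n (n >= 3). */
    static double standardError(int n) {
        if (n < 3) {
            throw new IllegalArgumentException("at least three samples are required");
        }
        double N = n; // work in double to avoid integer overflow for large n
        return Math.sqrt(6.0 * N * (N - 1.0) / ((N - 2.0) * (N + 1.0) * (N + 3.0)));
    }

    /** Symmetrical 95% confidence bounds, taken as twice the SES. */
    static double confidenceBounds(int n) {
        return 2.0 * standardError(n);
    }

    /** Z-statistic: sample skewness divided by its standard error. */
    static double zStatistic(double skewness, int n) {
        return skewness / standardError(n);
    }

    public static void main(String[] args) {
        System.out.println(confidenceBounds(100));
        System.out.println(zStatistic(0.8, 100));
    }
}
```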
-
getKurtosis
public double getKurtosis()
Returns the sample kurtosis (using an unbiased estimator).
The value returned is the excess kurtosis, such that it is zero for a normal distribution:
- Mesokurtic: has zero excess (e.g., normal distribution).
- Leptokurtic: has positive excess, higher and sharper central peak, with longer and fatter tails (i.e., more extreme values).
- Platykurtic: has negative excess, lower and broader central peak, with shorter and thinner tails (i.e., fewer extreme values).
As the kurtosis increases, more probability mass is transferred from the distribution's shoulders to the centre and tails.
- Returns:
- the sample kurtosis (using an unbiased estimator)
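As with skewness, the exact estimator is not spelled out here. A common bias-adjusted excess kurtosis is G2 = (n-1) / ((n-2)(n-3)) * ((n+1) * g2 + 6), where g2 = m4 / m2^2 - 3 uses central moments divided by n. The following sketch assumes that form; SampleKurtosisSketch is a hypothetical name.

```java
final class SampleKurtosisSketch {
    /** Bias-adjusted excess kurtosis G2 (assumed estimator); needs at least 4 samples. */
    static double excessKurtosis(double[] x) {
        int n = x.length;
        if (n < 4) {
            throw new IllegalArgumentException("at least four samples are required");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;
        double m2 = 0.0;
        double m4 = 0.0;
        for (double v : x) {
            double d2 = (v - mean) * (v - mean);
            m2 += d2;
            m4 += d2 * d2;
        }
        m2 /= n;
        m4 /= n;
        double g2 = m4 / (m2 * m2) - 3.0; // NaN if all values are equal
        double N = n;
        return (N - 1.0) / ((N - 2.0) * (N - 3.0)) * ((N + 1.0) * g2 + 6.0);
    }

    public static void main(String[] args) {
        System.out.println(excessKurtosis(new double[] {0, 0, 0, 4}));
    }
}
```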
-
getKurtosisZStatistic
public double getKurtosisZStatistic()
Returns a two-tailed test statistic Z of kurtosis (different from zero) with a 5% significance level.
- Z > +2: population has very likely positive kurtosis (leptokurtic).
- Z < -2: population has very likely negative kurtosis (platykurtic).
- -2 ≤ Z ≤ +2: inconclusive (might be negative, zero, or positive kurtosis).
The larger the magnitude of Z, the higher the probability.
- Returns:
- the kurtosis Z-statistic
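A textbook expression for the standard error of kurtosis is SEK = 2 * SES * sqrt((n^2 - 1) / ((n-3)(n+5))), where SES is the standard error of skewness, and the Z-statistic is the excess kurtosis divided by SEK. Whether the library uses exactly these formulas is an assumption; KurtosisTestSketch is an illustrative name.

```java
final class KurtosisTestSketch {
    /** Standard error of kurtosis for a sample of size n (n >= 4). */
    static double standardError(int n) {
        if (n < 4) {
            throw new IllegalArgumentException("at least four samples are required");
        }
        double N = n;
        // Standard error of skewness, needed as a building block.
        double ses = Math.sqrt(6.0 * N * (N - 1.0) / ((N - 2.0) * (N + 1.0) * (N + 3.0)));
        return 2.0 * ses * Math.sqrt((N * N - 1.0) / ((N - 3.0) * (N + 5.0)));
    }

    /** Z-statistic: excess kurtosis divided by its standard error. */
    static double zStatistic(double excessKurtosis, int n) {
        return excessKurtosis / standardError(n);
    }

    public static void main(String[] args) {
        System.out.println(standardError(100));
        System.out.println(zStatistic(1.5, 100));
    }
}
```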
-
getJarqueBeraTestStatistic
public double getJarqueBeraTestStatistic()
Calculates the Jarque-Bera test statistic.
This is a goodness-of-fit test of whether the distribution's skewness and kurtosis match those of the normal distribution.
The test result should be compared to the values of the chi-square distribution with 2 degrees of freedom.
- Returns:
- the Jarque-Bera test statistic
- See Also:
isJarqueBeraTestAccepted(double), getChiSquare(double,int)
-
isJarqueBeraTestAccepted
public boolean isJarqueBeraTestAccepted(double alpha)
Compares the Jarque-Bera test statistic with the chi-square distribution with 2 degrees of freedom for a given alpha level.
Alpha levels can be 0.995, 0.99, 0.975, 0.95, 0.90, 0.10, 0.05, 0.025, 0.01, or 0.005.
- Parameters:
alpha
- the alpha level
- Returns:
- true if the test is accepted, false if it is rejected
- See Also:
getJarqueBeraTestStatistic(), getChiSquare(double,int)
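The usual form of the statistic is JB = n/6 * (S^2 + K^2/4), with S the skewness and K the excess kurtosis. For 2 degrees of freedom the chi-square critical value has the closed form -2 ln(alpha) when alpha is read as an upper-tail probability. The sketch below combines both. Note that it swaps in this closed form where the class appears to use a table of fixed alpha levels, and that reading alpha as an upper-tail probability is an assumption; JarqueBeraSketch is a hypothetical name.

```java
final class JarqueBeraSketch {
    /** Jarque-Bera statistic from sample size, skewness, and excess kurtosis. */
    static double statistic(int n, double skewness, double excessKurtosis) {
        return (n / 6.0) * (skewness * skewness + excessKurtosis * excessKurtosis / 4.0);
    }

    /** Chi-square critical value for 2 degrees of freedom and upper-tail probability alpha. */
    static double chiSquareCritical2Dof(double alpha) {
        if (!(alpha > 0.0 && alpha < 1.0)) {
            throw new IllegalArgumentException("alpha must lie strictly between 0 and 1");
        }
        return -2.0 * Math.log(alpha);
    }

    /** Normality is not rejected when the statistic stays below the critical value. */
    static boolean isAccepted(int n, double skewness, double excessKurtosis, double alpha) {
        return statistic(n, skewness, excessKurtosis) < chiSquareCritical2Dof(alpha);
    }

    public static void main(String[] args) {
        System.out.println(statistic(60, 0.1, 0.2));
        System.out.println(chiSquareCritical2Dof(0.05));
        System.out.println(isAccepted(60, 0.1, 0.2, 0.05));
    }
}
```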
-
getChiSquare
public static double getChiSquare(double alpha, int degreesOfFreedom)
Returns the chi-square value corresponding to a specified alpha level and number of degrees of freedom.
Alpha levels can be 0.995, 0.99, 0.975, 0.95, 0.90, 0.10, 0.05, 0.025, 0.01, or 0.005.
The number of degrees of freedom is clipped between 1 and 100.
- Parameters:
alpha
- the alpha level
degreesOfFreedom
- the number of degrees of freedom
- Returns:
- the chi-square value corresponding to the specified alpha level and number of degrees of freedom
- See Also:
getJarqueBeraTestStatistic(), isJarqueBeraTestAccepted(double)
-
getZScores
public double[] getZScores()
Returns the calculated z-scores, defined as:
(value - mean) / standard deviation
- Returns:
- the z-scores
- See Also:
getOutliers()
-
getOutliers
public boolean[] getOutliers()
Returns the outliers, which are defined as having z-scores greater than 3.
- Returns:
- the outliers
- See Also:
getZScores()
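The two definitions above can be illustrated with a short sketch that computes z-scores and flags values beyond the threshold of 3. It assumes the sample standard deviation (dividing by n - 1) and compares the absolute z-score against the threshold; both choices, like the name ZScoreSketch, are assumptions rather than documented behaviour.

```java
final class ZScoreSketch {
    /** z-scores: (value - mean) / standard deviation, using the sample standard deviation. */
    static double[] zScores(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("at least two samples are required");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;
        double sumOfSquares = 0.0;
        for (double v : x) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        double stdDev = Math.sqrt(sumOfSquares / (n - 1));
        double[] z = new double[n];
        for (int i = 0; i < n; i++) {
            z[i] = (x[i] - mean) / stdDev; // NaN if all values are equal
        }
        return z;
    }

    /** Flags each value whose absolute z-score exceeds 3 (absolute value is an assumption). */
    static boolean[] outliers(double[] x) {
        double[] z = zScores(x);
        boolean[] flags = new boolean[z.length];
        for (int i = 0; i < z.length; i++) {
            flags[i] = Math.abs(z[i]) > 3.0;
        }
        return flags;
    }

    public static void main(String[] args) {
        double[] x = new double[21]; // twenty zeros followed by one large value
        x[20] = 100.0;
        System.out.println(outliers(x)[20]);
    }
}
```

Because a single extreme value also inflates the standard deviation, small samples can never produce a z-score above 3 under this definition; the example therefore uses 21 points.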
-
getMeanDescription
public static java.lang.String getMeanDescription()
Returns a descriptive label of the mean (expected value).
- Returns:
- a descriptive label of the mean
-
getStandardDeviationDescription
public static java.lang.String getStandardDeviationDescription()
Returns a descriptive label of the standard deviation.
- Returns:
- a descriptive label of the standard deviation
-
getVarianceDescription
public static java.lang.String getVarianceDescription()
Returns a descriptive label of the variance.
- Returns:
- a descriptive label of the variance
-
getMedianDescription
public static java.lang.String getMedianDescription()
Returns a descriptive label of the median.
- Returns:
- a descriptive label of the median
-
getInterquartileRangeDescription
public static java.lang.String getInterquartileRangeDescription()
Returns a descriptive label of the interquartile range (IQR).
- Returns:
- a descriptive label of the interquartile range (IQR)
-
getPercentileDescription
public static java.lang.String getPercentileDescription()
Returns a descriptive label of a percentile.
- Returns:
- a descriptive label of a percentile
-
getSkewnessDescription
public static java.lang.String getSkewnessDescription()
Returns a descriptive label of the skewness.
- Returns:
- a descriptive label of the skewness
-
getKurtosisDescription
public static java.lang.String getKurtosisDescription()
Returns a descriptive label of the kurtosis.
- Returns:
- a descriptive label of the kurtosis
-
getSkewnessInterpretation
public java.lang.String getSkewnessInterpretation()
Returns a qualitative description of the skewness based on its test statistic.
- Returns:
- a qualitative description of the skewness
-
getKurtosisInterpretation
public java.lang.String getKurtosisInterpretation()
Returns a qualitative description of the kurtosis based on its test statistic.
- Returns:
- a qualitative description of the kurtosis
-