Skip to content

leta199/Rejection-Sampling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

333 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Rejection Sampling

Also know as the "accept - reject" method is a way of generating samples of data from a complex probability function. We will apply this method to solve problems in order to generate random variables. Statistics is fun! This project will cover:

  • How to define and graph a target probability density function.
  • How to define proposal probability density function.
  • Making a scaling factor for our problems

HOW IT'S MADE

Languages used: R (version 4.5.2)
Environment: RStudio

Language: R Built with RStudio Status

METHODS AND TECHNIQUES

Image

Target probability density function
This is the Gamma(2,1) function we must sample from where x >= 5.
The function that we define our probability density function from is:

target_pdf(x)
Image

The support of our pdf creates the following area: Image

Proposal probability density function
This is a known probability distribution that we can easily sample from e.g in R.
It must follow the following criteria:
i) Cover the target pdf i.e there must be some M where:

$$M \cdot g(x) \geq f(x)$$

for any x in the support.

Exponential Distribution ~ exp(0.8)

Initially, I wanted to use an exponential distribution however I soon ran into a few problems.
For this proposal distribution, I seleced an exponential distribution with lambda = 0.8.
This was to allow for the decay of the exponential to be less than that of the Gamma distribution.

We then have to find the scaling factor M so that we can fulfill condition 1 above. To find this value we used the logic in the Mathematics note here: Math Note exponential(0.8)
After finding this scaling factor, we plotted the graph of exponential(0.8) below:

Image

Then we can compare the exponential(0.8) to our target pdf: Image

  • We can see that the exponential fits the Gamma(2,1) target very well however, we have not since we only want to generate values from x >= 5, we reject many of our generated sampled from exponential.

  • Therefore, we will likely have many rejections making the function inefficient.

  • The proability of acceptance will be:

    $$P(\text{Success}) = \frac{1}{M} = \frac{1}{56.873}$$ = 1.76% i.e we need to generate 56.873 samples on average to accept 1 value.

Rejection Sampling

The rgamma_2_1_exp function implements the core rejection sampling logic. It iteratively generates candidates and filters them based on the calculated scaling factor M.

Initialization
n <- 5000: Defines the target number of accepted samples.
sample_y: A pre-allocated vector to store accepted values (improves performance).
total_sample: Tracks every attempt (successes and failures) to measure algorithm efficiency.
count: Tracks current successful iterations.

The Simulation Loop
The function uses a while loop that runs until count reaches n:
Proposal Generation: Calls exp_generator() to draw a candidate y from the Exponential distribution.
Comparison Variable: Generates u∼Uniform(0,1) to serve as the "threshold" for the acceptance test.

Acceptance Criterion:
The algorithm uses if statement to check if: u≤ f(y)/M⋅g(y), where g(y) is the exponential proposal density function

If True: The candidate y is stored in sample_y and count increments.

If False: The candidate is discarded, and the loop repeats.

Return Values

The function returns a list containing three key outputs:
acceptance_rate: A formatted string showing the percentage of successful proposals vs. total attempts.
samples: The final vector of values representing the target distribution.
no_of_samples: The total iterations required (useful for complexity analysis).

We can see that we have managed to generate data from the Gamma(2,1) using rejection sampling and proposal exp(0.8). Histogram:

Image

We can also verify that the generated samples are from the Gamma(2,1) distribution using the QQ plot: Image

  • The QQ plot is slightly shifted downwards as the mean of our Exponential(0.8) is 1.25 and less than of the Gamma(2,1) at 2.

Shifted Cauchy Distribution ~ 5 + cauchy(1)

Cuachy distributions are calssically heavier tailed than T or Normal or distirbutions therefore, this was the second proposal to be used.
After finding the appropriate scaling scaling factor (math note found here:Math note cauchy , we plotted the graph of cauchy(5) below: Image

Then we can compare the cauchy(5) to our target pdf: Image

$$P(\text{Success}) = \frac{1}{N} = \frac{1}{7.973}$$ = 12.54% i.e we need to generate 7.973 samples on average to accept 1 value.

We then use the same rejection sampling algoirthm that we had above and we can investiagte the geneated values and compare the efficiencies of the two methods. We expcet this to be more accurate.
We then generate samples using the rgamma_2_1_cauchy function. We can see that we have generated the appropriate sample values using the histogram below:

Image

Our QQ plot is very similar to that of the exponential(0.8) proposal: Image

Efficiency of Proposal
To inverstigate the change in effieciny we used the formula:

$$ \frac{\text{Var}(P_{\text{cauchy}}) - \text{Var}(P_{\text{exp}})}{\text{Var}(P_{\text{exp}})} $$

By utilisng the cauchy distribution, we have acieved a 9.8% improvement in data generation regarding variance.
However, eventhough we had improved variance, the number of samples we generate was still very high at 72% less samples needed for the Cauchy but still around 79,000 samples needed. The main takeawasy would be that:

  • The shifted nature of the Cauchy proposal helps reduce the variance of our samples.
  • The Cauchy does not have the same support as the Gamma(2,1). Cauchy is supported from - to + infinity while the Gamma(2,1) is supported on x >= 5. This means that half of our samples are automatically rejected with is a waste of samples.
  • Let us try using a sifted exponential(0.8) to compare the efficiecncy.

Shifted Exponential Distribution ~ 5 + exp(0.8)

For this proposal, we will use a shifted exponential with the same lambda of 0.8 above.
We can see that this shifted exponential fits even better than the above exponential:

Image

$$P(\text{Success}) = \frac{1}{N} = \frac{1}{1.041667}$$ = 96% i.e we need to generate 1.042 samples on average to accept 1 value which is close to 1:1.

We use the same method of rejection sampling as above by using the function rgamma_2_1_shift_exp.
We can see that this same sample follows the Gamma(2,1) distirbution from the histogram and QQ plot: Image

Image

Using this method we have a 2.75% decrease in variance and we have a 93.4% decrease in the number of samples we generate.
Therefore we have acheived the best version of rejection sampling.

Comparison of methods
Number of samples generated
We can see that using each subsequent method we have a drastic reduction in number of samples generated to create a list of 5000 valid samples.

Image

Variance of samples generated
All of the methods have a similar variance but our shifted exponential has the lowest variance. Image

Main Takeaways

  • Always ensure to match the shape of the target pdf such as its decay.
  • Probability density functions and their associated functions often have different supports to be a valid pdf.
  • When finding an appropriate proposal play around with all parameters and translations and scaling factors.
  • The larger the sclaing factor the lower the probability of acceptance.

Anyway, thanks for reading. Give it a try and a star!!

PROJECT STRUCTURE

|Simulation- Rejection Sampling

|├── CauchyProposal
| ├──Math note scaling factor
│ ├──QQ_plot_cauchy
│ ├──Cauchy_vs_Gamma
│ ├──Cauchy proposal
│ ├──Full_cauchy
│ └──Histogram_cauchy

|├── ExponentialProposal
| ├──Exponential graph
│ ├──Math notes scaling factor
│ ├──Histogram exp
│ └──QQ plot

|├── RejectionSampling-R
| └──Rejection Sampling R file

|├── ShiftedExponentialProposal
| ├──QQ plot of shifted exp
│ ├──Comparison graph
│ └──Histogram of samples

|└──README

USEFUL RESOURCES

The textbook "Probability with applications and R" by Dr. Wagaman and Dr. Dobrow was very helpful in many of my endevours.
Rejection Sampling: Sampling from ‘difficult’ distributions - was a website that lays down the basics of rejection sampling.

WHAT DOES THE FUTURE HOLD?

  1. Check distribution using histogram ✅
  2. Check distribution using qqplot ✅
  3. Use a shifted Cauchy ✅
  4. Use a shifted Exponential ✅

AUTHORS

leta199

About

How do we take sample from an extremely complex probability distribution. Rejection sampling of course! This allows us to create a sample of size n from a complex probability distribution that may be impossible to sample from otherwise.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages