Yes, you have it more or less right. Both are heuristics: Silverman's rule is a fairly simple one, justified by a derivation that assumes a Gaussian reference distribution; LSCV is more sophisticated, justified by a cross-validation argument.

As with all heuristics, there will be occasions where each of them breaks down, but in general I would lean toward kde_lscv, although the two shouldn't give hugely different results.
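To make the difference concrete, here is a rough sketch of the two heuristics in plain Python (not the KernelDensity.jl implementation; function names and the bandwidth grid are my own, and a real implementation would optimize the score rather than scan a small grid):

```python
# Illustrative sketch of the two bandwidth heuristics: Silverman's
# rule of thumb vs. least-squares cross-validation (LSCV).
import math
import random
import statistics

def silverman_bandwidth(xs):
    """Silverman's rule: h = 0.9 * min(sd, IQR/1.34) * n^(-1/5).
    Derived assuming the data come from a Gaussian distribution."""
    n = len(xs)
    sd = statistics.stdev(xs)
    q = statistics.quantiles(xs, n=4)   # q[0] = Q1, q[2] = Q3
    iqr = q[2] - q[0]
    return 0.9 * min(sd, iqr / 1.34) * n ** (-0.2)

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def lscv_score(xs, h):
    """LSCV objective: an estimate (up to a constant) of the integrated
    squared error of the KDE, using leave-one-out cross-validation."""
    n = len(xs)
    s2 = h * math.sqrt(2.0)
    integral_fhat_sq = 0.0   # closed form of  int fhat(x)^2 dx  for a Gaussian kernel
    loo = 0.0                # sum of leave-one-out density estimates at the data points
    for i in range(n):
        for j in range(n):
            d = xs[i] - xs[j]
            integral_fhat_sq += gaussian_kernel(d / s2) / s2
            if i != j:
                loo += gaussian_kernel(d / h) / h
    integral_fhat_sq /= n * n
    loo *= 2.0 / (n * (n - 1))
    return integral_fhat_sq - loo

def lscv_bandwidth(xs, grid):
    """Pick the bandwidth in `grid` minimizing the LSCV score."""
    return min(grid, key=lambda h: lscv_score(xs, h))

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
h_silverman = silverman_bandwidth(data)
h_lscv = lscv_bandwidth(data, [0.1, 0.2, 0.3, 0.5, 0.8])
```

For Gaussian-ish data the two bandwidths usually land close together; LSCV earns its keep on multimodal or skewed data, where Silverman's rule tends to oversmooth.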

-Simon

On Wednesday, 20 January 2016 08:29:42 UTC, Daniel Carrera wrote:

Hello,

This week I've begun learning about non-parametric statistics, and I'm interested in kernel density estimation, which is implemented in KernelDensity.jl.

Could someone help me understand how kde_lscv() differs from kde()? The documentation says it selects the bandwidth by "least squares cross validation". What does that mean? What are the advantages? As far as I can figure out, LSCV means that it tries to minimize an estimate of the integrated squared error, and that's better because the regular kde() function uses a bandwidth rule (Silverman's rule) that is designed for Gaussian data. Have I understood things correctly?

In general, should I worry about using kde() instead of kde_lscv() if I don't know ahead of time that my data is Gaussian? Or is kde() a good default?

Cheers,

Daniel.

--

You received this message because you are subscribed to the Google Groups "julia-stats" group.
