Log-log regression

Log-log regression

Mario Silveira
How do I run a log-log linear regression in Julia?
Like "lm(log(y) ~ log(x))" in R.

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Log-log regression

Milan Bouchet-Valat
On Sunday, February 21, 2016 at 15:54 -0800, Mario Henrique wrote:
> How do I run a log-log linear regression in Julia?
> Like "lm(log(y) ~ log(x))" in R.
AFAIK this is perfectly equivalent to taking the log of both x and y,
and applying a linear regression on the resulting variables. So that
should be quite straightforward with GLM.jl (see its documentation).


Regards


Re: Log-log regression

Michael Borregaard
The documentation is not very explicit about the preferred way to do something like that. It looks to me as if you have to put the variables in a DataFrame, update the DataFrame with log versions, then do the lm.
using GLM, DataFrames
x = collect(1:5) + rand(5)
y = collect(1:5) + rand(5)
test = DataFrame(x = x, y = y)
test[:logx] = log(test[:x])
test[:logy] = log(test[:y])
lm(logy ~ logx, test)

Is that the preferred method?

On Monday, February 22, 2016 at 10:45:25 UTC+1, Milan Bouchet-Valat wrote:
> AFAIK this is perfectly equivalent to taking the log of both x and y,
> and applying a linear regression on the resulting variables. So that
> should be quite straightforward with GLM.jl (see its documentation).


Re: Log-log regression

Mario Silveira
Yes, I have to put the logs in the data frame first; with "lm(log(y) ~ log(x), data)" I get an error.
Thanks for the help.

2016-02-22 8:30 GMT-03:00 Michael Borregaard <[hidden email]>:
> The documentation is not very explicit about the preferred way to do something like that. [...]
> Is that the preferred method?





--
Mario Henrique


Re: Log-log regression

Milan Bouchet-Valat
On Monday, February 22, 2016 at 08:48 -0300, Mario Silveira wrote:
> Yes, I have to put the logs in the data frame first; with "lm(log(y) ~ log(x),
> data)" I get an error.
Yes, sorry I wasn't clear (I thought your question was about
statistical theory). Adding transformed variables to the data frame is
the preferred method AFAIK. And it's not too annoying to do either.

That said, transformation inside formulas will likely be supported at
some point. See this old issue:
https://github.com/JuliaStats/DataFrames.jl/issues/19
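For readers on newer versions: function calls inside formulas did eventually land via StatsModels.jl, so (assuming a recent GLM.jl/DataFrames.jl API, not the 2016 one in this thread) the log-log fit can be written directly, without adding transformed columns. A minimal sketch:

```julia
# Sketch assuming a recent GLM.jl / DataFrames.jl, where @formula
# accepts function calls on variables (provided by StatsModels.jl).
using GLM, DataFrames

df = DataFrame(x = collect(1.0:5.0) .+ rand(5),
               y = collect(1.0:5.0) .+ rand(5))

# log() applied inside the formula; no logx/logy columns needed.
model = lm(@formula(log(y) ~ log(x)), df)
coef(model)   # intercept and slope of the log-log fit
```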


Regards

> Thanks for help.


Re: Log-log regression

Michael Borregaard
Thanks for the clarification! And great news about the potential for functions to be included in formulas. It is not too annoying to add the variables, but I feel it will help create a more fluid and natural analytical workflow.

On Mon, Feb 22, 2016 at 1:53 PM, Milan Bouchet-Valat <[hidden email]> wrote:
> Yes, sorry I wasn't clear (I thought your question was about statistical
> theory). Adding transformed variables to the data frame is the preferred
> method AFAIK. [...]

Re: Log-log regression

Mario Silveira
Yes, Michael. I'm an economist, and in most econometrics software I know you have to add the log variables first before running the regression, so doing it here is no problem for me.
Thanks for the help, Julia is great!


2016-02-22 10:07 GMT-03:00 Michael Krabbe Borregaard <[hidden email]>:
> Thanks for the clarification! And great news on the potential for functions
> to be included in formulas. [...]



--
Mario Henrique


Re: Log-log regression

Cedric St-Jean
This page seems relevant:

  1. Abusing linear regression makes the baby Gauss cry. Fitting a line to your log-log plot by least squares is a bad idea. It generally doesn't even give you a probability distribution, and even if your data do follow a power-law distribution, it gives you a bad estimate of the parameters. You cannot use the error estimates your regression software gives you, because those formulas incorporate assumptions which directly contradict the idea that you are seeing samples from a power law. And no, you cannot claim that because the line "explains" (really, describes) a lot of the variance that you must have a power law, because you can get a very high R^2 from other distributions (that test has no "power"). And this is without getting into the additional errors caused by trying to fit a line to binned histograms.
    It's true that fitting lines on log-log graphs is what Pareto did back in the day when he started this whole power-law business, but "the day" was the 1890s. There's a time and a place for being old school; this isn't it.
  2. Use maximum likelihood to estimate the scaling exponent. It's fast! The formula is easy! Best of all, it works! The method of maximum likelihood was invented in 1922 [parts 1 and 2], by someone who studied statistical mechanics, no less. The maximum likelihood estimators for the discrete (Zipf/zeta) and continuous (Pareto) power laws were worked out in 1952 and 1957 (respectively). They converge on the correct value of the scaling exponent with probability 1, and they do so efficiently. You can even work out their sampling distribution (it's an inverse gamma) and so get exact confidence intervals. Use the MLEs!
I don't usually work with power laws, so I don't have an opinion on this. But I believe that the issue is that the log on the y-axis distorts the Gaussian error distribution on which the least-square fit is predicated.
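The MLE the quoted passage recommends is indeed short. A sketch for the continuous (Pareto) case, assuming the cutoff xmin is known (choosing xmin from data is a separate, harder problem; the helper name pareto_mle is ours, not from any package):

```julia
# Continuous power-law (Pareto) MLE: α̂ = 1 + n / Σ log(xᵢ / xmin),
# the estimator the quoted text recommends over log-log regression.
function pareto_mle(x::AbstractVector{<:Real}, xmin::Real)
    tail = filter(v -> v >= xmin, x)          # keep only the power-law tail
    n = length(tail)
    α = 1 + n / sum(log.(tail ./ xmin))
    return α, (α - 1) / sqrt(n)               # estimate and approx. std. error
end

# Check on synthetic Pareto(α = 2.5) draws via inverse-CDF sampling:
# if U ~ Uniform(0,1), then xmin * U^(-1/(α-1)) has density ∝ x^(-α).
xmin = 1.0
x = xmin .* rand(100_000) .^ (-1 / (2.5 - 1))
α̂, se = pareto_mle(x, xmin)                  # α̂ should be close to 2.5
```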

HTH,

Cédric

On Monday, February 22, 2016 at 8:59:29 AM UTC-5, Mario Henrique wrote:
> Yes Michael. I'm economist and most softwares I know for econometrics have to add log variables firt to do regression. [...]


Re: Log-log regression

Mario Silveira
Cédric,
In economics, log-log models are widely used for various reasons, the main one being that they let you interpret the model in elasticity terms.
Log-log with OLS appears in the majority of econometrics textbooks and has been shown, empirically and mathematically, to be useful when applied under the right premises.
The author may be right that log-log has its problems, but I think it is wrong to generalize; at the least he should present citations with proof and evidence, otherwise it is merely an allegation without scientific value.
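The elasticity point can be made concrete: under a multiplicative-error model y = A·x^β·exp(ε), taking logs gives log y = log A + β·log x + ε, which does satisfy the OLS assumptions, and the slope β is the elasticity d(log y)/d(log x). A simulation sketch (assuming a recent GLM.jl with @formula support, which postdates this thread):

```julia
using GLM, DataFrames

# y = A * x^β * exp(ε): multiplicative lognormal error, the setting in
# which OLS on the logged variables is the appropriate estimator and the
# fitted slope is the elasticity β.
A, β = 2.0, 0.7
x = 0.1 .+ 10 .* rand(5_000)
y = A .* x .^ β .* exp.(0.1 .* randn(5_000))

df = DataFrame(logx = log.(x), logy = log.(y))
model = lm(@formula(logy ~ logx), df)
coef(model)[2]   # should be close to the true elasticity 0.7
```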

Thanks for the attention, Cédric.


2016-03-01 22:39 GMT-03:00 Cedric St-Jean <[hidden email]>:
> This page seems relevant: [...]
> I don't usually work with power laws, so I don't have an opinion on this.



--
Mario Henrique


Re: Log-log regression

Stefan Karpinski
Cedric's point isn't that you can't have models that appear linear on a log-log plot – power laws are precisely this kind of model. The point is that you should not use linear regression on the log-transformed data to estimate the model parameters. If that's what's done in econometrics books, they may need revision.
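One way to reconcile the two positions: OLS on logged variables is sound for regressing y on x under multiplicative error (the econometric setting discussed earlier in the thread), while the criticism targets estimating a distribution's tail exponent, where regressing a log-log histogram is noisier and biased relative to the MLE. A sketch of that contrast (assuming StatsBase and a recent GLM.jl with @formula; our own variable names throughout):

```julia
using GLM, DataFrames, StatsBase

# Pareto(α = 2.5) samples with xmin = 1, via inverse-CDF sampling.
α, n = 2.5, 100_000
x = rand(n) .^ (-1 / (α - 1))

# (a) MLE: α̂ = 1 + n / Σ log(xᵢ); consistent and efficient.
α_mle = 1 + n / sum(log.(x))

# (b) OLS slope of a log-log histogram of the empirical density;
#     this is the procedure the quoted passage warns against.
edges = 10 .^ range(0, 3, length = 30)
h = fit(Histogram, x, edges)
centers = sqrt.(edges[1:end-1] .* edges[2:end])   # geometric bin centers
density = h.weights ./ diff(edges) ./ n
keep = density .> 0                                # drop empty bins before logging
df = DataFrame(lx = log.(centers[keep]), ld = log.(density[keep]))
α_ols = -coef(lm(@formula(ld ~ lx), df))[2]        # slope ≈ -α, but less reliable
```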

On Tue, Mar 1, 2016 at 9:17 PM, Mario Silveira <[hidden email]> wrote:
Cédric,
in economics log-log models are widely used for various reasons, the main one is that it allows you to interpret the model in elasticity term.
Log-log with OLS appears in the majority of econometrics books, has been shown, empirical and mathematically be useful when used with the right premises.
The author must be right that log-log has its problems, but I think it is wrong to generalize it, at least he should present some quotes with proof and evidence, if unlike just merely an allegation without scientific value.

Thanks for atention Cédric.


2016-03-01 22:39 GMT-03:00 Cedric St-Jean <[hidden email]>:
This page seems relevant:

  1. Abusing linear regression makes the baby Gauss cry. Fitting a line to your log-log plot by least squares is a bad idea. It generally doesn't even give you a probability distribution, and even if your data do follow a power-law distribution, it gives you a bad estimate of the parameters. You cannot use the error estimates your regression software gives you, because those formulas incorporate assumptions which directly contradict the idea that you are seeing samples from a power law. And no, you cannot claim that because the line "explains" (really, describes) a lot of the variance that you must have a power law, because you can get a very high R^2 from other distributions (that test has no "power"). And this is without getting into the additional errors caused by trying to fit a line to binned histograms.
    It's true that fitting lines on log-log graphs is what Pareto did back in the day when he started this whole power-law business, but "the day" was the 1890s. There's a time and a place for being old school; this isn't it.
  2. Use maximum likelihood to estimate the scaling exponent. It's fast! The formula is easy! Best of all, it works! The method of maximum likelihood was invented in 1922 [parts 1 and 2], by someone who studied statistical mechanics, no less. The maximum likelihood estimators for the discrete (Zipf/zeta) and continuous (Pareto) power laws were worked out in 1952 and 1957 (respectively). They converge on the correct value of the scaling exponent with probability 1, and they do so efficiently. You can even work out their sampling distribution (it's an inverse gamma) and so get exact confidence intervals. Use the MLEs!
I don't usually work with power laws, so I don't have an opinion on this. But I believe that the issue is that the log on the y-axis distorts the Gaussian error distribution on which the least-square fit is predicated.

HTH,

Cédric

On Monday, February 22, 2016 at 8:59:29 AM UTC-5, Mario Henrique wrote:
Yes Michael. I'm economist and most softwares I know for econometrics have to add log variables firt to do regression. There is no problem in do it for me.
Thanks for help, Julia is great!!


2016-02-22 10:07 GMT-03:00 Michael Krabbe Borregaard <[hidden email]>:
Thanks for the clarification! And great news on the potential for functions to be included in formulas. It is not too annoying to add the variables, but it will help to create a more fluid and natural analytical workflow I feel.

On Mon, Feb 22, 2016 at 1:53 PM, Milan Bouchet-Valat <[hidden email]> wrote:
Le lundi 22 février 2016 à 08:48 -0300, Mario Silveira a écrit :
> Yes, i have to put log in data frame firt, take "lm(log(y) ~ log(x),
> data)" i get a error.
Yes, sorry I wasn't clear (I thought your question was about
statistical theory). Adding transformed variables to the data frame is
the preferred method AFAIK. And it's not too annoying to do either.

That said, transformation inside formulas will likely be supported at
some point. See this old issue:
https://github.com/JuliaStats/DataFrames.jl/issues/19


Regards

> Thanks for the help.

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Reply | Threaded
Open this post in threaded view
|

Re: Log-log regression

Mario Silveira
Econometrics eventually developed somewhat independently of mainstream statistics. The reason is the type of data we deal with. In a basic statistics course, linear regression takes up only a few chapters of the book; in a first course in econometrics, the whole course is about regression models: when the estimators are biased, multicollinearity, heteroscedasticity, static versus dynamic interpretation, and so on.
I believe that no other field of study is as concerned with the accuracy of regression as econometrics.
PS: in econometrics, statistical estimation happens in a second stage; first the model must be validated theoretically. Of course there are several problems, but log-log regression for elasticity is one of the most basic and reliable tools, used every day worldwide.
If you are interested in seeing a little of how econometrics works, a suggestion is Econometric Analysis by Greene.
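
For contrast, here is a sketch of the econometric setting described above, written in Python with hypothetical demand data (the constants, seed, and sample size are invented). When the model is q = A * p^beta with multiplicative lognormal errors, OLS on the logs is the maximum likelihood fit and the slope is directly the elasticity, which is why the textbook recipe is sound under those premises:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical constant-elasticity demand data: q = A * p^beta * exp(u),
# with multiplicative lognormal noise (the classic econometric premise).
beta_true, A, n = -1.2, 50.0, 10_000
p = rng.uniform(1.0, 10.0, n)                       # prices
q = A * p ** beta_true * np.exp(rng.normal(0.0, 0.1, n))

# Under lognormal multiplicative errors, OLS on the logs is the maximum
# likelihood fit, and the slope is directly the elasticity beta.
beta_hat, log_A_hat = np.polyfit(np.log(p), np.log(q), 1)
```

This is a different situation from estimating the exponent of a power-law *distribution*: here the noise model actually matches the least-squares assumptions.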

2016-03-01 23:26 GMT-03:00 Stefan Karpinski <[hidden email]>:
Cedric's point isn't that you can't have models that appear linear on a log-log plot – power laws are precisely this kind of model. The point is that you should not use linear regression on the log-transformed data to estimate the model parameters. If that's what's done in econometrics books, they may need revision.

On Tue, Mar 1, 2016 at 9:17 PM, Mario Silveira <[hidden email]> wrote:
Cédric,
in economics, log-log models are widely used for various reasons; the main one is that they allow you to interpret the model in elasticity terms.
Log-log with OLS appears in the majority of econometrics books and has been shown, empirically and mathematically, to be useful when used with the right premises.
The author may be right that log-log has its problems, but I think it is wrong to generalize; at the least he should present some citations with proof and evidence, otherwise it is merely an allegation without scientific value.

Thanks for your attention, Cédric.


2016-03-01 22:39 GMT-03:00 Cedric St-Jean <[hidden email]>:
This page seems relevant:

  1. Abusing linear regression makes the baby Gauss cry. Fitting a line to your log-log plot by least squares is a bad idea. It generally doesn't even give you a probability distribution, and even if your data do follow a power-law distribution, it gives you a bad estimate of the parameters. You cannot use the error estimates your regression software gives you, because those formulas incorporate assumptions which directly contradict the idea that you are seeing samples from a power law. And no, you cannot claim that because the line "explains" (really, describes) a lot of the variance that you must have a power law, because you can get a very high R^2 from other distributions (that test has no "power"). And this is without getting into the additional errors caused by trying to fit a line to binned histograms.
    It's true that fitting lines on log-log graphs is what Pareto did back in the day when he started this whole power-law business, but "the day" was the 1890s. There's a time and a place for being old school; this isn't it.
  2. Use maximum likelihood to estimate the scaling exponent. It's fast! The formula is easy! Best of all, it works! The method of maximum likelihood was invented in 1922 [parts 1 and 2], by someone who studied statistical mechanics, no less. The maximum likelihood estimators for the discrete (Zipf/zeta) and continuous (Pareto) power laws were worked out in 1952 and 1957 (respectively). They converge on the correct value of the scaling exponent with probability 1, and they do so efficiently. You can even work out their sampling distribution (it's an inverse gamma) and so get exact confidence intervals. Use the MLEs!
I don't usually work with power laws, so I don't have an opinion on this. But I believe that the issue is that the log on the y-axis distorts the Gaussian error distribution on which the least-square fit is predicated.

HTH,

Cédric

On Monday, February 22, 2016 at 8:59:29 AM UTC-5, Mario Henrique wrote:
Yes Michael. I'm economist and most softwares I know for econometrics have to add log variables firt to do regression. There is no problem in do it for me.
Thanks for help, Julia is great!!


2016-02-22 10:07 GMT-03:00 Michael Krabbe Borregaard <[hidden email]>:
Thanks for the clarification! And great news on the potential for functions to be included in formulas. It is not too annoying to add the variables, but it will help to create a more fluid and natural analytical workflow I feel.

On Mon, Feb 22, 2016 at 1:53 PM, Milan Bouchet-Valat <[hidden email]> wrote:
Le lundi 22 février 2016 à 08:48 -0300, Mario Silveira a écrit :
> Yes, i have to put log in data frame firt, take "lm(log(y) ~ log(x),
> data)" i get a error.
Yes, sorry I wasn't clear (I thought your question was about
statistical theory). Adding transformed variables to the data frame is
the preferred method AFAIK. And it's not too annoying to do either.

That said, transformation inside formulas will likely be supported at
some point. See this old issue :
https://github.com/JuliaStats/DataFrames.jl/issues/19


Regards

> Thanks for help.

>
> 2016-02-22 8:30 GMT-03:00 Michael Borregaard <[hidden email]>
> :
> > The documentation is not very explicit about the preferred way to
> > do something like that. It looks to me as if you have to put the
> > variables in a DataFrame, update the DataFrame with log versions,
> > then do the lm. 
> > using GLM, DataFrames
> > x = collect(1:5) + rand(5)
> > y = collect(1:5) + rand(5)
> > test = DataFrame(x = x, y = y)
> > test[:logx] = log(test[:x])
> > test[:logy] = log(test[:y])
> > lm(logy ~ logx, test)
> >
> > Is that the preferred method?
> >
> >
> > > Le dimanche 21 février 2016 à 15:54 -0800, Mario Henrique a
> > > écrit : 
> > > > How to run a log-log linear regression  in Julia? 
> > > > Like "lm(log (y) ~ log (x)" in R 
> > > AFAIK this is perfectly equivalent to taking the log of both x
> > > and y, 
> > > and applying a linear regression on the resulting variables. So
> > > that 
> > > should be quite straightforward with GLM.jl (see its
> > > documentation). 
> > >
> > >
> > > Regards 
> > -- 
> > You received this message because you are subscribed to a topic in
> > the Google Groups "julia-stats" group.
> > To unsubscribe from this topic, visit https://groups.google.com/d/t
> > opic/julia-stats/wGH77VmDQDc/unsubscribe.
> > To unsubscribe from this group and all its topics, send an email to
> > [hidden email].
> > For more options, visit https://groups.google.com/d/optout.
> >
>
>

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "julia-stats" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/julia-stats/wGH77VmDQDc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.



--
Mario Henrique

--
You received this message because you are subscribed to a topic in the Google Groups "julia-stats" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/julia-stats/wGH77VmDQDc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.



--
Mario Henrique

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "julia-stats" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/julia-stats/wGH77VmDQDc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.



--
Mario Henrique

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Log-log regression

Mario Silveira
Maybe I did not express myself properly. Here is an example of what I meant: http://www.dummies.com/how-to/content/econometrics-and-the-loglog-model.html


Re: Log-log regression

Stefan Karpinski
I think the key claim here is:

You can estimate this model with OLS by simply using natural log values for the variables instead of their original scale.

There are people on this list far more qualified than I am to confirm (or deny) this, but statistical best practice seems to be that this is not a good way to estimate the model parameters. The attached paper gives a nice overview – I found it very informative in any case. They suggest using equation (5) to estimate the exponent of a power law. The preceding paragraph explains why using OLS on the log-transformed values is a bad idea, giving an example with synthetic data where that method gives a confidence interval that fails to contain the true parameter value. Using equation (5), on the other hand, gives a confidence interval perfectly centered on the true value.
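
A rough synthetic check along these lines can be sketched in Python (illustrative only, not the paper's exact experiment; the bin choices, seed, and sample size are arbitrary), comparing a least-squares line through a log-log histogram with the maximum likelihood estimator:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic power-law sample, p(x) ~ x^(-alpha) for x >= xmin.
alpha_true, xmin, n = 2.5, 1.0, 50_000
x = xmin * (1.0 - rng.random(n)) ** (-1.0 / (alpha_true - 1.0))

# Method 1 (criticized): least-squares line through a log-log histogram.
counts, edges = np.histogram(x, bins=np.logspace(0.0, 2.0, 30), density=True)
centers = np.sqrt(edges[:-1] * edges[1:])     # geometric bin centers
nonzero = counts > 0
slope, _ = np.polyfit(np.log(centers[nonzero]), np.log(counts[nonzero]), 1)
alpha_hist = -slope

# Method 2 (recommended): the maximum likelihood estimator.
alpha_mle = 1.0 + n / np.log(x / xmin).sum()
```

The MLE converges tightly on the true exponent. The histogram fit's point estimate can look plausible, but its error bars are not trustworthy, because the residuals on a log histogram are not Gaussian.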

0412004v3.pdf (2M) Download Attachment

Re: Log-log regression

Mario Silveira
Thanks, Stefan, for the article. I'll study it calmly and see how I can apply it. It is always good to have feedback from people in other fields.

2016-03-02 1:20 GMT-03:00 Stefan Karpinski <[hidden email]>:
I think the key claim here is:

You can estimate this model with OLS by simply using natural log values for the variables instead of their original scale.

There are people far more qualified than I am on this list confirm (or deny) this, but statistical best-practice seem to be that this is not a good way to estimate the model parameters. The attached paper gives a nice overview – I found it very informative in any case. They suggest using equation (5) to estimate the exponent of a power law. The preceding paragraph explains why using OLS on the log-transformed values is a bad idea, giving an example with synthetic data where that method gives a confidence interval that fails to contain the true parameter value. Using equation (5), on the other hand, gives a confidence interval perfectly centered on the true value.

On Tue, Mar 1, 2016 at 9:58 PM, Mario Silveira <[hidden email]> wrote:
Maybe I did not express myself properly. Here an example of what I meant http://www.dummies.com/how-to/content/econometrics-and-the-loglog-model.html

2016-03-01 23:49 GMT-03:00 Mario Silveira <[hidden email]>:
Econometrics eventually developed somewhat independently of mainstream statistics. The reason is the type of data we deal with. In a basic statistics course, linear regression takes up only a few chapters of the book; in a first course in econometrics, the whole course is about regression models: when the estimators are biased, multicollinearity, heteroscedasticity, static or dynamic interpretation...
I believe that no other field of study is as concerned with the accuracy of regression as econometrics.
PS: in econometrics, statistical estimation happens in a second stage; first the model must be validated theoretically. Of course, there are several problems, but the log-log regression for elasticity is one of the most basic and reliable tools, used every day worldwide.
If you are interested in seeing a little of how econometrics works, one suggestion is Econometric Analysis by Greene.

2016-03-01 23:26 GMT-03:00 Stefan Karpinski <[hidden email]>:
Cedric's point isn't that you can't have models that appear linear on a log-log plot – power laws are precisely this kind of model. The point is that you should not use linear regression on the log-transformed data to estimate the model parameters. If that's what's done in econometrics books, they may need revision.

On Tue, Mar 1, 2016 at 9:17 PM, Mario Silveira <[hidden email]> wrote:
Cédric,
in economics, log-log models are widely used for various reasons; the main one is that they allow you to interpret the model in terms of elasticities.
Log-log with OLS appears in the majority of econometrics books and has been shown, empirically and mathematically, to be useful when applied under the right premises.
The author may be right that log-log has its problems, but I think it is wrong to generalize; at the very least he should present some citations with proof and evidence, otherwise it is merely an allegation without scientific value.

Thanks for your attention, Cédric.


2016-03-01 22:39 GMT-03:00 Cedric St-Jean <[hidden email]>:
This page seems relevant:

  1. Abusing linear regression makes the baby Gauss cry. Fitting a line to your log-log plot by least squares is a bad idea. It generally doesn't even give you a probability distribution, and even if your data do follow a power-law distribution, it gives you a bad estimate of the parameters. You cannot use the error estimates your regression software gives you, because those formulas incorporate assumptions which directly contradict the idea that you are seeing samples from a power law. And no, you cannot claim that because the line "explains" (really, describes) a lot of the variance that you must have a power law, because you can get a very high R^2 from other distributions (that test has no "power"). And this is without getting into the additional errors caused by trying to fit a line to binned histograms.
    It's true that fitting lines on log-log graphs is what Pareto did back in the day when he started this whole power-law business, but "the day" was the 1890s. There's a time and a place for being old school; this isn't it.
  2. Use maximum likelihood to estimate the scaling exponent. It's fast! The formula is easy! Best of all, it works! The method of maximum likelihood was invented in 1922 [parts 1 and 2], by someone who studied statistical mechanics, no less. The maximum likelihood estimators for the discrete (Zipf/zeta) and continuous (Pareto) power laws were worked out in 1952 and 1957 (respectively). They converge on the correct value of the scaling exponent with probability 1, and they do so efficiently. You can even work out their sampling distribution (it's an inverse gamma) and so get exact confidence intervals. Use the MLEs!
I don't usually work with power laws, so I don't have an opinion on this. But I believe that the issue is that the log on the y-axis distorts the Gaussian error distribution on which the least-square fit is predicated.

HTH,

Cédric

On Monday, February 22, 2016 at 8:59:29 AM UTC-5, Mario Henrique wrote:
Yes, Michael. I'm an economist, and in most of the software I know for econometrics you have to add the log variables first to do the regression. Doing that is no problem for me.
Thanks for the help, Julia is great!!


2016-02-22 10:07 GMT-03:00 Michael Krabbe Borregaard <[hidden email]>:
Thanks for the clarification! And great news on the potential for functions to be included in formulas. It is not too annoying to add the variables, but it will help to create a more fluid and natural analytical workflow I feel.

On Mon, Feb 22, 2016 at 1:53 PM, Milan Bouchet-Valat <[hidden email]> wrote:
Le lundi 22 février 2016 à 08:48 -0300, Mario Silveira a écrit :
> Yes, I have to put the logs in the data frame first; with "lm(log(y) ~
> log(x), data)" I get an error.
Yes, sorry I wasn't clear (I thought your question was about
statistical theory). Adding transformed variables to the data frame is
the preferred method AFAIK. And it's not too annoying to do either.

That said, transformation inside formulas will likely be supported at
some point. See this old issue :
https://github.com/JuliaStats/DataFrames.jl/issues/19


Regards

Reply | Threaded
Open this post in threaded view
|

Re: Log-log regression

Andreas Noack
In reply to this post by Mario Silveira
There is some confusion here. The context for the blog post is not really explained. The kind of log-log plots that the blog is talking about are very different from the log-log regressions made in econometrics.

In the blog post, the author considers plots of P(X>x) against x with log scales on both axes, i.e. a single variable. Power laws would give a straight downward-sloping line. By construction, the dots will cross the y-axis at one regardless of the distribution that has created the data, but a regression line fitted to the data might not do that which is why the author of the blog post writes that "It [the least squares fit of the log transformed data] generally doesn't even give you a probability distribution".

The model often used in econometrics, and probably many other places, relates **two** or more log-transformed variables. As Mario points out, the regression coefficients are then interpreted as elasticities. There are probably bad things to say about this too, but they would be different from what the blog post is criticizing.
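To make that distinction concrete, here is a quick sketch (my own example, in Python for brevity, with made-up parameter values): when the data-generating process is y = A * x^beta * exp(eps), i.e. multiplicative noise, OLS on log y against log x does recover the elasticity beta.

```python
# Illustrative only: log-log OLS recovers an elasticity when the noise is
# multiplicative, i.e. y = A * x^beta * exp(eps).
import math
import random

rng = random.Random(1)
beta, A, n = 0.7, 3.0, 20_000
x = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n)]            # positive regressor
y = [A * xi ** beta * math.exp(rng.gauss(0.0, 0.2)) for xi in x]  # multiplicative noise

lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]
mx = sum(lx) / n
my = sum(ly) / n
# OLS slope on the log scale = estimated elasticity
beta_hat = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))
print(round(beta_hat, 2))  # close to the true elasticity 0.7
```

This is exactly the two-variable setting from the thread, and it is a different situation from fitting a line through a log-log plot of an empirical survival function, which is what the blog post criticizes.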

Economists are usually not that interested in power laws. The main exception is that the Pareto distribution is often used for modelling the upper tail of the income distribution.

By the way, the blog seems quite good if you are interested in statistics. I didn't know about it so thanks for the link.

Reply | Threaded
Open this post in threaded view
|

Re: Log-log regression

Mario Silveira
You got the point, Andreas. Thank you!
This group is wonderful.

Reply | Threaded
Open this post in threaded view
|

Re: Log-log regression

Cedric St-Jean-2
Thanks Andreas, you're right, I confused "power law" and "power law distribution".


Re: Log-log regression

Stefan Karpinski
Glad we resolved that. Andreas' explanation makes sense.

On Wed, Mar 2, 2016 at 10:53 AM, Cedric St-Jean <[hidden email]> wrote:
Thanks Andreas, you're right, I confused "power law" and "power law distribution".

On Tuesday, March 1, 2016 at 11:41:09 PM UTC-5, Mario Henrique wrote:
You got the point Andreas. Thank you!
This group is wonderful.

2016-03-02 1:35 GMT-03:00 Andreas Noack <[hidden email]>:
There is some confusion here. The context for the blog post is not really explained. The kind of log-log plots that the blog is talking about are very different from the log-log regressions made in econometrics.

In the blog post, the author considers plots of P(X>x) against x with log scales on both axes, i.e. a single variable. Power laws would give a straight downward-sloping line. By construction, the dots will cross the y-axis at one regardless of the distribution that has created the data, but a regression line fitted to the data might not do that which is why the author of the blog post writes that "It [the least squares fit of the log transformed data] generally doesn't even give you a probability distribution".

The model often used in econometrics, and probably many other places, relates **two** or more log transformed variables. As Mario points out, the regression coefficients are then interpreted as elasticities. There is probably also bad things to say about this, but that would be something different than what the blog post is criticizing.

Economists are usually not that interested in power laws. The main exception is that the Pareto distribution is often used for modelling the upper tail of the income distribution.

By the way, the blog seems quite good if you are interested in statistics. I didn't know about it so thanks for the link.

On Tue, Mar 1, 2016 at 9:58 PM, Mario Silveira <[hidden email]> wrote:
Maybe I did not express myself properly. Here is an example of what I meant: http://www.dummies.com/how-to/content/econometrics-and-the-loglog-model.html

2016-03-01 23:49 GMT-03:00 Mario Silveira <[hidden email]>:
Econometrics eventually developed somewhat independently of mainstream statistics. The reason is the type of data we deal with. In a basic statistics course, linear regression takes up only a few chapters of the book. In a first course in econometrics, the whole course is about regression models: when the estimators are biased, multicollinearity, heteroscedasticity, static or dynamic interpretation ...
I believe that no other area of study is as concerned with the accuracy of regression as econometrics.
PS: in econometrics, statistical estimation happens in a second stage; first the model must be validated theoretically. Of course, there are several problems, but log-log regression for elasticities is one of the most basic and reliable tools, used every day worldwide.
If you are interested in seeing a little of how econometrics works, a suggestion is Econometric Analysis by Greene.

2016-03-01 23:26 GMT-03:00 Stefan Karpinski <[hidden email]>:
Cedric's point isn't that you can't have models that appear linear on a log-log plot – power laws are precisely this kind of model. The point is that you should not use linear regression on the log-transformed data to estimate the model parameters. If that's what's done in econometrics books, they may need revision.

On Tue, Mar 1, 2016 at 9:17 PM, Mario Silveira <[hidden email]> wrote:
Cédric,
in economics, log-log models are widely used for various reasons; the main one is that they allow you to interpret the model in elasticity terms.
Log-log with OLS appears in the majority of econometrics books and has been shown, empirically and mathematically, to be useful when used with the right premises.
The author may be right that log-log has its problems, but I think it is wrong to generalize; at the least he should present some citations with proof and evidence, otherwise it is merely an allegation without scientific value.

Thanks for your attention, Cédric.


2016-03-01 22:39 GMT-03:00 Cedric St-Jean <[hidden email]>:
This page seems relevant:

  1. Abusing linear regression makes the baby Gauss cry. Fitting a line to your log-log plot by least squares is a bad idea. It generally doesn't even give you a probability distribution, and even if your data do follow a power-law distribution, it gives you a bad estimate of the parameters. You cannot use the error estimates your regression software gives you, because those formulas incorporate assumptions which directly contradict the idea that you are seeing samples from a power law. And no, you cannot claim that because the line "explains" (really, describes) a lot of the variance that you must have a power law, because you can get a very high R^2 from other distributions (that test has no "power"). And this is without getting into the additional errors caused by trying to fit a line to binned histograms.
    It's true that fitting lines on log-log graphs is what Pareto did back in the day when he started this whole power-law business, but "the day" was the 1890s. There's a time and a place for being old school; this isn't it.
  2. Use maximum likelihood to estimate the scaling exponent. It's fast! The formula is easy! Best of all, it works! The method of maximum likelihood was invented in 1922 [parts 1 and 2], by someone who studied statistical mechanics, no less. The maximum likelihood estimators for the discrete (Zipf/zeta) and continuous (Pareto) power laws were worked out in 1952 and 1957 (respectively). They converge on the correct value of the scaling exponent with probability 1, and they do so efficiently. You can even work out their sampling distribution (it's an inverse gamma) and so get exact confidence intervals. Use the MLEs!
I don't usually work with power laws, so I don't have an opinion on this. But I believe that the issue is that the log on the y-axis distorts the Gaussian error distribution on which the least-squares fit is predicated.

HTH,

Cédric
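
For concreteness, the continuous (Pareto) maximum-likelihood estimator that the quoted post recommends can be sketched in Julia; the function name and the synthetic data below are illustrative choices of mine, not something from the thread:

```julia
using Random

# MLE for the exponent of a continuous power law (Pareto) fitted to the
# tail x >= xmin: alpha_hat = 1 + n / sum(log(x_i / xmin)).
function pareto_mle(x, xmin)
    tail = filter(>=(xmin), x)
    return 1 + length(tail) / sum(log.(tail ./ xmin))
end

# Sanity check by inverse-transform sampling: if U ~ Uniform(0,1), then
# xmin * U^(-1/(alpha-1)) follows a Pareto law with exponent alpha.
Random.seed!(1)
alpha, xmin = 2.5, 1.0
x = xmin .* rand(10_000) .^ (-1 / (alpha - 1))
pareto_mle(x, xmin)   # should land close to the true alpha = 2.5
```

Unlike a least-squares line through the log-log plot, this estimator comes with the sampling theory the post describes, so its confidence intervals are meaningful.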

On Monday, February 22, 2016 at 8:59:29 AM UTC-5, Mario Henrique wrote:
Yes, Michael. I'm an economist, and most software I know for econometrics requires adding the log variables first before running the regression. That's no problem for me.
Thanks for the help, Julia is great!!


2016-02-22 10:07 GMT-03:00 Michael Krabbe Borregaard <[hidden email]>:
Thanks for the clarification! And great news on the potential for functions to be included in formulas. It is not too annoying to add the variables, but it will help to create a more fluid and natural analytical workflow I feel.

On Mon, Feb 22, 2016 at 1:53 PM, Milan Bouchet-Valat <[hidden email]> wrote:
Le lundi 22 février 2016 à 08:48 -0300, Mario Silveira a écrit :
> Yes, I have to put the logs in the data frame first; with "lm(log(y) ~ log(x),
> data)" I get an error.
Yes, sorry I wasn't clear (I thought your question was about
statistical theory). Adding transformed variables to the data frame is
the preferred method AFAIK. And it's not too annoying to do either.

That said, transformation inside formulas will likely be supported at
some point. See this old issue:
https://github.com/JuliaStats/DataFrames.jl/issues/19


Regards

> Thanks for help.

>
> 2016-02-22 8:30 GMT-03:00 Michael Borregaard <[hidden email]>
> :
> > The documentation is not very explicit about the preferred way to
> > do something like that. It looks to me as if you have to put the
> > variables in a DataFrame, update the DataFrame with log versions,
> > then do the lm. 
> > using GLM, DataFrames
> > x = collect(1:5) + rand(5)
> > y = collect(1:5) + rand(5)
> > test = DataFrame(x = x, y = y)
> > test[:logx] = log(test[:x])
> > test[:logy] = log(test[:y])
> > lm(logy ~ logx, test)
> >
> > Is that the preferred method?
> >
> >
> > > Le dimanche 21 février 2016 à 15:54 -0800, Mario Henrique a
> > > écrit : 
> > > > How to run a log-log linear regression  in Julia? 
> > > > Like "lm(log (y) ~ log (x)" in R 
> > > AFAIK this is perfectly equivalent to taking the log of both x
> > > and y, 
> > > and applying a linear regression on the resulting variables. So
> > > that 
> > > should be quite straightforward with GLM.jl (see its
> > > documentation). 
> > >
> > >
> > > Regards 

--
Mario Henrique

Re: Log-log regression

Jason Merrill
In reply to this post by Andreas Noack
On Tuesday, March 1, 2016 at 11:35:22 PM UTC-5, Andreas Noack wrote:
There is some confusion here. The context for the blog post is not really explained. The kind of log-log plots that the blog is talking about are very different from the log-log regressions made in econometrics.

...
 
The model often used in econometrics, and probably many other places, relates **two** or more log transformed variables. As Mario points out, the regression coefficients are then interpreted as elasticities. There is probably also bad things to say about this, but that would be something different than what the blog post is criticizing.

I would like to take a crack at this. I guess it's a bit of a hobby horse. I don't know enough economics to criticize the economics scenario, but I have seen a lot of questionable log-transformed least-squares in the hard sciences.

The typical scenario that justifies standard least squares is a model that looks like this:

(1) y_i = f(x_i; a) + σ ϵ_i

where f is some deterministic model function, x_i and y_i are the independent and dependent data variables, a is a free parameter or a whole collection of free parameters, σ is the standard deviation of the error (which is typically unknown or uncertain), and the ϵ_i are independent samples from a standard normal distribution.

The known (standard normal) joint distribution of the ϵ_i justifies taking the likelihood in terms of y_i - f(x_i; a) to be multivariate normal, which in turn justifies least squares as maximum likelihood. This story is told in various notations at the beginning of essentially every treatment of maximum likelihood.
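As a concrete illustration (not from the original post), here is a minimal Julia sketch of model (1) with a linear f(x; a) = a₁ + a₂x: simulate additive normal noise and recover the parameters by ordinary least squares, which is the maximum-likelihood estimate under exactly this noise assumption. All the numbers (n, σ, the true parameters) are invented for the sketch.

```julia
using Random

Random.seed!(42)                       # reproducible noise
n = 200
x = collect(range(1.0, 10.0, length=n))
a1, a2 = 2.0, 0.5                      # true intercept and slope (made up)
σ = 0.3
# Model (1) with f(x; a) = a1 + a2*x and additive normal errors σϵ_i
y = a1 .+ a2 .* x .+ σ .* randn(n)

# Ordinary least squares via the backslash solve; under the normal-error
# assumption this minimizer of Σ(y_i - f(x_i; a))² is also the MLE
X = [ones(n) x]
â = X \ y
```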

If your error is multiplicative and log-normally distributed instead of additive and normally distributed, then the model instead looks like

(2) y_i = f(x_i; a)exp(σ ϵ_i)

where all the symbols have exactly the same meaning as in (1) (note that exponentiating a normally distributed variable gets you a log-normally distributed variable). Then, taking logs gets you back to something that looks like (1) in terms of transformed variables

log(y_i) = log(f(x_i; a)) + σ ϵ_i

which justifies log-transformed least squares as maximum likelihood in the same way as before.
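For instance, a small Julia sketch of model (2) with a power-law f(x; a) = a₁·x^a₂ (numbers invented for the example): taking logs of both variables gives the linear model log(y) = log(a₁) + a₂·log(x) + σϵ, so plain linear least squares on the transformed variables recovers the parameters.

```julia
using Random

Random.seed!(1)
n = 100
x = 10 .^ range(0.0, 2.0, length=n)    # x spanning two decades
a1, a2 = 3.0, 1.7                      # true amplitude and exponent (made up)
σ = 0.2
# Model (2): multiplicative log-normal noise on a power law
y = a1 .* x .^ a2 .* exp.(σ .* randn(n))

# log(y) = log(a1) + a2*log(x) + σϵ is linear in the transformed variables
X = [ones(n) log.(x)]
β = X \ log.(y)
a1_hat, a2_hat = exp(β[1]), β[2]
```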

This is fine in theory, but there are a couple of problems in practice:

1. People very frequently decide to do log-transformed least-squares based on the algebraic form of f: if f is exponential or a power law, the log transformation turns a non-linear least-squares problem into linear least squares. Linear least squares is easier to execute, so that's what people frequently do. But the algebraic form of f is a totally separate issue from the question of whether the errors are additive or multiplicative (or enter in some even more complicated way). The algebraic form of f is therefore independent of the statistical justification for log-transformed least-squares, contrary to folklore and popular practice.

2. Additive noise of some kind almost always exists in real measurements, even if there is *also* multiplicative noise. If you put something through an electronic circuit, you'll end up with at least some additive Johnson noise. Additionally, there is very commonly an uncertain additive background of some kind. So even if multiplicative noise is the dominant effect, more realistic models look like

(3) y_i = f(x_i; a)exp(σ ϵ_i) + b + ω δ_i

where b is an uncertain additive background, ω is the standard deviation of the additive noise, and the δ_i are independent samples from a standard normal distribution, just like the ϵ_i.

If you ignore the additive noise, and try to account for the additive background by subtracting off an uncertain estimate of it, and then perform log-transformed least-squares, you end up with big problems if you have any data where y_i is small compared to the uncertainty in b (the background), or compared to ω (the size of the additive noise). Poorly accounted-for additive effects might make some of your data negative (maybe only after background subtraction), which makes the log-transformed procedure totally blow up. You sometimes see people try to fix this up by clamping the data to be above some very small positive value. Even when nothing ends up negative, very small y_i often end up with very large relative error. Because log-transformed least-squares essentially assumes constant relative error, your whole fit may end up being dominated by very small data values and their anomalously large relative error.
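A quick Julia illustration of the blow-up (the noise levels are invented for the sketch): add even a small amount of additive noise to power-law data, and the smallest y_i routinely come out negative, so log(y) is undefined for them; the surviving small values carry enormous relative error.

```julia
using Random

Random.seed!(7)
n = 200
x = 10 .^ range(-2.0, 1.0, length=n)   # three decades, down to x = 0.01
a1, a2 = 1.0, 2.0
σ, ω = 0.1, 0.05                       # multiplicative and additive noise levels
# Model (3) with b = 0: wherever f(x) = x^2 falls below ω,
# the additive term dominates the signal
y = a1 .* x .^ a2 .* exp.(σ .* randn(n)) .+ ω .* randn(n)

n_negative = count(<=(0), y)           # log(y) is undefined for these points
keep = y .> 0                          # "fixing" this by dropping or clamping
                                       # distorts exactly the smallest values
```

On a typical run a sizeable fraction of the small-x points are non-positive, and the small positive ones would dominate a log-transformed fit through their inflated relative errors.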

Doing standard least-squares when the error is actually multiplicative is often less bad than doing log-transformed least squares when the error is actually additive, because it's often preferable to have your fit dominated by large values and their (possibly) anomalously large absolute error than it is to have your fit dominated by small values and their (possibly) anomalously large relative error. You usually want to err on the side of accurately modeling the part of your data that is not very close to zero.

But it's also possible to turn the maximum-likelihood crank on the full model (3), and with the help of software, I don't think this actually has to be so much more onerous than any other regression procedure.
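One hedged sketch (mine, not from the thread) of what cranking maximum likelihood on model (3) can look like in base Julia: the density of y_i has no closed form (a log-normal scaled by f, convolved with a normal), so each point's likelihood is approximated by Monte Carlo averaging over draws of ϵ, and a crude grid search stands in for a real optimizer. To keep the sketch one-dimensional, σ, ω, and b are treated as known.

```julia
using Random, Statistics

Random.seed!(3)
n = 100
x = 10 .^ range(-1.0, 0.5, length=n)
a_true, σ, ω, b = 1.5, 0.2, 0.3, 0.0   # invented values for the sketch
f(x, a) = a * x^2
# Simulate model (3): multiplicative log-normal noise plus additive normal noise
y = f.(x, a_true) .* exp.(σ .* randn(n)) .+ b .+ ω .* randn(n)

normpdf(z, μ, s) = exp(-0.5 * ((z - μ) / s)^2) / (s * sqrt(2π))

# p(y_i | a) = E_ϵ[ Normal(y_i; f(x_i, a)·exp(σϵ) + b, ω) ], approximated by
# averaging over Monte Carlo draws; the draws are shared across all candidate
# values of a so the objective is smooth in a
edraws = randn(2000)
loglik(a) = sum(log(mean(normpdf.(y[i], f(x[i], a) .* exp.(σ .* edraws) .+ b, ω)))
                for i in 1:n)

# Crude one-parameter grid search in place of a real optimizer
grid = range(0.5, 3.0, length=101)
â = grid[argmax(loglik.(grid))]
```

Note that this handles non-positive y_i with no special casing at all: the additive part of the likelihood simply absorbs them, which is the point of fitting model (3) directly instead of log-transforming.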

I'm not sure this sketch is enough to convince anyone who doesn't already know about all of this, and I also think that various aspects of it don't apply directly to the economics scenario. But I wanted to mention it since log-transformed least-squares does run into big problems even in the variable-response scenario, albeit somewhat different problems from the ones covered by Shalizi and Newman for the distribution-fitting scenario.

Reply | Threaded
Open this post in threaded view
|

Re: Log-log regression

Stefan Karpinski
Thanks for writing that up. I really enjoyed reading it. Would actually make a rather nice blog post.

On Fri, Mar 4, 2016 at 11:23 AM, Jason Merrill <[hidden email]> wrote: