

The current formula interface for packages like GLM and MixedModels emulates that of R in that a formula is written like
y ~ 1 + x + z
The difficulty with this form is that the ~ character is used elsewhere in Julia, so somewhat nasty tricks are needed to parse such an expression as a formula.
One way to break away from this R-centric approach is to use a Pair to represent a formula. Because we don't want to evaluate the expressions in a formula at the time of the function call, it would be necessary to use Pair{Symbol,Expr} or Pair{Expr,Expr} to represent a two-sided formula. The translation of the previous formula would be
:y => :(1 + x + z)
This requires a few extra keystrokes, but that is not a terrible burden, and it would use a native Julia construct. It also visually distinguishes a formula in Julia from a formula in R, so that we can make other changes in the formula language (e.g. require an explicit 1 for the intercept term) with less confusion for users: when the formula itself looks different, it is less surprising that other aspects of the formula syntax differ between Julia and R.
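
As a rough sketch of what this representation would hand to a modelling function (the variable names below are only illustrative, and nothing here is taken from GLM or MixedModels), the Pair can be taken apart with ordinary field access:

```julia
# A two-sided formula held as a Pair; neither side is evaluated.
f = :y => :(1 + x + z)

response = f.first    # the Symbol :y
rhs = f.second        # the Expr :(1 + x + z)

# The right-hand side is a single n-ary call to +, so its arguments are
# the operator followed by the terms.
terms = rhs.args[2:end]

# Keep only the variable names, dropping the intercept literal 1.
predictors = [t for t in terms if t isa Symbol]
```

Whether the literal 1 should be required or implied is exactly the kind of design question the proposal leaves open; the sketch simply keeps it as a term.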

You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.


Using Pairs (and therefore =>) sounds like a good idea to me, as it
conveys exactly the meaning of associating two parts of a formula
together, in a structure designed for that. (Well, the direction of the
arrow isn't very natural, but...)
But maybe to make it nicer to read we could make the whole formula an
expression, i.e.:
:(y => 1 + x + z)
instead of:
:y => :(1 + x + z)
That would make the syntax very close to what macros would allow to fit
a model:
@fit(LinearModel, y => 1 + x + z, data)
(Such a macro, while not strictly necessary, could also allow saving
the full call expression and the name of the dataset used when fitting
the model, to print it to the user as R does.)
Maybe more importantly, it would remove the requirement for the left-hand
side of the formula to be a symbol. Indeed, some models (like PLS
regression) accept several dependent variables, which could be written
like this:
:(y + z => 1 + x)
My two cents
On Monday 1 February 2016 at 13:18 -0800, Douglas Bates wrote:
> [...]


On Tuesday 2 February 2016 at 15:30 +0100, Milan Bouchet-Valat wrote:
> [...]
> Maybe more importantly, it would remove the requirement for the left-hand
> side of the formula to be a symbol. Indeed, some models (like PLS
> regression) accept several dependent variables, which could be written
> like this:
> :(y + z => 1 + x)
Actually, scratch that, as the two features are orthogonal.
:(y => 1 + x + z) and :y => :(1 + x + z) both have Symbol as the type
for the LHS. The only difference is whether we have a Pair of
expressions/symbols or a => call with two expression/symbol arguments.
Yet it might be a bit nicer to write
:(y + z => 1 + x)
rather than
:(y + z) => :(1 + x)
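
To make the distinction concrete, here is a small sketch comparing the two spellings. The helper name `sides` is made up for illustration; it is not part of any package.

```julia
quoted_whole   = :(y => 1 + x + z)     # one Expr: an unevaluated call to =>
pair_of_quotes = :y => :(1 + x + z)    # a Pair constructed at run time

# Recover (lhs, rhs) from either representation.
sides(f::Pair) = (f.first, f.second)
function sides(f::Expr)
    ok = f.head == :call && length(f.args) == 3 && f.args[1] == Symbol("=>")
    ok || throw(ArgumentError("expected an expression of the form lhs => rhs"))
    return (f.args[2], f.args[3])
end

# Both spellings carry the same two sides, with a Symbol on the left.
same = sides(quoted_whole) == sides(pair_of_quotes)

# A multi-response left-hand side is an Expr in both spellings.
lhs_a, rhs_a = sides(:(y + z => 1 + x))
lhs_b, rhs_b = sides(:(y + z) => :(1 + x))
```

The only practical difference in this sketch is where the quoting happens: once around the whole formula, or once per side.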


I don't have a strong feeling about `~` versus `=>` – although the R tradition of using `~` seems to at least give a hint about what's going on, which is kind of nice. But I do think that it would be good to get rid of the macro business for ~ and start spelling model specifications as `@model y ~ 1 + x + z` or `@model y => 1 + x + z` and returning some kind of Model type instead of using bare expression objects for this kind of thing. Expression objects already have a meaning in Julia code and it is not to specify statistical models – it is to represent Julia expression trees. The fact that those two meanings can usually be disambiguated easily doesn't mean they should be represented the same way.



On Tuesday 2 February 2016 at 10:50 -0500, Stefan Karpinski wrote:
> [...] I do think that it would be good to get rid of the macro
> business for ~ and start spelling model specifications as
> `@model y ~ 1 + x + z` or `@model y => 1 + x + z` and returning some
> kind of Model type instead of using bare expression objects for this
> kind of thing. [...]
That would be a Formula type, rather than a Model (a model includes
other details like a link function, an error distribution, etc.).
We could imagine two interfaces:
- @formula(y ~ 1 + x + z) would create a Formula object, which could be
  passed to fit(), etc.
- @fit(ModelType, y ~ 1 + x + z, ...) would be a shorthand for
  fit(ModelType, @formula(y ~ 1 + x + z), ...)
At that point, the choice of ~ or => doesn't make much of a difference
technically, so we may as well keep ~.
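
A minimal sketch of how these two interfaces could fit together is below. Everything in it is hypothetical: the `Formula` struct, the `ToyModel` type and the toy `fit` method exist only to make the example self-contained, and it assumes a Julia version in which `y ~ rhs` parses as an ordinary call to `~`.

```julia
# Hypothetical container for a two-sided formula (not any package's type).
struct Formula
    lhs::Union{Symbol, Expr}
    rhs::Union{Symbol, Expr, Int}
end

# The macro receives `y ~ 1 + x + z` unevaluated and stores both sides.
macro formula(ex)
    ok = ex isa Expr && ex.head == :call &&
         length(ex.args) == 3 && ex.args[1] == Symbol("~")
    ok || error("expected a formula of the form lhs ~ rhs")
    lhs, rhs = ex.args[2], ex.args[3]
    return :(Formula($(QuoteNode(lhs)), $(QuoteNode(rhs))))
end

# Stand-ins so the shorthand has something to call.
struct ToyModel end
fit(::Type{ToyModel}, f::Formula, data) = (model = ToyModel, formula = f, data = data)

# @fit(M, lhs ~ rhs, args...) rewrites to fit(M, @formula(lhs ~ rhs), args...).
macro fit(modeltype, formula, args...)
    return esc(:(fit($modeltype, @formula($formula), $(args...))))
end

f = @formula(y ~ 1 + x + z)
result = @fit(ToyModel, y ~ 1 + x + z, [1.0, 2.0, 3.0])
```

Because the formula ends up in a dedicated type rather than a bare Expr, methods can dispatch on it, which is the separation of meanings argued for in the quoted message.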


Without considering the technical implications, how about something like E(y | 1 + x + z)? This is mathematically correct for GLMs, but not when modelling the median via quantile regression, for example. Perhaps @median(y | 1 + x + z) for quantile regression and @mean(y | 1 + x + z) for GLMs?

On Wednesday, 3 February 2016 03:30:27 UTC+11, Milan Bouchet-Valat wrote:
> On Tuesday 2 February 2016 at 10:50 -0500, Stefan Karpinski wrote:
> [...]


On Tuesday 2 February 2016 at 13:46 -0800, [hidden email] wrote:
> Without considering the technical implications, how about something
> like E(y | 1 + x + z)?
I think the point is mostly about "technical implications". :)
The R convention of using ~ works quite well, so there should be a strong reason to invent something else (e.g. consistency with Julia parsing).
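
One such parsing consideration can be checked directly. The sketch below assumes the separator in the quoted proposal is the conditioning bar `|`, and a Julia version in which `~` and `=>` parse as ordinary binary calls. As far as I understand the parser, `|` shares the precedence of `+`, so it does not split the formula where one would hope:

```julia
tilde = :(y ~ 1 + x + z)
arrow = :(y => 1 + x + z)
bar   = :(y | 1 + x + z)

# ~ and => bind more loosely than +, so the top-level call is the separator
# and the whole right-hand side arrives as one sub-expression.
tilde_top = tilde.args[1]
arrow_top = arrow.args[1]

# With |, the top-level call is + and the first term is (y | 1).
bar_top   = bar.args[1]
bar_first = bar.args[2]

# Separately, ~ already has a meaning as unary bitwise "not".
flipped = ~0
```

A macro could still rewrite the `|` form, but it would have to undo this grouping first, which is extra work compared with `~` or `=>`.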
> This is mathematically correct for GLMs, but not when modelling the
> median via quantile regression for example.
> Perhaps @median(y | 1 + x + z) for quantile regression and
> @mean(y | 1 + x + z) for GLMs?
What would be the point of forcing people to use different ways of
specifying formulas? Currently, the type of model you fit defines
whether the formula describes e.g. the mean, the median, or something
else. Repeating this when creating the formula doesn't add anything
AFAICT, as you cannot fit e.g. a GLM for the median anyways.
That said, we could think about how we could support other kinds of
models like survival models, where the LHS of the formula must take a
time to event and a censoring status, or the start and end of a period
and its censoring status. In R, these are supported by pseudo-functions
in the formula, like this:
Surv(start, stop, event) ~ 1 + x
Maybe that's OK, but maybe we can find something better. A bit
off-topic though.
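
For what it's worth, an R-style pseudo-function on the left-hand side is easy to inspect once the formula is an unevaluated expression. The helper `response_columns` below is a made-up name for illustration, and `Surv` is never called; it only appears as a Symbol inside the quoted expression.

```julia
ex = :(Surv(start, stop, event) ~ 1 + x)

lhs = ex.args[2]          # the unevaluated call Surv(start, stop, event)
kind = lhs.args[1]        # the pseudo-function name, as a Symbol

# Column names referenced on the left-hand side, for a bare Symbol or a call.
function response_columns(lhs)
    lhs isa Symbol && return [lhs]
    if lhs isa Expr && lhs.head == :call
        return [a for a in lhs.args[2:end] if a isa Symbol]
    end
    throw(ArgumentError("unsupported left-hand side: $lhs"))
end

cols = response_columns(lhs)
```

The same helper also handles a plain response such as `:y` and a multi-response side such as `:(y + z)`, so it says nothing yet about whether a dedicated syntax would be better.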
Regards


True, the approach I suggested conflates formula and model, and indeed repeating the intended model adds nothing. The main point is that from a modelling point of view (not a parsing point of view) we are considering the dependence of y on x, z, etc. In probabilistic notation this is represented generally as Pr(y | x, z), from which a GLM can be expressed as E(y | x, z). I was merely trying to use this mathematical representation, with formula-like syntax to remove any ambiguity about the form of the RHS. I'm not claiming this particular representation is the way to go. Rather, I think this idea merits further brainstorming, e.g. :(y | 1 + x + z). Put another way: go with convention, or explore possible improvements?

On Wednesday, 3 February 2016 09:25:17 UTC+11, Milan Bouchet-Valat wrote:
> On Tuesday 2 February 2016 at 13:46 -0800, jock....@... wrote:
> Without considering the technical implications, how about something
> like E(y  1 + x + z)?
I think the point is mostly about "technical implications". :)
The R convention of using ~ works quite well, so there should be a strong reason to invent something else (e.g. consistency with Julia parsing).
> This is mathematically correct for GLMs, but not when modelling the
> median via quantile regression for example.
> Perhaps @median(y  1 + x + z) for quantile regression and @mean(y 
> 1 + x + z) for GLMs?
What would be the point of forcing people to use different ways of
specifying formulas? Currently, the type of model you fit defines
whether the formula describes e.g. the mean, the median, or something
else. Repeating this when creating the formula doesn't add anything
AFAICT, as you cannot fit e.g. a GLM for the median anyways.
That said, we could think about how we could support other kinds of
models like survival models, where the LHS of the formula must take a
time to event and a censoring status, or the start and end of a period
and its censoring status. In R, these are supported by pseudofunctions
in the formula, like this:
Surv(start, stop, event) ~ 1 + x
Maybe that's OK, but maybe we can find something better. A bit off
topic though.
Regards
> On Wednesday, 3 February 2016 03:30:27 UTC+11, Milan Bouchet-Valat wrote:
> > On Tuesday, 2 February 2016 at 10:50 -0500, Stefan Karpinski wrote:
> > > I don't have a strong feeling about `~` versus `=>` – although the R
> > > tradition of using `~` seems to at least give a hint about what's
> > > going on, which is kind of nice. But I do think that it would be good
> > > to get rid of the macro business for ~ and start spelling model
> > > specifications as `@model y ~ 1 + x + z` or `@model y => 1 + x + z`
> > > and returning some kind of Model type instead of using bare
> > > expression objects for this kind of thing. Expression objects already
> > > have a meaning in Julia code and it is not to specify statistical
> > > models – it is to represent Julia expression trees. The fact that
> > > those two meanings can usually be disambiguated easily doesn't mean
> > > they should be represented the same way.
> > That would be a Formula type, rather than a Model (a model includes
> > other details like a link function, an error distribution, etc.).
> >
> > We could imagine two interfaces:
> > - @formula(y ~ 1 + x + z) would create a Formula object, which could be
> >   passed to fit(), etc.
> > - @fit(ModelType, y ~ 1 + x + z, ...) would be a shorthand for
> >   fit(ModelType, @formula(y ~ 1 + x + z), ...)
> >
> > At that point, the choice of ~ or => doesn't make much of a difference
> > technically, so we may as well keep ~.
> >
> > > On Tue, Feb 2, 2016 at 10:15 AM, Milan Bouchet-Valat wrote:
> > > > Actually, scratch that, as the two features are orthogonal.
> > > > :(y => 1 + x + z) and :y => :(1 + x + z) both have Symbol as the type
> > > > for the LHS. The only difference is whether we have a Pair of
> > > > expressions/symbols or a => call with two expression/symbol arguments.
> > > >
> > > > Yet it might be a bit nicer to write
> > > > :(y + z => 1 + x)
> > > > rather than
> > > > :(y + z) => :(1 + x)



I definitely agree with Stefan on this.
Milan, you mentioned that we should have a strong reason to break the convention if we choose to do so. I've always found R's use of `~` to be a bit unfortunate. It's a carryover from S, which introduced `~` for formulas before R was even a thing. But S is also the language that brought us `<-` for assignment, so I tend to be wary of its gifts. ;) At this point I think other languages that offer statistical modeling facilities are using `~` just because R does it. Julia has the opportunity to set a new precedent as it gains traction for stats, so I think we should think carefully about the choices.
Stefan's suggestion of `@model` opens the possibility for just about any kind of separator because it just becomes the `head` in the `Expr` that goes into the macro. But I think the syntax for pairs, i.e. `=>`, would make the most sense in terms of consistency with existing Julia structures because a model is essentially a pair; it's some combination of responses paired with some combination of predictors.
Anyway, just thinking aloud.
Alex
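
To make the `Expr` remarks concrete, here is a small sketch comparing the two spellings discussed in this thread. It assumes Julia 1.x, where `=>` inside a quoted expression is an ordinary `:call` whose first argument is the operator; older releases may have parsed these operators differently.

```julia
# One quoted expression containing a `=>` call ...
quoted = :(y => 1 + x + z)
# ... versus a Pair whose two halves are quoted separately.
paired = :y => :(1 + x + z)

# Both carry the same pieces; only the container differs.
same_lhs = quoted.args[2] === paired.first     # both are the symbol y
same_rhs = quoted.args[3] == paired.second     # both are :(1 + x + z)

# With several responses the left-hand side becomes an Expr in either form.
multi = :(y + z => 1 + x)
multi_lhs = multi.args[2]                      # :(y + z)
```

Either representation gives a macro or function the same information, which is why the choice between them is mostly about readability.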


+1 for using @model y ~ 1 + x + z
It's clean, has precedent, and evokes the idea of using `~` to describe the distribution of a random variable (which I don't think is entirely wrong in this context).
I really don't like => aesthetically. I can come up with other minor and maybe stupid reasons to not prefer it... e.g. potential confusion with inequalities.
Just my two cents.
 Alex



On Tuesday, 17 May 2016 at 22:22 -0700, Alex Arslan wrote:
> I definitely agree with Stefan on this.
>
> Milan, you mentioned that we should have a strong reason to break the
> convention if we choose to do so. I've always found R's use of `~` to
> be a bit unfortunate. It's a carryover from S, which introduced `~`
> for formulas before R was even a thing. But S is also the language
> that brought us `<-` for assignment, so I tend to be wary of its
> gifts. ;) At this point I think other languages that offer
> statistical modeling facilities are using `~` just because R does it.
> Julia has the opportunity to potentially set a new precedent as it
> gains traction for stats, so I think we should think carefully about
> the choices.
>
> Stefan's suggestion of `@model` opens the possibility for just about
> any kind of separator because it just becomes the `head` in the
> `Expr` that goes into the macro. But I think the syntax for pairs,
> i.e. `=>`, would make the most sense in terms of consistency with
> existing Julia structures because a model is essentially a pair; it's
> some combination of responses paired with some combination of
> predictors.
As I said, I find => a good idea too. @model sounds a bit verbose to me
(it should really be called @formula anyway), but maybe that's OK if in
practice we can use @fit as a shorthand.
Anyway, I don't really have strong feelings about this either.
Regards
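
As a rough sketch of the `@formula` idea discussed in this thread: the `Formula` struct and the macro below are hypothetical illustrations, not the API of any existing package, and they assume Julia 1.x, where both `~` and `=>` reach a macro as plain `:call` expressions.

```julia
# A dedicated type, so that a formula is not just a bare Expr.
struct Formula
    lhs::Union{Symbol,Expr}
    rhs::Union{Symbol,Expr,Number}
end

# Accept either separator and store both sides unevaluated.
macro formula(ex)
    ok = ex isa Expr && ex.head === :call && length(ex.args) == 3 &&
         ex.args[1] in (Symbol("~"), Symbol("=>"))
    ok || error("expected `lhs ~ rhs` or `lhs => rhs`")
    # QuoteNode keeps the two sides as data instead of running them.
    return :(Formula($(QuoteNode(ex.args[2])), $(QuoteNode(ex.args[3]))))
end

f = @formula(y ~ 1 + x + z)
g = @formula(y => 1 + x + z)
```

Because the separator is consumed inside the macro, `f` and `g` end up holding the same two sides, which matches the point that the choice of `~` or `=>` makes little technical difference once a macro is involved.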


One might argue that the mathematical symbol ⇒ means something entirely different from what the formula operator implies: it says that the left side leads to the right by material implication. Also, the intuitive reading of => (that the left side leads to the right) is wrong for a formula.



In Stata, one specifies a formula without ~ or +:
y x1 x2
It works pretty well in my experience. How about dropping ~ and +?



On Monday, 15 August 2016 at 20:33 -0700, Matthieu wrote:
> In Stata one specifies a formula without ~ or +
> y x1 x2
> It works pretty well in my experience. How about dropping ~ and +?
I don't find it particularly clear that in Stata the response isn't
visually separated from the independent variables. ~ is really useful
IMHO.
As regards +, it's needed so that the formula is a valid Julia
expression, which is good for consistency (even if formulas end up
being written as strings). That convention also follows the Wilkinson &
Rogers notation, so there's a precedent in the literature other than
R.
Regards
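
The point about `+` can be checked directly with the parser. A minimal sketch, assuming Julia 1.x and a hypothetical helper named `parses_as_expr`:

```julia
# Return true when `str` parses as exactly one Julia expression.
# Any parse failure is reported as false.
function parses_as_expr(str::AbstractString)
    try
        Meta.parse(str)
        return true
    catch
        return false
    end
end

wilkinson_style = parses_as_expr("y ~ 1 + x1 + x2")   # operators between terms
stata_style     = parses_as_expr("y x1 x2")           # bare juxtaposition
```

The operator-separated form is accepted, while the juxtaposed Stata-style form is rejected, so the latter could only ever be supported as a string with its own parser.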



Agreed regarding Stata formulas.
We could always go the way of SAS and use = to separate the response
from predictors, though that could get tricky and/or confusing due to
the similarity with how keyword arguments are specified. Not to mention
problematic behavior if used incorrectly...
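
A short sketch of why `=` would be confusing, assuming Julia 1.x parsing; the `fit(LinearModel, ..., data)` call is only quoted here, never run, so no package is needed:

```julia
# The same call written with `=` and with `~` as the separator.
call_eq    = :(fit(LinearModel, y = 1 + x + z, data))
call_tilde = :(fit(LinearModel, y ~ 1 + x + z, data))

# Pull out the third element of each call (the would-be formula).
arg_eq    = call_eq.args[3]
arg_tilde = call_tilde.args[3]

head_eq    = arg_eq.head      # :kw, i.e. a keyword argument named y
head_tilde = arg_tilde.head   # :call, an ordinary operator call
```

With `=`, the parser already treats the formula as a keyword argument named `y` before any modeling code sees it, which is the kind of silent misbehavior mentioned above.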


Some of the recent work I've been doing to demo out automatic lifting for tables makes me think that the syntax here doesn't matter too much (as long as it's easy to manipulate inside a macro). Julia 0.5 actually offers us the chance to use substantially more interesting semantics than we have had in the past. We could realistically hope to do something like this now:
@model_matrix(y ~ x + log(x)  sin(x^2), data)
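
A very rough sketch of what such column-wise semantics could look like, written as plain functions rather than a macro. Everything here is hypothetical: `evalterm` and `model_matrix` are made-up names, the table is a NamedTuple of vectors, only functions reachable from `Base` are supported, and the right-hand side must be a sum of terms joined by `+`.

```julia
# Evaluate one term against a table of named columns.
evalterm(s::Symbol, data) = getproperty(data, s)   # column lookup
evalterm(c::Number, data) = c                      # literal, e.g. the intercept 1
function evalterm(ex::Expr, data)
    ex.head === :call || error("unsupported term: $ex")
    f = getfield(Base, ex.args[1])                 # look the function up in Base
    args = [evalterm(a, data) for a in ex.args[2:end]]
    return broadcast(f, args...)                   # apply it element-wise
end

# Split `a + b + c` into its terms and stack the evaluated columns.
function model_matrix(rhs::Expr, data)
    (rhs.head === :call && rhs.args[1] === :+) || error("expected a sum of terms")
    n = length(first(data))
    cols = map(rhs.args[2:end]) do t
        v = evalterm(t, data)
        v isa Number ? fill(float(v), n) : float.(v)
    end
    return reduce(hcat, cols)
end

data = (x = [1.0, 2.0, 4.0], y = [0.5, 1.5, 2.5])
X = model_matrix(:(1 + x + log(x) + sin(x^2)), data)
```

Here `X` has one row per observation and one column per term, with transformations such as `log(x)` applied to the whole column.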
John On Wednesday, August 17, 2016 at 3:34:46 PM UTC7, Alex Arslan wrote:
Agreed regarding Stata
formulas.
We could always go the way of SAS and use = to separate the response
from predictors, though that could get a tricky and/or confusing due to
the similarity with how keyword arguments are specified. Not to mention
problematic behavior if used incorrectly...
August
17, 2016 at 7:32 AM
I don't find it
particularly clear that in Stata the response isn't visually
separated from the dependent variables. ~ is really useful IMHO.
As
regards +, it's needed so that the formula is a valid Julia expression,
which is good for consistency (even if formulas end up being written
as strings). That convention also follows the Wilkinson & Rodgers
notation, so there's a precedent in the literature other than R.
Regards
August
15, 2016 at 8:33 PM
In stata one specifies a
formula without ~ or + y x1 x2 It works pretty well in my
experience. How about dropping ~ and +?
One might argue
that the mathematical symbol ⇒ means something entirely
different from what is implied by the formula operator: that the left
side leads to the right by <a title="Material conditional" href="https://en.wikipedia.org/wiki/Material_conditional" target="_blank" rel="nofollow" onmousedown="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fen.wikipedia.org%2Fwiki%2FMaterial_conditional\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHEUbcJAyZnCMdTCzGstwbbBYKGSw';return true;" onclick="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fen.wikipedia.org%2Fwiki%2FMaterial_conditional\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHEUbcJAyZnCMdTCzGstwbbBYKGSw';return true;">material
implication . Also, the intuitive interpretation of => (that the
left side leads to the right) is wrong.

On Tuesday, 17 May 2016 at 22:22 -0700, Alex Arslan wrote:
I definitely agree with Stefan on this.
Milan, you mentioned that we should have a strong reason to break the
convention if we choose to do so. I've always found R's use of `~` to
be a bit unfortunate. It's a carryover from S, which introduced `~`
for formulas before R was even a thing. But S is also the language
that brought us `<-` for assignment, so I tend to be wary of its
gifts. ;) At this point I think other languages that offer
statistical modeling facilities are using `~` just because R does it.
Julia has the opportunity to potentially set a new precedent as it
gains traction for stats, so I think we should think carefully about
the choices.
Stefan's suggestion of `@model` opens the possibility for just about
any kind of separator because it just becomes the `head` in the
`Expr` that goes into the macro. But I think the syntax for pairs,
i.e. `=>`, would make the most sense in terms of consistency with
existing Julia structures because a model is essentially a pair; it's
some combination of responses paired with some combination of
predictors.
As I said, I find => a good idea too. @model sounds a bit verbose to me
(it should really be called @formula anyway), but maybe that's OK if in
practice we can use @fit as a shorthand.
Anyway, I don't really have strong feelings about this either.
Regards
Anyway, just thinking aloud.
Alex
I don't have a strong feeling about `~` versus `=>` – although the
R tradition of using `~` seems to at least give a hint about what's
going on, which is kind of nice. But I do think that it would be
good to get rid of the macro business for ~ and start spelling
model specifications as `@model y ~ 1 + x + z` or `@model y => 1 +
x + z` and returning some kind of Model type instead of using bare
expression objects for this kind of thing. Expression objects
already have a meaning in Julia code and it is not to specify
statistical models – it is to represent Julia expression trees. The
fact that those two meanings can usually be disambiguated easily
doesn't mean they should be represented the same way.
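
A rough sketch of what such a macro could look like, assuming Julia 1.x. The names `ModelSpec` and `@model` are hypothetical here and do not refer to any existing package; the only point is that the macro hands back a dedicated type rather than a bare expression object.

```julia
# Hypothetical container for a model specification (illustration only).
struct ModelSpec
    lhs::Any   # response side, kept unevaluated (Symbol or Expr)
    rhs::Any   # predictor side, kept unevaluated
end

macro model(ex)
    # On Julia 1.x both `lhs ~ rhs` and `lhs => rhs` reach the macro as
    # Expr(:call, op, lhs, rhs); earlier versions used different heads.
    ok = ex isa Expr && ex.head === :call && length(ex.args) == 3 &&
         ex.args[1] in (:~, :(=>))
    ok || error("expected `lhs ~ rhs` or `lhs => rhs`")
    lhs, rhs = ex.args[2], ex.args[3]
    # QuoteNode keeps both sides as data instead of evaluating them.
    return :(ModelSpec($(QuoteNode(lhs)), $(QuoteNode(rhs))))
end

m = @model(y => 1 + x + z)
m.lhs   # the symbol y
m.rhs   # the expression 1 + x + z
```

Because the separator only shows up as `ex.args[1]`, the same macro can accept either spelling, which is the flexibility mentioned above.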
On Tue, Feb 2, 2016 at 10:15 AM, Milan Bouchet-Valat wrote:
On Tuesday, 2 February 2016 at 15:30 +0100, Milan Bouchet-Valat wrote:
Using Pairs (and therefore =>) sounds like a good idea to me, as it
conveys exactly the meaning of associating two parts of a formula
together, in a structure designed for that. (Well, the direction of the
arrow isn't very natural, but...)
But maybe to make it nicer to read we could make the whole formula an
expression, i.e.:
:(y => 1 + x + z)
instead of:
:y => :(1 + x + z)
That would make the syntax very close to what macros would allow to fit
a model:
@fit(LinearModel, y => 1 + x + z, data)
(Such a macro, while not strictly necessary, could also allow saving
the full call expression and the name of the dataset used when fitting
the model, to print it to the user as R does.)
Maybe more importantly, it would remove the requirement for the
left-hand side of the formula to be a symbol. Indeed, some models (like PLS
regression) accept several dependent variables, which could be written
like this:
:(y + z => 1 + x)
Actually, scratch that, as the two features are orthogonal.
:(y => 1 + x + z) and :y => :(1 + x + z) both have Symbol as the type
for the LHS. The only difference is whether we have a Pair of
expressions/symbols or a => call with two expression/symbol arguments.
Yet it might be a bit nicer to write
:(y + z => 1 + x)
rather than
:(y + z) => :(1 + x)
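
For reference, the difference between the two spellings can be inspected directly. This is a small sketch assuming Julia 1.x, where `a => b` inside a quoted expression is stored as a call to `=>` (older versions represented it differently); the variable names are only for illustration.

```julia
pair_form = :y => :(1 + x + z)     # a Pair of two separately quoted halves
expr_form = :(y => 1 + x + z)      # one quoted expression containing a => call

pair_form isa Pair{Symbol,Expr}    # the Pair holds a Symbol and an Expr
expr_form isa Expr                 # the whole formula is a single Expr
expr_form.head                     # :call on Julia 1.x
expr_form.args[1]                  # the operator, i.e. the symbol =>

# Both forms carry the same two sides.
pair_form.first == expr_form.args[2]
pair_form.second == expr_form.args[3]

# Several responses on the left-hand side work in either form.
multi_expr = :(y + z => 1 + x)
multi_pair = :(y + z) => :(1 + x)
multi_expr.args[2] == multi_pair.first
```

Either way the left-hand side is just whatever was quoted, so the choice between the two is about how it reads rather than what it can express.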
My two cents
On Monday, 1 February 2016 at 13:18 -0800, Douglas Bates wrote:
The current formula interface for packages like GLM and MixedModels
emulates that of R in that a formula is written like
y ~ 1 + x + z
The difficulty with this form is that the ~ character is used
elsewhere in Julia so somewhat nasty tricks need to be used to parse
such an expression as a formula.
One way to break away from this R-centric approach is to use a Pair
to represent a formula. Because we don't want to evaluate the
expressions in a formula at function call it would be necessary to
use Pair(Symbol,Expr) or Pair(Expr,Expr) to represent a two-sided
formula. The translation of the previous formula would be
:y => :(1 + x + z)
This requires a few extra keystrokes but is not a terrible burden and
it would use a native Julia construct. It also serves to visually
distinguish a formula in Julia from a formula in R so that we can
make other changes in the formula language (e.g. require an explicit
1 for the intercept term) with less confusion for users. Because a
formula in Julia looks different from a formula in R it is less
confusing that other aspects of the formula syntax are different in
Julia and in R.


Amazing! :+1:


As part of the JuliaML org/initiative, I'm working on some experimental stuff in Transformations which may be of interest. The goal is that you can define functions using valid Julia syntax and let it be parsed into a backend-agnostic call graph constructor. To simplify: you would create your "formula", which could be scalar or tensor operations, and that builds type(s) with generated constructors/methods in order to calculate values, derivatives, or whatever else. It would know which variables are inputs, constants, functions, or "learnable parameters".
My vision is that one could define a formula once, then optimize/learn the free parameters, or generate probabilistic samples etc, using Optim, TensorFlow, or any other "backends" that may be able to do something useful. It's very similar to the Plots approach, and I see no reason why it wouldn't work here.
For demonstration's sake, I added a recipe for Mike Innes' Flow.jl, which may form the basis of the "parsing":
using Transformations
@flow y(x) = x + log(x) - sin(x^2)
using Plots; plot(ans)
As I said, this is experimental so I don't expect everyone to drop what they're doing and help out. However it would be great to have more people involved if you're interested.
Best, Tom


This looks like a convention to me, and Julia decided to follow the R convention
(R actually follows the Stata convention in: lm(mydataframe)).
One thing where I do find Stata superior is the possibility to use wildcards in formulas (including factors and interactions).
It would be nice if Julia's formula interface allowed for that as well.

