Julia and the Tower of Babel

Julia and the Tower of Babel

Gabriel Gellner

Something that I have been noticing, as I convert more of my research code over to Julia, is how the super easy to use package manager (which I love), coupled with the talent base of the Julia community, seems to have a detrimental effect on the API consistency of the many “micro” packages that cover what I would consider the de facto standard library.

What I mean is that a commercial package like Matlab or Mathematica, being written under one large umbrella, will largely (clearly not always) choose consistent names for similar API keyword arguments and have similar calling conventions for master-function-like tools (`optimize` versus `lbfgs`, etc.). I am starting to realize that this is one of the great selling points of these packages for an end user. I can usually guess what a keyword will be in Mathematica, whereas even after a year of using Julia almost exclusively I find I have to look at the documentation (or the source code, depending on the documentation ...) to figure out the keyword names in many common packages.

Similarly, in my experience with open source tools, the complexity of package management means we get large “batteries included” distributions that cover a lot of the standard stuff for doing science, like Python’s numpy + scipy combination. In Julia, by contrast, the equivalent of scipy is split over many separately developed packages (Base, Optim.jl, NLopt.jl, Roots.jl, NLsolve.jl, ODE.jl/DifferentialEquations.jl). Many of these packages are stupid awesome, but they can have dramatically different naming conventions and calling behavior for essentially equivalent functionality. Recently I noticed that tolerances, for example, are named `atol/rtol` versus `abstol/reltol` versus `abs_tol/rel_tol`, which means it is extremely easy to have a piece of scientific code that needs to use all three conventions across different calls to seemingly similar libraries.
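To make the tolerance example concrete, here is a minimal sketch (assuming Julia 1.x) of how a caller might paper over the three spellings with an alias table. The names `TOL_ALIASES`, `normalize_tolerances`, and `toy_solver` are hypothetical and do not come from any of the packages mentioned, and picking `abstol/reltol` as the canonical form is an arbitrary choice for illustration.

```julia
# Hypothetical alias table: every spelling maps to one canonical name.
# Using abstol/reltol as the canonical form is an assumption, not an agreed convention.
const TOL_ALIASES = Dict(
    :atol => :abstol, :abs_tol => :abstol, :abstol => :abstol,
    :rtol => :reltol, :rel_tol => :reltol, :reltol => :reltol,
)

# Rewrite keyword names to the canonical spelling; unknown keywords pass through unchanged.
function normalize_tolerances(; kwargs...)
    out = Dict{Symbol,Any}()
    for (name, value) in kwargs
        canonical = get(TOL_ALIASES, name, name)
        haskey(out, canonical) &&
            throw(ArgumentError("tolerance $canonical given more than once"))
        out[canonical] = value
    end
    return out
end

# Stand-in for a package function that only understands abstol/reltol.
toy_solver(x; abstol=1e-8, reltol=1e-6) = (x, abstol, reltol)

# The caller mixes two spellings; the wrapper translates both before the call.
result = toy_solver(2.0; normalize_tolerances(atol=1e-10, rel_tol=1e-4)...)
```

A wrapper like this only hides the inconsistency on the caller's side; it does not replace agreeing on one spelling across packages.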

Having brought this up, I find that the community is largely sympathetic and, in general, would support a common convention. The issue, I have slowly realized, is that it is rarely that straightforward. In the above example, abstol/reltol versus abs_tol/rel_tol seems like an easy case to tidy up, but the underscored form is consistent with Optim.jl's naming of its other tolerances, so that community is reluctant to change the convention. Similarly, I think there would be little interest in changing abstol/reltol to the underscored version in packages like Base, ODE.jl, etc., as it feels consistent within each of these code bases. Hence I have started to think that the problem is the micro-packaging. It is much easier to look for consistency within a package than across similar packages, and since Julia distributes so many of its essential tools within very narrow boundaries of functionality, I am not sure that naming conventions will ever reach the consistency of something like SciPy, or the even higher standard of commercial packages like Matlab/Mathematica. (I am sure there are many more examples, like maxiter versus iterations for describing stopping criteria in iterative solvers ...)

Even further, I have noticed that even when packages try to stay consistent with each other, for example Optim.jl <-> Roots.jl <-> NLsolve.jl, the consistency fractures quickly once one package changes how it does things (Optim.jl moving to delegation on types for method choice). We now have a common divide between typed dispatch keywords and :method symbol names across those packages (not to mention the whole in-place versus not-in-place question for function arguments …)
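The two calling styles can be put side by side in a small sketch (assuming Julia 1.x). Everything here is hypothetical: `describe_by_symbol`, `RootAlgorithm`, `Bisection`, `Newton`, `describe`, and `algorithm` are illustrative names and are not the actual APIs of Optim.jl, Roots.jl, or NLsolve.jl.

```julia
# Style 1: the algorithm is selected by a Symbol and resolved with a runtime branch.
function describe_by_symbol(method::Symbol)
    if method == :bisection
        return "bisection"
    elseif method == :newton
        return "newton"
    else
        throw(ArgumentError("unknown method $method"))
    end
end

# Style 2: each algorithm is a type, and multiple dispatch selects the code path.
abstract type RootAlgorithm end
struct Bisection <: RootAlgorithm end
struct Newton <: RootAlgorithm end
describe(::Bisection) = "bisection"
describe(::Newton) = "newton"

# Bridge: map a Symbol to the matching type so both call styles keep working.
algorithm(name::Symbol) = algorithm(Val(name))
algorithm(::Val{:bisection}) = Bisection()
algorithm(::Val{:newton}) = Newton()
describe(name::Symbol) = describe(algorithm(name))
```

With a bridge of this kind, `describe(:newton)` and `describe(Newton())` reach the same method, which is one way a package could change style without breaking callers of the older form.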

Do people with more experience in scientific package ecosystems feel this is solvable? Or do micro distributions just lead to many, many varying API conventions that end users need to learn? Is this common in communities that use C++, etc.? I ask because I wonder how much one can worry about this kind of thing when making small packages is so easy.


RE: Julia and the Tower of Babel

David Anthoff

I don’t have a solution, but I completely agree with the problem description.

I guess one small step would be for package authors to follow the patterns in Base, where there are any.

From: [hidden email] [mailto:[hidden email]] On Behalf Of Gabriel Gellner
Sent: Friday, October 7, 2016 8:36 AM
To: julia-users <[hidden email]>
Subject: [julia-users] Julia and the Tower of Babel

 


Re: Julia and the Tower of Babel

Tom Breloff
This is something that I've spent a lot of time and energy thinking about and discussing, as part of both Plots and JuliaML.  I think the situation can be improved in a big way, but this is not something with a "magic solution".  It takes time, effort, and a constant desire to collaborate and design with care for the greater community.  As soon as people get lazy, it starts to get unwieldy.  So I think the "solution" is just to keep at it... keep trying to collaborate... keep trying to agree on common conventions... and always look to find common ground.  Use Base as a guide whenever possible, and if different conventions are in place across packages, then spend the time to agree on shared ones.  And if people refuse to collaborate, give them crap about it.

On Fri, Oct 7, 2016 at 12:02 PM, David Anthoff <[hidden email]> wrote:

I don’t have a solution, but I completely agree with the problem description.

 

I guess one small step would be that package authors should follow the patterns in base, if there are any.

 


Re: Julia and the Tower of Babel

Andreas Lobinger
In reply to this post by Gabriel Gellner
Hello colleague,

On Friday, October 7, 2016 at 5:35:46 PM UTC+2, Gabriel Gellner wrote:

Something that I have been noticing, as I convert more of my research code over to Julia, is how the super easy to use package manager (which I love), coupled with the talent base of the Julia community seems to have a detrimental effect on the API consistency of the many “micro” packages that cover what I would consider the de-facto standard library. ....

Well, you consider 'this' the de-facto standard library, others consider 'that' a reasonable standard library, and others ...

If you see the need for standardisation of interfaces, just volunteer to write a style guide and open issues and PRs on the respective packages. All this is open source and the development process is transparent on GitHub, for exactly that reason: collaboration.

I'm contributing to the ecosystem and it has been really a pleasure to be part of the story.

Wishing a happy day,
        Andreas

Re: Julia and the Tower of Babel

John Myles White
In reply to this post by Gabriel Gellner
I don't really see how you can solve this without a single dictator who controls the package ecosystem. I'm not enough of an expert in Python to say how well things work there, but the R ecosystem is vastly less organized than the Julia ecosystem. Insofar as it's getting better, it's because the community has agreed to make Hadley Wickham their benevolent dictator.

 --John


Re: Julia and the Tower of Babel

Gabriel Gellner
Yeah, the R system is probably the best guide, as it also has a pretty easy to use package manager ... hence so, so many packages ;) I think Python works without a single BDFL (for science at least) since the core packages are monolithic, so the consistency is immediately apparent, and I find programmers, as a rule, dislike inconsistent APIs within a given project (while seemingly worrying less about consistency across packages).

In response to Andreas and Tom: I don't mean to sound like I don't want to collaborate. Rather, starting this discussion about the conventions in Base and Optim.jl, for example, made me realize that the solution is not just a matter of simple discussion; each group would need to sacrifice a certain level of API consistency if it adopted the other's convention ... and, like John says, that kind of decision usually requires someone to make a command from on high, which a loose package system doesn't always facilitate. But we shall see, maybe it doesn't matter in the long run.

I just find it stressful, when I am making my own package, to decide which convention is best to follow ... every choice feels like a severe tradeoff (do I use reltol to be like Base, which will be less and less of a guide as packages are moved out of Base ..., or do I use rel_tol because my package will commonly be used in conjunction with Optim.jl ...).

Thanks for the responses though;
this is something I noticed, but I wasn't sure what others felt.

all the best.

On Friday, October 7, 2016 at 10:49:47 AM UTC-6, John Myles White wrote:
I don't really see how you can solve this without a single dictator who controls the package ecosystem. I'm not enough of an expert in Python to say how well things work there, but the R ecosystem is vastly less organized than the Julia ecosystem. Insofar as it's getting better, it's because the community has agreed to make Hadley Wickham their benevolent dictator.

 --John


Re: Julia and the Tower of Babel

jonathan.bieler
Maybe an "easy" first step would be to have a page (a GitHub repo) containing domain-specific naming conventions (atol/abstol) that package developers can look up. Even though existing packages might not adopt them, at least newly created ones would have a chance to be more consistent. You could even write a small tool that parses your files and warns you about improper naming.
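As a rough illustration of what such a tool could look like, here is a minimal sketch (assuming Julia 1.x). The table `PREFERRED` and the function `check_names` are hypothetical, and which spelling counts as "preferred" is a placeholder, since no convention has been agreed on. The check is purely text-based: it scans identifiers with a regular expression rather than parsing Julia code, so it would also flag names inside comments and strings.

```julia
# Hypothetical convention table: discouraged spelling => preferred spelling.
# The choice of preferred names is a placeholder, not an agreed standard.
const PREFERRED = Dict(
    "atol" => "abstol", "abs_tol" => "abstol",
    "rtol" => "reltol", "rel_tol" => "reltol",
)

# Return (line number, found name, suggested name) for each discouraged identifier.
function check_names(source::AbstractString)
    warnings = Tuple{Int,String,String}[]
    for (lineno, line) in enumerate(split(source, '\n'))
        # Match whole identifiers only, so `reltol` is not flagged for containing `rtol`-like text.
        for m in eachmatch(r"\b[A-Za-z_][A-Za-z0-9_]*\b", line)
            name = String(m.match)
            if haskey(PREFERRED, name)
                push!(warnings, (lineno, name, PREFERRED[name]))
            end
        end
    end
    return warnings
end

# Two made-up call sites that mix spellings.
sample = """
solve(f, x0; atol=1e-8, reltol=1e-6)
optimize(g, x0; rel_tol=1e-4)
"""
found = check_names(sample)
```

Here `found` lists `atol` on line 1 and `rel_tol` on line 2, each paired with the suggested replacement.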

Re: Julia and the Tower of Babel

Milan Bouchet-Valat
On Saturday, October 8, 2016 at 01:47 -0700, [hidden email] wrote:
> Maybe an "easy" first step would be to have a page (a github repo)
> containing domain specific naming conventions (atol/abstol) that
> package
> developers can look up. Even though existing packages might not adopt
> them, at least newly created ones would have a chance
> to be more consistent. You could even do a small tool that parse your
> files and warn you about improper naming.
Creating a web page like this sounds like a good idea.

As regards automatic checking, note that there's already Lint.jl, to
which a list of "nonstandard" names could be added, together with
recommendations.


Regards

Re: Julia and the Tower of Babel

Chris Rackauckas
In reply to this post by jonathan.bieler
Create a repo where we can all bikeshed different names, agree upon some, and then standardize. I honestly don't care which conventions are chosen and will just find/replace with whatever people want, but there has to be a "whatever people want" to do that.

On Saturday, October 8, 2016 at 1:47:07 AM UTC-7, [hidden email] wrote:
Maybe an "easy" first step would be to have a page (a github repo) containing domain specific naming conventions (atol/abstol) that package
developers can look up. Even though existing packages might not adopt them, at least newly created ones would have a chance
to be more consistent. You could even do a small tool that parse your files and warn you about improper naming.

Re: Julia and the Tower of Babel

Traktor Toni
In reply to this post by Gabriel Gellner
In my opinion the solutions to this are very clear, or would be:

1. make a mandatory linter for all Julia code
2. Julia IDEs should offer good IntelliSense

Am Freitag, 7. Oktober 2016 17:35:46 UTC+2 schrieb Gabriel Gellner:

Something that I have been noticing, as I convert more of my research code over to Julia, is how the super easy to use package manager (which I love), coupled with the talent base of the Julia community seems to have a detrimental effect on the API consistency of the many “micro” packages that cover what I would consider the de-facto standard library.

What I mean is that whereas a commercial package like Matlab/Mathematica etc., being written under one large umbrella, will largely (clearly not always) choose consistent names for similar API keyword arguments, and have similar calling conventions for master function like tools (`optimize` versus `lbfgs`, etc), which I am starting to realize is one of the great selling points of these packages as an end user. I can usually guess what a keyword will be in Mathematica, whereas even after a year of using Julia almost exclusively I find I have to look at the documentation (or the source code depending on the documentation ...) to figure out the keyword names in many common packages.

Similarly, in my experience with open source tools, due to the complexity of the package management, we get large “batteries included” distributions that cover a lot of the standard stuff for doing science, like python’s numpy + scipy combination. Whereas in Julia the equivalent of scipy is split over many, separately developed packages (Base, Optim.jl, NLopt.jl, Roots.jl, NLsolve.jl, ODE.jl/DifferentialEquations.jl). Many of these packages are stupid awesome, but they can have dramatically different naming conventions and calling behavior, for essential equivalent behavior. Recently I noticed that tolerances, for example, are named as `atol/rtol` versus `abstol/reltol` versus `abs_tol/rel_tol`, which means is extremely easy to have a piece of scientific code that will need to use all three conventions across different calls to seemingly similar libraries.

Having brought this up I find that the community is largely sympathetic and, in general, would support a common convention, the issue I have slowly realized is that it is rarely that straightforward. In the above example the abstol/reltol versus abs_tol/rel_tol seems like an easy example of what can be tidied up, but the latter underscored name is consistent with similar naming conventions from Optim.jl for other tolerances, so that community is reluctant to change the convention. Similarly, I think there would be little interest in changing abstol/reltol to the underscored version in packages like Base, ODE.jl etc as this feels consistent with each of these code bases. Hence I have started to think that the problem is the micro-packaging. It is much easier to look for consistency within a package then across similar packages, and since Julia seems to distribute so many of the essential tools in very narrow boundaries of functionality I am not sure that this kind of naming convention will ever be able to reach something like a Scipy, or the even higher standard of commercial packages like Matlab/Mathematica. (I am sure there are many more examples like using maxiter, versus iterations for describing stopping criteria in iterative solvers ...)

Even further I have noticed that even when packages try to find consistency across packages, for example Optim.jl <-> Roots.jl <-> NLsolve.jl, when one package changes how they do things (Optim.jl moving to delegation on types for method choice) then again the consistency fractures quickly, where we now have a common divide of using either Typed dispatch keywords versus :method symbol names across the previous packages (not to mention the whole inplace versus not-inplace for function arguments …)

Do people with more experience in scientific package ecosystems feel this is solvable? Or do micro distributions just lead to many varying API conventions that end users need to learn? Is this common in communities that use C++ etc.? I ask because I wonder how much one can worry about this kind of thing when making small packages is so easy.


Re: Julia and the Tower of Babel

Chris Rackauckas
Conventions would have to be arrived at before this is possible.

On Saturday, October 8, 2016 at 3:39:55 AM UTC-7, Traktor Toni wrote:
In my opinion the solutions to this are very clear, or would be:

1. make a mandatory linter for all julia code
2. julia IDEs should offer good intellisense


Re: Julia and the Tower of Babel

Jeffrey Sarnoff
I have created a new organization on GitHub: JuliaPraxis.
Everyone who has added to this thread will get an invitation to join, and so contribute.
I will set up the site and let you know how to include your wor(l)d views.

Anyone else is welcome to post to this thread, and I will send an invitation.




Re: Julia and the Tower of Babel

Giuseppe Ragusa
JuliaPraxis seems like a good idea. I have been struggling to get consistent naming, and having a guide to follow may at least cut the struggling time short.

Re: Julia and the Tower of Babel

Tsur Herman
I noticed this too, and this is why I chose to "rip" some packages for parts of their functionality.

From what I observed, the problem is the "coolness" of the language and the high creativity of the package writers. Just as the first post here states, these two seeming advantages, a cool language and super-creative package writers, can sometimes have a "Tower of Babel" effect.

I encountered this with image processing, geometry primitive manipulation, etc. The problem is: too many types!!

If something can be represented as an array with some convention, for example an MxN array where M is the descriptor size and N is the number of descriptors, then it is better to use and support that than to declare more specialized types.

At least for fast-paced research and idea validation it is better. For implementation and performance, specialized types optimized for speed will probably be required.
 
 

Re: Julia and the Tower of Babel

Tom Breloff
I think sometimes people go overboard with types, but types allow us to take full advantage of multiple dispatch and abstraction on another level.  For example, a diagonal matrix and a full/dense matrix are both the same thing, but if you can dispatch on them differently you can massively improve the effectiveness/performance of the underlying code without much effort.  A recent thread here was asking about how to do this effectively in Python, and... well, everyone just kinda laughed at the idea.  Types allow us flexibility that we can't have otherwise.
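As a small sketch of the diagonal-versus-dense point: `solve_system` below is a hypothetical function name, while `Diagonal` is the real type from the `LinearAlgebra` standard library. `LinearAlgebra` already specialises `\` for `Diagonal`, so the explicit method here is purely illustrative.

```julia
using LinearAlgebra

# Generic fallback: works for any matrix type via a general linear solve.
solve_system(A::AbstractMatrix, b::AbstractVector) = A \ b

# Specialised method: a diagonal system is just element-wise division, O(n).
solve_system(A::Diagonal, b::AbstractVector) = b ./ A.diag

d = [2.0, 4.0, 8.0]
b = [2.0, 2.0, 2.0]

x_dense = solve_system(Matrix(Diagonal(d)), b)  # dispatches to the fallback
x_diag  = solve_system(Diagonal(d), b)          # dispatches to the Diagonal method
```

The calling code is identical in both cases; only the type of `A` decides which method runs.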


Re: Julia and the Tower of Babel

Stefan Karpinski
In reply to this post by Tsur Herman
Good generic API design is one of the hardest problems around. For many problem areas, we just haven't found the right design yet. JuMP is one of the prime examples of brilliant work in this area. Mathematica is the best example of consistent APIs in a language and its ecosystem because Stephen Wolfram literally reviews and approves every single function that's added. We can't do that since this is an open source community and we don't have dictators, and honestly no one has the time or breadth of expertise to do this for all the amazing areas people are using Julia in. There are some things that are helpful, however.

GitHub orgs. Having related packages under a single org is weirdly effective – way more than it seems like it should be. I think this is about awareness and communication. Not a panacea, but more helpful than you would imagine.

Communication. Long hard conversations like this one. Get people talking about what the common API should look like. Once people agree on a good one, implementation is often easier than one might think.

Generic functions. Julia's multiple dispatch is good at this, especially because it allows you to disentangle nouns and verbs, and different people can work on different parts of the vocabulary. Have a good set of nouns like Distributions? Anyone can add their own verbs. Have some good consistent verbs? Making them apply to your own nouns is no problem either. See the esoteric-seeming expression problem [1,2,3] – which doesn't even occur to Julia programmers as being a problem because the solution is so natural.
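A toy sketch of the nouns-and-verbs idea, with three modules standing in for three independently developed packages. All names (`Shapes`, `Perimeters`, `Rectangles` and their contents) are made up for illustration.

```julia
# "Package A" owns some nouns (types) and one verb (function).
module Shapes
export Circle, Square, area
struct Circle; r::Float64; end
struct Square; side::Float64; end
area(c::Circle) = pi * c.r^2
area(s::Square) = s.side^2
end

# "Package B" adds a new verb for A's nouns without editing A.
module Perimeters
using ..Shapes
export perimeter
perimeter(c::Circle) = 2pi * c.r
perimeter(s::Square) = 4 * s.side
end

# "Package C" adds a new noun and extends both existing verbs to it.
module Rectangles
import ..Shapes: area
import ..Perimeters: perimeter
export Rectangle
struct Rectangle; w::Float64; h::Float64; end
area(r::Rectangle) = r.w * r.h
perimeter(r::Rectangle) = 2 * (r.w + r.h)
end

using .Shapes, .Perimeters, .Rectangles

# Code written against the verbs works for every noun, old or new.
total_area = sum(area, [Circle(1.0), Square(2.0), Rectangle(2.0, 3.0)])
```

None of the three modules had to be modified to accommodate the others, which is the sense in which the expression problem does not come up.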

Persistence. The more speculative and active a research area is, the less likely we are to have a consensus on what the generic interfaces and APIs should look like. Optimization APIs were all over the place until things like JuMP and Convex came along. Now you can swap out different solvers easily and keep the expression of your problem the same. Changing deep learning backends should be just as easy, but it's certainly not – because people are still trying to figure out what the interface between how you program and how you implement these systems should be.

Summary: keep trying, communicate, create organizations, and use multiple dispatch effectively.



Re: Julia and the Tower of Babel

Michael Borregaard

Great to see this brought up here, and to read the constructive and thought-provoking responses from members of the Julia community. I feel this is highly important and I have thought a lot about it recently, as I am writing an invited guest editorial for a leading ecological journal about how transferring to Julia as the lingua franca for ecological scientists may affect the way we do science and work together.

I come to this from a somewhat different angle, as the ecological community is almost 100% wedded to R – the use of R has practically exploded within the last 5 years alone. So when I came to Julia I was struck by how structured the package ecosystem appears to be, in spite of the micropackaging. This seems to me to be a huge advantage for collaboration, creativity and methods development, and IMHO this will in the end be a stronger argument for our community to make the transition than the speed of computation.

I think there are a number of reasons for this difference, but I also believe that a primary reason is the reliance on github for developing the package ecosystem from the bottom up, and the use of organizations. These organisations, like JuliaGeo, BioJulia etc in effect act like standard package distributions, both by facilitating communication within, but also by imposing a set of strict guidelines on code compatibility. Centrally, the organisations are really visible centers for where development in a given field takes place, and thus the culture encourages developers to contribute to existing packages and organisations rather than inventing new packages. In R that is not the case - instead most scientific packages are one-lab projects developed to serve a certain research program.

I do hope that this can continue in the future, but one might worry: right now most Julia developers are driven by a desire to help build the language itself, but once it grows beyond a certain size and becomes established this is sure to become less pronounced. Also, the current practice of software papers in scientific journals means that researchers get credit for developing new packages, but none for contributing to existing packages. This directly counteracts the best interests of the community.

It is a new situation to have a scientific language that is built openly and communally, yet with such a high degree of integration and communication. The solution must be culture, as written by Stefan and Tom: specifically, developing the community culture so that people keep communicating, discussing and agreeing upon standards. Also, for instance, organizations like the BioJulia community are very good at identifying new ad-hoc packages coming out that relate to their work, and at inviting developers to join the communal effort and build the foundation of Julia in their field instead of creating lots of partial alternatives. I think this is key.

But perhaps this could be strengthened by being more explicit about building modular 'standard libraries', like in the respective organizations, but perhaps also for Base (or statistics/numerical analysis, at least), that impose strict internal guidelines for conformance? These organizations, of course, would need mechanisms for ensuring renewal within the basic idioms, so development does not die.

I for one will follow this development with keen interest.

Re: Julia and the Tower of Babel

Jeffrey Sarnoff
__JuliaPraxis__ is on [github](https://github.com/JuliaPraxis) and [gitter](https://gitter.im/JuliaPraxis/Lobby), welcoming growth.






Re: Julia and the Tower of Babel

Jeffrey Sarnoff
In reply to this post by Jeffrey Sarnoff

JuliaPraxis is on  github and gitter ... bring our praxes. 



Re: Julia and the Tower of Babel

Páll Haraldsson
In reply to this post by Gabriel Gellner
On Friday, October 7, 2016 at 3:35:46 PM UTC, Gabriel Gellner wrote:

`atol/rtol` versus `abstol/reltol` versus `abs_tol/rel_tol`


For the latter "versus" at least (and other examples), this would be solved by style-insensitivity, as in the Nimrod (now Nim) language, the only one I've heard of that does this (I'm not sure of its status; maybe they dropped it with the name change).

I hesitated to propose this for Julia when I first discovered it; I was (and am) conflicted. I thought it would break code, as it's a breaking change, but would it in fact help(?)

This could in theory be done with a macro(?)
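A rough sketch of what that could look like for keyword names only, using a wrapper function rather than a macro (that substitution is my own simplification). `normalize_name`, `style_insensitive` and `check` are hypothetical names; the rule applied is the one from the Nimrod description quoted below, i.e. ignore case and underscores.

```julia
# Nimrod-style normalisation: ignore case and underscores.
# Illustrative only; Julia itself does nothing of the kind.
normalize_name(s::Symbol) = Symbol(lowercase(replace(String(s), "_" => "")))

# Wrap a function so that keyword names are normalised before the call.
function style_insensitive(f)
    return function (args...; kwargs...)
        normalized = Dict{Symbol,Any}(normalize_name(k) => v for (k, v) in kwargs)
        return f(args...; normalized...)
    end
end

# Stand-in for a library function that uses the abstol/reltol spelling.
check(x; abstol=1e-8, reltol=1e-6) = (abstol, reltol)

loose_check = style_insensitive(check)
tols = loose_check(1.0; abs_tol=1e-3, RelTol=1e-2)

# abs_tol and abstol collapse to the same name, but atol does not.
same_name = normalize_name(:abs_tol) == normalize_name(:abstol)
still_different = normalize_name(:atol) != normalize_name(:abstol)
```

As the last two lines show, this only reconciles `abstol` with `abs_tol`; `atol` remains a different name, which is why it addresses only the latter "versus".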



Style Insensitive?
https://github.com/nim-lang/Nim/issues/521
"Nimrod is a style-insensitive language. This means that it is not case-sensitive and even underscores are ignored: type is a reserved word, and so is TYPE or T_Y_P_E. The idea behind this is that this allows programmers to use their own preferred spelling style and libraries written by different programmers cannot use incompatible conventions.

Please rethink about that or at least give us an option to disable both: case insensitive and also underscore ignored

[another user]:

Also a consistent style for code bases is VASTLY overrated, in fact I almost never had the luxury of it and yet it was never a problem."


Trivia on Nim[rod], D and upcoming(?) C++ below (I was just looking up the hard-to-find info above):

http://nim-lang.org/docs/nep1.html

Naming Conventions


* Type identifiers should be in PascalCase. All other identifiers should be in camelCase with the exception of constants which may use PascalCase but are not required to.
[..]

For constants coming from a C/C++ wrapper, ALL_UPPERCASE are allowed, but ugly. (Why shout CONSTANT? Constants do no harm, variables do!)



http://nim-lang.org/


  • A fast non-tracing garbage collector that supports soft real-time systems (like games).

  • System programming features: Ability to manage your own memory and access the hardware directly. Pointers to garbage collected memory are distinguished from pointers to manually managed memory.

  • [..]

  • Macros can modify the abstract syntax tree at compile time.

  • [..]

  • Macros cannot change Nim's syntax because there is no need for it. Nim's syntax is flexible enough.

  • Statements are grouped by indentation but can span multiple lines. Indentation must not contain tabulators so the compiler always sees the code the same way as you do.


    https://en.wikipedia.org/wiki/Nim_(programming_language)

    "Nim (formerly named Nimrod)
    [..]

    Language design

    Influenced by

    [..]
    Lisp: Macro system, embrace the AST, homoiconicity
    [..]


    UFCS, a feature supported by Nim" [and D]:


    https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax


    "It has been proposed (as of 2016) for addition to C++ by Bjarne Stroustrup[3] and Herb Sutter, to reduce the ambiguous decision between

    [..]

        // All the followings are correct and equivalent
        int b = first(a);
        int c = a.first();
        int d = a.first;
    "