Re: Indexing

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Alexander Pyattaev
It is curious how this discussion was quietly closed with nobody paying any attention to the issue. Sad but true - most generic programming things have 0-based indexes. The reason for this is simple - performance=) With 0-based, most pointer arithmetics are way easier than with 1-based, this is why it is used in C. Further, in assembly everything will be zero-based for years to come unless someone reinvents the processor instruction set. Essentially, you are deliberately adding an extra operation to every single array operation... 
This gets worse for multi-dimensional arrays. Let us consider a 20x20 matrix.
 The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in zero-based and 5-1+20*(2-1) for 1-based. The more dimensions, the more extra operations need to be done for every access. For things like loops over every element of a matrix this is REALLY painful. 
For example, I want to make a stupid loop to sum all elements in array (C--style):
int sum =0;
for (i=0; i<N; i++)
{
sum = sum + array[i];
}
In C, the index i can be immediately used for memory access. However, in 1-based indexing the code actually has one extra addition for every loop iteration. 
Naturally, the compiler will optimize some of the overhead away, but what I wonder is how exactly are those optimizations done in Julia. Based on what I have seen so far it is not quite clear how this is done.


On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei Jirnyi wrote:
IMO, Python model is rather unique and unnatural -- I think there is a
very good reason no other language implements indexing in that
particular manner. For me personally, the "non-inclusive-at-the-right"
indexing is probably the biggest issue why all my attempts at getting
into the language (I try about once per year, about 6 times so far...)
have so far failed :) (there are a couple others, e.g. weird default
parameters, but I think this one takes the cake.)

C-style 0-based indexes (i.e. with x[0:2] standing for the first three
elements) could work -- Ox is a very good example of that; but really,
the similarity to Matlab (and Fortran, and R, and Gauss, and SAS/IML,
and even Mata I think) is a *very* good thing, which is a huge selling
point :) so my pick would be definitely to stay with the base of 1.

Fortran is btw very handy in that it allows anything to be a base --
and I've used 0 based arrays in Fortran a few times, most commonly
when translating C code. If this is not too much work, I guess it
could be a nice addition -- but definitely non-critical :)

--aj
Reply | Threaded
Open this post in threaded view
|

Re: Indexing

John Myles White
Please do not revive such old threads. It is very disrespectul to the actively contributing members of our community to attempt to initiate a debate about design decisions that were finalized years ago.

 -- John

On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <[hidden email]> wrote:

It is curious how this discussion was quietly closed with nobody paying any attention to the issue. Sad but true - most generic programming things have 0-based indexes. The reason for this is simple - performance=) With 0-based, most pointer arithmetics are way easier than with 1-based, this is why it is used in C. Further, in assembly everything will be zero-based for years to come unless someone reinvents the processor instruction set. Essentially, you are deliberately adding an extra operation to every single array operation... 
This gets worse for multi-dimensional arrays. Let us consider a 20x20 matrix.
 The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in zero-based and 5-1+20*(2-1) for 1-based. The more dimensions, the more extra operations need to be done for every access. For things like loops over every element of a matrix this is REALLY painful. 
For example, I want to make a stupid loop to sum all elements in array (C--style):
int sum =0;
for (i=0; i<N; i++)
{
sum = sum + array[i];
}
In C, the index i can be immediately used for memory access. However, in 1-based indexing the code actually has one extra addition for every loop iteration. 
Naturally, the compiler will optimize some of the overhead away, but what I wonder is how exactly are those optimizations done in Julia. Based on what I have seen so far it is not quite clear how this is done.


On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei Jirnyi wrote:
IMO, Python model is rather unique and unnatural -- I think there is a
very good reason no other language implements indexing in that
particular manner. For me personally, the "non-inclusive-at-the-right"
indexing is probably the biggest issue why all my attempts at getting
into the language (I try about once per year, about 6 times so far...)
have so far failed :) (there are a couple others, e.g. weird default
parameters, but I think this one takes the cake.)

C-style 0-based indexes (i.e. with x[0:2] standing for the first three
elements) could work -- Ox is a very good example of that; but really,
the similarity to Matlab (and Fortran, and R, and Gauss, and SAS/IML,
and even Mata I think) is a *very* good thing, which is a huge selling
point :) so my pick would be definitely to stay with the base of 1.

Fortran is btw very handy in that it allows anything to be a base --
and I've used 0 based arrays in Fortran a few times, most commonly
when translating C code. If this is not too much work, I guess it
could be a nice addition -- but definitely non-critical :)

--aj

Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Milan Bouchet-Valat
Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
> Please do not revive such old threads. It is very disrespectul to the
> actively contributing members of our community to attempt to initiate
> a debate about design decisions that were finalized years ago.
Also, criticizing these decisions and ending your post with "it is not
quite clear how this is done" isn't really the right approach. If you
have questions about how this works, ask politely, and people will give
you pointers.

Finally, I suspect you'll have a hard time creating a benchmark where
the -1 operation really matters as much as you think it does. That's
where you should have started.


My two cents

>  -- John
>
> > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
> > [hidden email]> wrote:
> >
> > It is curious how this discussion was quietly closed with nobody
> > paying any attention to the issue. Sad but true - most generic
> > programming things have 0-based indexes. The reason for this is
> > simple - performance=) With 0-based, most pointer arithmetics are
> > way easier than with 1-based, this is why it is used in C. Further,
> > in assembly everything will be zero-based for years to come unless
> > someone reinvents the processor instruction set. Essentially, you
> > are deliberately adding an extra operation to every single array
> > operation...
> > This gets worse for multi-dimensional arrays. Let us consider a
> > 20x20 matrix.
> >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
> > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions, the
> > more extra operations need to be done for every access. For things
> > like loops over every element of a matrix this is REALLY painful.
> > For example, I want to make a stupid loop to sum all elements in
> > array (C--style):
> > int sum =0;
> > for (i=0; i<N; i++)
> > {
> > sum = sum + array[i];
> > }
> > In C, the index i can be immediately used for memory access.
> > However, in 1-based indexing the code actually has one extra
> > addition for every loop iteration.
> > Naturally, the compiler will optimize some of the overhead away,
> > but what I wonder is how exactly are those optimizations done in
> > Julia. Based on what I have seen so far it is not quite clear how
> > this is done.
> >
> >
> > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei Jirnyi
> > wrote:
> > > IMO, Python model is rather unique and unnatural -- I think there
> > > is a
> > > very good reason no other language implements indexing in that
> > > particular manner. For me personally, the "non-inclusive-at-the
> > > -right"
> > > indexing is probably the biggest issue why all my attempts at
> > > getting
> > > into the language (I try about once per year, about 6 times so
> > > far...)
> > > have so far failed :) (there are a couple others, e.g. weird
> > > default
> > > parameters, but I think this one takes the cake.)
> > >
> > > C-style 0-based indexes (i.e. with x[0:2] standing for the first
> > > three
> > > elements) could work -- Ox is a very good example of that; but
> > > really,
> > > the similarity to Matlab (and Fortran, and R, and Gauss, and
> > > SAS/IML,
> > > and even Mata I think) is a *very* good thing, which is a huge
> > > selling
> > > point :) so my pick would be definitely to stay with the base of
> > > 1.
> > >
> > > Fortran is btw very handy in that it allows anything to be a base
> > > --
> > > and I've used 0 based arrays in Fortran a few times, most
> > > commonly
> > > when translating C code. If this is not too much work, I guess it
> > > could be a nice addition -- but definitely non-critical :)
> > >
> > > --aj
Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Alexander Pyattaev

Guys, guys, no disrepect meant. But clearly there must have been something to compensate the overhead in index calculations. While I do not prefer the 1 indexing, I program in both styles just fine. I am not suggesting anyone goes ahead and changes it) What I do wonder about is how you got around optimizing array access. After all, index origin is a matter of taste, pointer arithmetic efficient is not.

On Aug 31, 2015 11:52 AM, "Milan Bouchet-Valat" <[hidden email]> wrote:
Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
> Please do not revive such old threads. It is very disrespectul to the
> actively contributing members of our community to attempt to initiate
> a debate about design decisions that were finalized years ago.
Also, criticizing these decisions and ending your post with "it is not
quite clear how this is done" isn't really the right approach. If you
have questions about how this works, ask politely, and people will give
you pointers.

Finally, I suspect you'll have a hard time creating a benchmark where
the -1 operation really matters as much as you think it does. That's
where you should have started.


My two cents

>  -- John
>
> > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
> > [hidden email]> wrote:
> >
> > It is curious how this discussion was quietly closed with nobody
> > paying any attention to the issue. Sad but true - most generic
> > programming things have 0-based indexes. The reason for this is
> > simple - performance=) With 0-based, most pointer arithmetics are
> > way easier than with 1-based, this is why it is used in C. Further,
> > in assembly everything will be zero-based for years to come unless
> > someone reinvents the processor instruction set. Essentially, you
> > are deliberately adding an extra operation to every single array
> > operation...
> > This gets worse for multi-dimensional arrays. Let us consider a
> > 20x20 matrix.
> >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
> > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions, the
> > more extra operations need to be done for every access. For things
> > like loops over every element of a matrix this is REALLY painful.
> > For example, I want to make a stupid loop to sum all elements in
> > array (C--style):
> > int sum =0;
> > for (i=0; i<N; i++)
> > {
> > sum = sum + array[i];
> > }
> > In C, the index i can be immediately used for memory access.
> > However, in 1-based indexing the code actually has one extra
> > addition for every loop iteration.
> > Naturally, the compiler will optimize some of the overhead away,
> > but what I wonder is how exactly are those optimizations done in
> > Julia. Based on what I have seen so far it is not quite clear how
> > this is done.
> >
> >
> > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei Jirnyi
> > wrote:
> > > IMO, Python model is rather unique and unnatural -- I think there
> > > is a
> > > very good reason no other language implements indexing in that
> > > particular manner. For me personally, the "non-inclusive-at-the
> > > -right"
> > > indexing is probably the biggest issue why all my attempts at
> > > getting
> > > into the language (I try about once per year, about 6 times so
> > > far...)
> > > have so far failed :) (there are a couple others, e.g. weird
> > > default
> > > parameters, but I think this one takes the cake.)
> > >
> > > C-style 0-based indexes (i.e. with x[0:2] standing for the first
> > > three
> > > elements) could work -- Ox is a very good example of that; but
> > > really,
> > > the similarity to Matlab (and Fortran, and R, and Gauss, and
> > > SAS/IML,
> > > and even Mata I think) is a *very* good thing, which is a huge
> > > selling
> > > point :) so my pick would be definitely to stay with the base of
> > > 1.
> > >
> > > Fortran is btw very handy in that it allows anything to be a base
> > > --
> > > and I've used 0 based arrays in Fortran a few times, most
> > > commonly
> > > when translating C code. If this is not too much work, I guess it
> > > could be a nice addition -- but definitely non-critical :)
> > >
> > > --aj
Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Steven Sagaert
In reply to this post by Alexander Pyattaev
This is B.S.
1-based indexing is actually far more natural. Do "normal" people count 0,1,2 ,...? No, they count 1,2, 3,...
1-based indexing is the "normal" way & used by most scientific languages (not to mention by mathematics!). 0-based indexing only was used in C because C confuses arrays & pointers. A C-array is just syntactic sugar for a pointer and hence a C-array index is actually a pointer offset not a index in the mathematical sense. Since C was so succesful many other languages took that system over especially all C-descendants like C++, Java, C#, PHP,javascript,...
0-based indexing is also not universal in general purpose languages. E.g. Pascal had 1-based indexing.

0-based indexing actually leads to a lot of "one-off" indexing errors especially with new programmers who's brains has not been "rewired" to 0-based counting.



On Friday, February 24, 2012 at 1:06:36 AM UTC+1, Noam wrote:
Hello,

My name is Noam Yorav-Raphael. I found out about Julia two days ago and got really excited - I stayed up until 4am and read the manual and experimented.

I want to add my word about why I think 1-based indexing is wrong, in the hope that there's some chance that you'll change it. (I tried to reply to the thread, but couldn't)

One reason is that it will make Julia a scientific-only language. All of today's general-purpose languages are zero-based. Using 1-based indexes will mean that it will be cumbersome to interface with many things written in other languages, and that people who are looking for a general-purpose language will be skeptical about Julia. I'm using Python a lot, and numpy/scipy/matplotlib quite a bit, and I find the fact that you can do both scientific stuff and other stuff (GUI, web, files, DB) a great advantage. When I read about Julia I thought, "Wow, a language better than Python and Matlab!" With 1-based indexing I'm afraid it will just be better than Matlab.

The other reason is that I think that 0-based indexing, with (start <= i < stop) ranges are great. The moment I saw the following diagram, when learning Python, I felt that suddenly years of adding and subtracting ones have come to an end.

0   1   2   3   4   5
|   |   |   |   |   |
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+

The diagram shows an array with 5 elements, with a number line on top of it. Instead of numbers specifying cells (which have size), they specify places - just like coordinates do. So a[2] means "the value which comes right after 2", which is ok. It shines when you realize that a[2:4] means "cut the rectangle between places 2 and 4 and take what's in between". It's so clear! You want to take the last three elements of the array? Just use a[len(a)-3:len(a)] and it's obvious why it works. Compare this to a[len(a)-2:len(a)]. And I'm not talking about Python's a[-3:] shortcut, which wouldn't make sense in 1-based. You want to drop the first three elements? It's a[3:len(a)], not a[4:len(a)]. You just don't need to do plus ones and minus ones all the time.

And generally, about counting from 0 --

Why does the 18th century include all the year that start with 17--, except for 1700 which belongs to the 17th century and 1800 which belongs to the 18th? Because we count starting with 1. If the first year since Jesus' fourth birthday was called year 0, and the first century since was called the 0th century, the century number 17 would simply include the years 1700-1799.

What I'm saying is that the only advantage of 1-based indexing is that we're used to it. Besides that, it just causes trouble. Starting to count from 0 is the right thing to do. That's the reason why the assembly code to get the third element of an array will always include the number 2 - it's not some computer-sciency stuff. It's because its *real* index is 2, not 3.

Mathematicians use 1-based indices when it doesn't matter. Whenever they use 1-based indexing, they could use 0-based indexing and it will work just as fine. However, try to use 1-based indexing to index polynomial coefficients (a_0 + a_1*x + a_2*x^2) or simply digits (123 = 1*10^2 + 2*10^1 + 3*10^0) - it just won't work.

Sometimes, using 1-based indices won't hurt you. Sometimes it will. Using 0-based indices will always work. That's why 0-based arrays in Python+numpy work fine. That's why older programming languages (fortran, cobol, algol, not including lisp) use 1-based indices and today all common languages are 0-based (*). You'd have thought that since people tend to stick with the familiar, all programming languages today would be 1-based. But in spite of people used to counting from 1, and people used to programming languages counting from 1, people moved to languages that count from 0. Isn't that a sure sign that there were enough problems with 1-based so people moved to 0-based?

(*) <a href="http://en.wikipedia.org/wiki/Comparison_of_programming_languages_(array)#Array_system_cross-reference_list" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FComparison_of_programming_languages_(array)%23Array_system_cross-reference_list\46sa\75D\46sntz\0751\46usg\75AFQjCNH80EDUH4o9qvRiFey8rrVb0tFOBw&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FComparison_of_programming_languages_(array)%23Array_system_cross-reference_list\46sa\75D\46sntz\0751\46usg\75AFQjCNH80EDUH4o9qvRiFey8rrVb0tFOBw&#39;;return true;">http://en.wikipedia.org/wiki/Comparison_of_programming_languages_(array)#Array_system_cross-reference_list

So, if you want a language that would be more than a Matlab replacement, and I think that you do, go with 0-based indexing. It will require some getting used to from Matlab programmers (and after that it'll be fine), but it will let you join the wider programming community.

I hope I wasn't rude. The idea of writing your own ultimate language really excited me, so when I spotted something I thought is wrong I had to try to convince you to fix it :)

Best luck,
Noam
Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Mauro
In reply to this post by Alexander Pyattaev
To answer your question, you could look at the machine code with
@code_native, modify that to use 0-based indexing and do benchmarks.

On Mon, 2015-08-31 at 11:46, Alexander Pyattaev <[hidden email]> wrote:

> Guys, guys, no disrepect meant. But clearly there must have been something
> to compensate the overhead in index calculations. While I do not prefer the
> 1 indexing, I program in both styles just fine. I am not suggesting anyone
> goes ahead and changes it) What I do wonder about is how you got around
> optimizing array access. After all, index origin is a matter of taste,
> pointer arithmetic efficient is not.
> On Aug 31, 2015 11:52 AM, "Milan Bouchet-Valat" <[hidden email]> wrote:
>
>> Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
>> > Please do not revive such old threads. It is very disrespectul to the
>> > actively contributing members of our community to attempt to initiate
>> > a debate about design decisions that were finalized years ago.
>> Also, criticizing these decisions and ending your post with "it is not
>> quite clear how this is done" isn't really the right approach. If you
>> have questions about how this works, ask politely, and people will give
>> you pointers.
>>
>> Finally, I suspect you'll have a hard time creating a benchmark where
>> the -1 operation really matters as much as you think it does. That's
>> where you should have started.
>>
>>
>> My two cents
>>
>> >  -- John
>> >
>> > > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
>> > > [hidden email]> wrote:
>> > >
>> > > It is curious how this discussion was quietly closed with nobody
>> > > paying any attention to the issue. Sad but true - most generic
>> > > programming things have 0-based indexes. The reason for this is
>> > > simple - performance=) With 0-based, most pointer arithmetics are
>> > > way easier than with 1-based, this is why it is used in C. Further,
>> > > in assembly everything will be zero-based for years to come unless
>> > > someone reinvents the processor instruction set. Essentially, you
>> > > are deliberately adding an extra operation to every single array
>> > > operation...
>> > > This gets worse for multi-dimensional arrays. Let us consider a
>> > > 20x20 matrix.
>> > >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
>> > > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions, the
>> > > more extra operations need to be done for every access. For things
>> > > like loops over every element of a matrix this is REALLY painful.
>> > > For example, I want to make a stupid loop to sum all elements in
>> > > array (C--style):
>> > > int sum =0;
>> > > for (i=0; i<N; i++)
>> > > {
>> > > sum = sum + array[i];
>> > > }
>> > > In C, the index i can be immediately used for memory access.
>> > > However, in 1-based indexing the code actually has one extra
>> > > addition for every loop iteration.
>> > > Naturally, the compiler will optimize some of the overhead away,
>> > > but what I wonder is how exactly are those optimizations done in
>> > > Julia. Based on what I have seen so far it is not quite clear how
>> > > this is done.
>> > >
>> > >
>> > > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei Jirnyi
>> > > wrote:
>> > > > IMO, Python model is rather unique and unnatural -- I think there
>> > > > is a
>> > > > very good reason no other language implements indexing in that
>> > > > particular manner. For me personally, the "non-inclusive-at-the
>> > > > -right"
>> > > > indexing is probably the biggest issue why all my attempts at
>> > > > getting
>> > > > into the language (I try about once per year, about 6 times so
>> > > > far...)
>> > > > have so far failed :) (there are a couple others, e.g. weird
>> > > > default
>> > > > parameters, but I think this one takes the cake.)
>> > > >
>> > > > C-style 0-based indexes (i.e. with x[0:2] standing for the first
>> > > > three
>> > > > elements) could work -- Ox is a very good example of that; but
>> > > > really,
>> > > > the similarity to Matlab (and Fortran, and R, and Gauss, and
>> > > > SAS/IML,
>> > > > and even Mata I think) is a *very* good thing, which is a huge
>> > > > selling
>> > > > point :) so my pick would be definitely to stay with the base of
>> > > > 1.
>> > > >
>> > > > Fortran is btw very handy in that it allows anything to be a base
>> > > > --
>> > > > and I've used 0 based arrays in Fortran a few times, most
>> > > > commonly
>> > > > when translating C code. If this is not too much work, I guess it
>> > > > could be a nice addition -- but definitely non-critical :)
>> > > >
>> > > > --aj
>>

Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Alexander Pyattaev

Dear Mauro,
What do you mean by modifying with 0-based indexing? I have looked at native, and it does not seem to optimize anything at all. Maybe the actual machine code is somehow optimized further, but native does not show that AFAIK.
Alex.

On Aug 31, 2015 1:00 PM, "Mauro" <[hidden email]> wrote:
To answer your question, you could look at the machine code with
@code_native, modify that to use 0-based indexing and do benchmarks.

On Mon, 2015-08-31 at 11:46, Alexander Pyattaev <[hidden email]> wrote:
> Guys, guys, no disrepect meant. But clearly there must have been something
> to compensate the overhead in index calculations. While I do not prefer the
> 1 indexing, I program in both styles just fine. I am not suggesting anyone
> goes ahead and changes it) What I do wonder about is how you got around
> optimizing array access. After all, index origin is a matter of taste,
> pointer arithmetic efficient is not.
> On Aug 31, 2015 11:52 AM, "Milan Bouchet-Valat" <[hidden email]> wrote:
>
>> Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
>> > Please do not revive such old threads. It is very disrespectul to the
>> > actively contributing members of our community to attempt to initiate
>> > a debate about design decisions that were finalized years ago.
>> Also, criticizing these decisions and ending your post with "it is not
>> quite clear how this is done" isn't really the right approach. If you
>> have questions about how this works, ask politely, and people will give
>> you pointers.
>>
>> Finally, I suspect you'll have a hard time creating a benchmark where
>> the -1 operation really matters as much as you think it does. That's
>> where you should have started.
>>
>>
>> My two cents
>>
>> >  -- John
>> >
>> > > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
>> > > [hidden email]> wrote:
>> > >
>> > > It is curious how this discussion was quietly closed with nobody
>> > > paying any attention to the issue. Sad but true - most generic
>> > > programming things have 0-based indexes. The reason for this is
>> > > simple - performance=) With 0-based, most pointer arithmetics are
>> > > way easier than with 1-based, this is why it is used in C. Further,
>> > > in assembly everything will be zero-based for years to come unless
>> > > someone reinvents the processor instruction set. Essentially, you
>> > > are deliberately adding an extra operation to every single array
>> > > operation...
>> > > This gets worse for multi-dimensional arrays. Let us consider a
>> > > 20x20 matrix.
>> > >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
>> > > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions, the
>> > > more extra operations need to be done for every access. For things
>> > > like loops over every element of a matrix this is REALLY painful.
>> > > For example, I want to make a stupid loop to sum all elements in
>> > > array (C--style):
>> > > int sum =0;
>> > > for (i=0; i<N; i++)
>> > > {
>> > > sum = sum + array[i];
>> > > }
>> > > In C, the index i can be immediately used for memory access.
>> > > However, in 1-based indexing the code actually has one extra
>> > > addition for every loop iteration.
>> > > Naturally, the compiler will optimize some of the overhead away,
>> > > but what I wonder is how exactly are those optimizations done in
>> > > Julia. Based on what I have seen so far it is not quite clear how
>> > > this is done.
>> > >
>> > >
>> > > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei Jirnyi
>> > > wrote:
>> > > > IMO, Python model is rather unique and unnatural -- I think there
>> > > > is a
>> > > > very good reason no other language implements indexing in that
>> > > > particular manner. For me personally, the "non-inclusive-at-the
>> > > > -right"
>> > > > indexing is probably the biggest issue why all my attempts at
>> > > > getting
>> > > > into the language (I try about once per year, about 6 times so
>> > > > far...)
>> > > > have so far failed :) (there are a couple others, e.g. weird
>> > > > default
>> > > > parameters, but I think this one takes the cake.)
>> > > >
>> > > > C-style 0-based indexes (i.e. with x[0:2] standing for the first
>> > > > three
>> > > > elements) could work -- Ox is a very good example of that; but
>> > > > really,
>> > > > the similarity to Matlab (and Fortran, and R, and Gauss, and
>> > > > SAS/IML,
>> > > > and even Mata I think) is a *very* good thing, which is a huge
>> > > > selling
>> > > > point :) so my pick would be definitely to stay with the base of
>> > > > 1.
>> > > >
>> > > > Fortran is btw very handy in that it allows anything to be a base
>> > > > --
>> > > > and I've used 0 based arrays in Fortran a few times, most
>> > > > commonly
>> > > > when translating C code. If this is not too much work, I guess it
>> > > > could be a nice addition -- but definitely non-critical :)
>> > > >
>> > > > --aj
>>

Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Mauro
> What do you mean by modifying with 0-based indexing? I have looked at
> native, and it does not seem to optimize anything at all. Maybe the actual
> machine code is somehow optimized further, but native does not show that

I meant that you translate the assembly code to 0-based *by hand*.  Then
put both into some kind of harness to make them callable.  Finally run
benchmarks using the two implementations.  I suspect you'll need to be
crafty with assembler to be able to do this.  (I couldn't do it)

> AFAIK.
> Alex.
> On Aug 31, 2015 1:00 PM, "Mauro" <[hidden email]> wrote:
>
>> To answer your question, you could look at the machine code with
>> @code_native, modify that to use 0-based indexing and do benchmarks.
>>
>> On Mon, 2015-08-31 at 11:46, Alexander Pyattaev <[hidden email]>
>> wrote:
>> > Guys, guys, no disrepect meant. But clearly there must have been
>> something
>> > to compensate the overhead in index calculations. While I do not prefer
>> the
>> > 1 indexing, I program in both styles just fine. I am not suggesting
>> anyone
>> > goes ahead and changes it) What I do wonder about is how you got around
>> > optimizing array access. After all, index origin is a matter of taste,
>> > pointer arithmetic efficient is not.
>> > On Aug 31, 2015 11:52 AM, "Milan Bouchet-Valat" <[hidden email]>
>> wrote:
>> >
>> >> Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
>> >> > Please do not revive such old threads. It is very disrespectul to the
>> >> > actively contributing members of our community to attempt to initiate
>> >> > a debate about design decisions that were finalized years ago.
>> >> Also, criticizing these decisions and ending your post with "it is not
>> >> quite clear how this is done" isn't really the right approach. If you
>> >> have questions about how this works, ask politely, and people will give
>> >> you pointers.
>> >>
>> >> Finally, I suspect you'll have a hard time creating a benchmark where
>> >> the -1 operation really matters as much as you think it does. That's
>> >> where you should have started.
>> >>
>> >>
>> >> My two cents
>> >>
>> >> >  -- John
>> >> >
>> >> > > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
>> >> > > [hidden email]> wrote:
>> >> > >
>> >> > > It is curious how this discussion was quietly closed with nobody
>> >> > > paying any attention to the issue. Sad but true - most generic
>> >> > > programming things have 0-based indexes. The reason for this is
>> >> > > simple - performance=) With 0-based, most pointer arithmetics are
>> >> > > way easier than with 1-based, this is why it is used in C. Further,
>> >> > > in assembly everything will be zero-based for years to come unless
>> >> > > someone reinvents the processor instruction set. Essentially, you
>> >> > > are deliberately adding an extra operation to every single array
>> >> > > operation...
>> >> > > This gets worse for multi-dimensional arrays. Let us consider a
>> >> > > 20x20 matrix.
>> >> > >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
>> >> > > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions, the
>> >> > > more extra operations need to be done for every access. For things
>> >> > > like loops over every element of a matrix this is REALLY painful.
>> >> > > For example, I want to make a stupid loop to sum all elements in
>> >> > > array (C--style):
>> >> > > int sum =0;
>> >> > > for (i=0; i<N; i++)
>> >> > > {
>> >> > > sum = sum + array[i];
>> >> > > }
>> >> > > In C, the index i can be immediately used for memory access.
>> >> > > However, in 1-based indexing the code actually has one extra
>> >> > > addition for every loop iteration.
>> >> > > Naturally, the compiler will optimize some of the overhead away,
>> >> > > but what I wonder is how exactly are those optimizations done in
>> >> > > Julia. Based on what I have seen so far it is not quite clear how
>> >> > > this is done.
>> >> > >
>> >> > >
>> >> > > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei Jirnyi
>> >> > > wrote:
>> >> > > > IMO, Python model is rather unique and unnatural -- I think there
>> >> > > > is a
>> >> > > > very good reason no other language implements indexing in that
>> >> > > > particular manner. For me personally, the "non-inclusive-at-the
>> >> > > > -right"
>> >> > > > indexing is probably the biggest issue why all my attempts at
>> >> > > > getting
>> >> > > > into the language (I try about once per year, about 6 times so
>> >> > > > far...)
>> >> > > > have so far failed :) (there are a couple others, e.g. weird
>> >> > > > default
>> >> > > > parameters, but I think this one takes the cake.)
>> >> > > >
>> >> > > > C-style 0-based indexes (i.e. with x[0:2] standing for the first
>> >> > > > three
>> >> > > > elements) could work -- Ox is a very good example of that; but
>> >> > > > really,
>> >> > > > the similarity to Matlab (and Fortran, and R, and Gauss, and
>> >> > > > SAS/IML,
>> >> > > > and even Mata I think) is a *very* good thing, which is a huge
>> >> > > > selling
>> >> > > > point :) so my pick would be definitely to stay with the base of
>> >> > > > 1.
>> >> > > >
>> >> > > > Fortran is btw very handy in that it allows anything to be a base
>> >> > > > --
>> >> > > > and I've used 0 based arrays in Fortran a few times, most
>> >> > > > commonly
>> >> > > > when translating C code. If this is not too much work, I guess it
>> >> > > > could be a nice addition -- but definitely non-critical :)
>> >> > > >
>> >> > > > --aj
>> >>
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Alexander Pyattaev

Hm. I see your point. Basically, the result of this would not be quite accurate, since some array access is may be translated into vector instructions on x86... I think the way to go about this is just C code compiled with LLVM, and see the difference. But that would not be a Julia benchmark anymore... Either way, I will see what can be done about it and let you know.

On Aug 31, 2015 2:36 PM, "Mauro" <[hidden email]> wrote:
> What do you mean by modifying with 0-based indexing? I have looked at
> native, and it does not seem to optimize anything at all. Maybe the actual
> machine code is somehow optimized further, but native does not show that

I meant that you translate the assembly code to 0-based *by hand*.  Then
put both into some kind of harness to make them callable.  Finally run
benchmarks using the two implementations.  I suspect you'll need to be
crafty with assembler to be able to do this.  (I couldn't do it)

> AFAIK.
> Alex.
> On Aug 31, 2015 1:00 PM, "Mauro" <[hidden email]> wrote:
>
>> To answer your question, you could look at the machine code with
>> @code_native, modify that to use 0-based indexing and do benchmarks.
>>
>> On Mon, 2015-08-31 at 11:46, Alexander Pyattaev <[hidden email]>
>> wrote:
>> > Guys, guys, no disrepect meant. But clearly there must have been
>> something
>> > to compensate the overhead in index calculations. While I do not prefer
>> the
>> > 1 indexing, I program in both styles just fine. I am not suggesting
>> anyone
>> > goes ahead and changes it) What I do wonder about is how you got around
>> > optimizing array access. After all, index origin is a matter of taste,
>> > pointer arithmetic efficient is not.
>> > On Aug 31, 2015 11:52 AM, "Milan Bouchet-Valat" <[hidden email]>
>> wrote:
>> >
>> >> Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
>> >> > Please do not revive such old threads. It is very disrespectul to the
>> >> > actively contributing members of our community to attempt to initiate
>> >> > a debate about design decisions that were finalized years ago.
>> >> Also, criticizing these decisions and ending your post with "it is not
>> >> quite clear how this is done" isn't really the right approach. If you
>> >> have questions about how this works, ask politely, and people will give
>> >> you pointers.
>> >>
>> >> Finally, I suspect you'll have a hard time creating a benchmark where
>> >> the -1 operation really matters as much as you think it does. That's
>> >> where you should have started.
>> >>
>> >>
>> >> My two cents
>> >>
>> >> >  -- John
>> >> >
>> >> > > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
>> >> > > [hidden email]> wrote:
>> >> > >
>> >> > > It is curious how this discussion was quietly closed with nobody
>> >> > > paying any attention to the issue. Sad but true - most generic
>> >> > > programming things have 0-based indexes. The reason for this is
>> >> > > simple - performance=) With 0-based, most pointer arithmetics are
>> >> > > way easier than with 1-based, this is why it is used in C. Further,
>> >> > > in assembly everything will be zero-based for years to come unless
>> >> > > someone reinvents the processor instruction set. Essentially, you
>> >> > > are deliberately adding an extra operation to every single array
>> >> > > operation...
>> >> > > This gets worse for multi-dimensional arrays. Let us consider a
>> >> > > 20x20 matrix.
>> >> > >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
>> >> > > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions, the
>> >> > > more extra operations need to be done for every access. For things
>> >> > > like loops over every element of a matrix this is REALLY painful.
>> >> > > For example, I want to make a stupid loop to sum all elements in
>> >> > > array (C--style):
>> >> > > int sum =0;
>> >> > > for (i=0; i<N; i++)
>> >> > > {
>> >> > > sum = sum + array[i];
>> >> > > }
>> >> > > In C, the index i can be immediately used for memory access.
>> >> > > However, in 1-based indexing the code actually has one extra
>> >> > > addition for every loop iteration.
>> >> > > Naturally, the compiler will optimize some of the overhead away,
>> >> > > but what I wonder is how exactly are those optimizations done in
>> >> > > Julia. Based on what I have seen so far it is not quite clear how
>> >> > > this is done.
>> >> > >
>> >> > >
>> >> > > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei Jirnyi
>> >> > > wrote:
>> >> > > > IMO, Python model is rather unique and unnatural -- I think there
>> >> > > > is a
>> >> > > > very good reason no other language implements indexing in that
>> >> > > > particular manner. For me personally, the "non-inclusive-at-the
>> >> > > > -right"
>> >> > > > indexing is probably the biggest issue why all my attempts at
>> >> > > > getting
>> >> > > > into the language (I try about once per year, about 6 times so
>> >> > > > far...)
>> >> > > > have so far failed :) (there are a couple others, e.g. weird
>> >> > > > default
>> >> > > > parameters, but I think this one takes the cake.)
>> >> > > >
>> >> > > > C-style 0-based indexes (i.e. with x[0:2] standing for the first
>> >> > > > three
>> >> > > > elements) could work -- Ox is a very good example of that; but
>> >> > > > really,
>> >> > > > the similarity to Matlab (and Fortran, and R, and Gauss, and
>> >> > > > SAS/IML,
>> >> > > > and even Mata I think) is a *very* good thing, which is a huge
>> >> > > > selling
>> >> > > > point :) so my pick would be definitely to stay with the base of
>> >> > > > 1.
>> >> > > >
>> >> > > > Fortran is btw very handy in that it allows anything to be a base
>> >> > > > --
>> >> > > > and I've used 0 based arrays in Fortran a few times, most
>> >> > > > commonly
>> >> > > > when translating C code. If this is not too much work, I guess it
>> >> > > > could be a nice addition -- but definitely non-critical :)
>> >> > > >
>> >> > > > --aj
>> >>
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Yichao Yu
On Mon, Aug 31, 2015 at 8:17 AM, Alexander Pyattaev
<[hidden email]> wrote:
> Hm. I see your point. Basically, the result of this would not be quite
> accurate, since some array access is may be translated into vector
> instructions on x86... I think the way to go about this is just C code
> compiled with LLVM, and see the difference. But that would not be a Julia
> benchmark anymore... Either way, I will see what can be done about it and
> let you know.

For the following code (make sure to use floating point type to
disable auto vectorization)
```jl
          @inbounds for i in eachindex(a)
              s += a[i]
          end
```

The assembly loop is
```asm
L32:    vaddsd  (%rcx), %xmm0, %xmm0
       addq    $8, %rcx
       addq    $-1, %rax
       jne     L32
```

AFAICT, there's no "extra operation to every single array operation".
The load address is directly calculated by incrementing a pointer and
the %rax decrement is the loop condition.

>
> On Aug 31, 2015 2:36 PM, "Mauro" <[hidden email]> wrote:
>>
>> > What do you mean by modifying with 0-based indexing? I have looked at
>> > native, and it does not seem to optimize anything at all. Maybe the
>> > actual
>> > machine code is somehow optimized further, but native does not show that
>>
>> I meant that you translate the assembly code to 0-based *by hand*.  Then
>> put both into some kind of harness to make them callable.  Finally run
>> benchmarks using the two implementations.  I suspect you'll need to be
>> crafty with assembler to be able to do this.  (I couldn't do it)
>>
>> > AFAIK.
>> > Alex.
>> > On Aug 31, 2015 1:00 PM, "Mauro" <[hidden email]> wrote:
>> >
>> >> To answer your question, you could look at the machine code with
>> >> @code_native, modify that to use 0-based indexing and do benchmarks.
>> >>
>> >> On Mon, 2015-08-31 at 11:46, Alexander Pyattaev
>> >> <[hidden email]>
>> >> wrote:
>> >> > Guys, guys, no disrepect meant. But clearly there must have been
>> >> something
>> >> > to compensate the overhead in index calculations. While I do not
>> >> > prefer
>> >> the
>> >> > 1 indexing, I program in both styles just fine. I am not suggesting
>> >> anyone
>> >> > goes ahead and changes it) What I do wonder about is how you got
>> >> > around
>> >> > optimizing array access. After all, index origin is a matter of
>> >> > taste,
>> >> > pointer arithmetic efficient is not.
>> >> > On Aug 31, 2015 11:52 AM, "Milan Bouchet-Valat" <[hidden email]>
>> >> wrote:
>> >> >
>> >> >> Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
>> >> >> > Please do not revive such old threads. It is very disrespectul to
>> >> >> > the
>> >> >> > actively contributing members of our community to attempt to
>> >> >> > initiate
>> >> >> > a debate about design decisions that were finalized years ago.
>> >> >> Also, criticizing these decisions and ending your post with "it is
>> >> >> not
>> >> >> quite clear how this is done" isn't really the right approach. If
>> >> >> you
>> >> >> have questions about how this works, ask politely, and people will
>> >> >> give
>> >> >> you pointers.
>> >> >>
>> >> >> Finally, I suspect you'll have a hard time creating a benchmark
>> >> >> where
>> >> >> the -1 operation really matters as much as you think it does. That's
>> >> >> where you should have started.
>> >> >>
>> >> >>
>> >> >> My two cents
>> >> >>
>> >> >> >  -- John
>> >> >> >
>> >> >> > > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
>> >> >> > > [hidden email]> wrote:
>> >> >> > >
>> >> >> > > It is curious how this discussion was quietly closed with nobody
>> >> >> > > paying any attention to the issue. Sad but true - most generic
>> >> >> > > programming things have 0-based indexes. The reason for this is
>> >> >> > > simple - performance=) With 0-based, most pointer arithmetics
>> >> >> > > are
>> >> >> > > way easier than with 1-based, this is why it is used in C.
>> >> >> > > Further,
>> >> >> > > in assembly everything will be zero-based for years to come
>> >> >> > > unless
>> >> >> > > someone reinvents the processor instruction set. Essentially,
>> >> >> > > you
>> >> >> > > are deliberately adding an extra operation to every single array
>> >> >> > > operation...
>> >> >> > > This gets worse for multi-dimensional arrays. Let us consider a
>> >> >> > > 20x20 matrix.
>> >> >> > >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
>> >> >> > > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions,
>> >> >> > > the
>> >> >> > > more extra operations need to be done for every access. For
>> >> >> > > things
>> >> >> > > like loops over every element of a matrix this is REALLY
>> >> >> > > painful.
>> >> >> > > For example, I want to make a stupid loop to sum all elements in
>> >> >> > > array (C--style):
>> >> >> > > int sum =0;
>> >> >> > > for (i=0; i<N; i++)
>> >> >> > > {
>> >> >> > > sum = sum + array[i];
>> >> >> > > }
>> >> >> > > In C, the index i can be immediately used for memory access.
>> >> >> > > However, in 1-based indexing the code actually has one extra
>> >> >> > > addition for every loop iteration.
>> >> >> > > Naturally, the compiler will optimize some of the overhead away,
>> >> >> > > but what I wonder is how exactly are those optimizations done in
>> >> >> > > Julia. Based on what I have seen so far it is not quite clear
>> >> >> > > how
>> >> >> > > this is done.
>> >> >> > >
>> >> >> > >
>> >> >> > > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei
>> >> >> > > Jirnyi
>> >> >> > > wrote:
>> >> >> > > > IMO, Python model is rather unique and unnatural -- I think
>> >> >> > > > there
>> >> >> > > > is a
>> >> >> > > > very good reason no other language implements indexing in that
>> >> >> > > > particular manner. For me personally, the
>> >> >> > > > "non-inclusive-at-the
>> >> >> > > > -right"
>> >> >> > > > indexing is probably the biggest issue why all my attempts at
>> >> >> > > > getting
>> >> >> > > > into the language (I try about once per year, about 6 times so
>> >> >> > > > far...)
>> >> >> > > > have so far failed :) (there are a couple others, e.g. weird
>> >> >> > > > default
>> >> >> > > > parameters, but I think this one takes the cake.)
>> >> >> > > >
>> >> >> > > > C-style 0-based indexes (i.e. with x[0:2] standing for the
>> >> >> > > > first
>> >> >> > > > three
>> >> >> > > > elements) could work -- Ox is a very good example of that; but
>> >> >> > > > really,
>> >> >> > > > the similarity to Matlab (and Fortran, and R, and Gauss, and
>> >> >> > > > SAS/IML,
>> >> >> > > > and even Mata I think) is a *very* good thing, which is a huge
>> >> >> > > > selling
>> >> >> > > > point :) so my pick would be definitely to stay with the base
>> >> >> > > > of
>> >> >> > > > 1.
>> >> >> > > >
>> >> >> > > > Fortran is btw very handy in that it allows anything to be a
>> >> >> > > > base
>> >> >> > > > --
>> >> >> > > > and I've used 0 based arrays in Fortran a few times, most
>> >> >> > > > commonly
>> >> >> > > > when translating C code. If this is not too much work, I guess
>> >> >> > > > it
>> >> >> > > > could be a nice addition -- but definitely non-critical :)
>> >> >> > > >
>> >> >> > > > --aj
>> >> >>
>> >>
>> >>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Alexander Pyattaev
That's true, but it only works nicely like this for 1-D arrays. The issues begin when you have something iterating over a 2D array. Then, inner index calculation becomes messy. Also vectorization will be disabled by default there. A good example could be computing a sum of a 2D array.

On Mon, Aug 31, 2015 at 3:42 PM, Yichao Yu <[hidden email]> wrote:
On Mon, Aug 31, 2015 at 8:17 AM, Alexander Pyattaev
<[hidden email]> wrote:
> Hm. I see your point. Basically, the result of this would not be quite
> accurate, since some array access is may be translated into vector
> instructions on x86... I think the way to go about this is just C code
> compiled with LLVM, and see the difference. But that would not be a Julia
> benchmark anymore... Either way, I will see what can be done about it and
> let you know.

For the following code (make sure to use floating point type to
disable auto vectorization)
```jl
          @inbounds for i in eachindex(a)
              s += a[i]
          end
```

The assembly loop is
```asm
L32:    vaddsd  (%rcx), %xmm0, %xmm0
       addq    $8, %rcx
       addq    $-1, %rax
       jne     L32
```

AFAICT, there's no "extra operation to every single array operation".
The load address is directly calculated by incrementing a pointer and
the %rax decrement is the loop condition.

>
> On Aug 31, 2015 2:36 PM, "Mauro" <[hidden email]> wrote:
>>
>> > What do you mean by modifying with 0-based indexing? I have looked at
>> > native, and it does not seem to optimize anything at all. Maybe the
>> > actual
>> > machine code is somehow optimized further, but native does not show that
>>
>> I meant that you translate the assembly code to 0-based *by hand*.  Then
>> put both into some kind of harness to make them callable.  Finally run
>> benchmarks using the two implementations.  I suspect you'll need to be
>> crafty with assembler to be able to do this.  (I couldn't do it)
>>
>> > AFAIK.
>> > Alex.
>> > On Aug 31, 2015 1:00 PM, "Mauro" <[hidden email]> wrote:
>> >
>> >> To answer your question, you could look at the machine code with
>> >> @code_native, modify that to use 0-based indexing and do benchmarks.
>> >>
>> >> On Mon, 2015-08-31 at 11:46, Alexander Pyattaev
>> >> <[hidden email]>
>> >> wrote:
>> >> > Guys, guys, no disrepect meant. But clearly there must have been
>> >> something
>> >> > to compensate the overhead in index calculations. While I do not
>> >> > prefer
>> >> the
>> >> > 1 indexing, I program in both styles just fine. I am not suggesting
>> >> anyone
>> >> > goes ahead and changes it) What I do wonder about is how you got
>> >> > around
>> >> > optimizing array access. After all, index origin is a matter of
>> >> > taste,
>> >> > pointer arithmetic efficient is not.
>> >> > On Aug 31, 2015 11:52 AM, "Milan Bouchet-Valat" <[hidden email]>
>> >> wrote:
>> >> >
>> >> >> Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
>> >> >> > Please do not revive such old threads. It is very disrespectul to
>> >> >> > the
>> >> >> > actively contributing members of our community to attempt to
>> >> >> > initiate
>> >> >> > a debate about design decisions that were finalized years ago.
>> >> >> Also, criticizing these decisions and ending your post with "it is
>> >> >> not
>> >> >> quite clear how this is done" isn't really the right approach. If
>> >> >> you
>> >> >> have questions about how this works, ask politely, and people will
>> >> >> give
>> >> >> you pointers.
>> >> >>
>> >> >> Finally, I suspect you'll have a hard time creating a benchmark
>> >> >> where
>> >> >> the -1 operation really matters as much as you think it does. That's
>> >> >> where you should have started.
>> >> >>
>> >> >>
>> >> >> My two cents
>> >> >>
>> >> >> >  -- John
>> >> >> >
>> >> >> > > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
>> >> >> > > [hidden email]> wrote:
>> >> >> > >
>> >> >> > > It is curious how this discussion was quietly closed with nobody
>> >> >> > > paying any attention to the issue. Sad but true - most generic
>> >> >> > > programming things have 0-based indexes. The reason for this is
>> >> >> > > simple - performance=) With 0-based, most pointer arithmetics
>> >> >> > > are
>> >> >> > > way easier than with 1-based, this is why it is used in C.
>> >> >> > > Further,
>> >> >> > > in assembly everything will be zero-based for years to come
>> >> >> > > unless
>> >> >> > > someone reinvents the processor instruction set. Essentially,
>> >> >> > > you
>> >> >> > > are deliberately adding an extra operation to every single array
>> >> >> > > operation...
>> >> >> > > This gets worse for multi-dimensional arrays. Let us consider a
>> >> >> > > 20x20 matrix.
>> >> >> > >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
>> >> >> > > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions,
>> >> >> > > the
>> >> >> > > more extra operations need to be done for every access. For
>> >> >> > > things
>> >> >> > > like loops over every element of a matrix this is REALLY
>> >> >> > > painful.
>> >> >> > > For example, I want to make a stupid loop to sum all elements in
>> >> >> > > array (C--style):
>> >> >> > > int sum =0;
>> >> >> > > for (i=0; i<N; i++)
>> >> >> > > {
>> >> >> > > sum = sum + array[i];
>> >> >> > > }
>> >> >> > > In C, the index i can be immediately used for memory access.
>> >> >> > > However, in 1-based indexing the code actually has one extra
>> >> >> > > addition for every loop iteration.
>> >> >> > > Naturally, the compiler will optimize some of the overhead away,
>> >> >> > > but what I wonder is how exactly are those optimizations done in
>> >> >> > > Julia. Based on what I have seen so far it is not quite clear
>> >> >> > > how
>> >> >> > > this is done.
>> >> >> > >
>> >> >> > >
>> >> >> > > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei
>> >> >> > > Jirnyi
>> >> >> > > wrote:
>> >> >> > > > IMO, Python model is rather unique and unnatural -- I think
>> >> >> > > > there
>> >> >> > > > is a
>> >> >> > > > very good reason no other language implements indexing in that
>> >> >> > > > particular manner. For me personally, the
>> >> >> > > > "non-inclusive-at-the
>> >> >> > > > -right"
>> >> >> > > > indexing is probably the biggest issue why all my attempts at
>> >> >> > > > getting
>> >> >> > > > into the language (I try about once per year, about 6 times so
>> >> >> > > > far...)
>> >> >> > > > have so far failed :) (there are a couple others, e.g. weird
>> >> >> > > > default
>> >> >> > > > parameters, but I think this one takes the cake.)
>> >> >> > > >
>> >> >> > > > C-style 0-based indexes (i.e. with x[0:2] standing for the
>> >> >> > > > first
>> >> >> > > > three
>> >> >> > > > elements) could work -- Ox is a very good example of that; but
>> >> >> > > > really,
>> >> >> > > > the similarity to Matlab (and Fortran, and R, and Gauss, and
>> >> >> > > > SAS/IML,
>> >> >> > > > and even Mata I think) is a *very* good thing, which is a huge
>> >> >> > > > selling
>> >> >> > > > point :) so my pick would be definitely to stay with the base
>> >> >> > > > of
>> >> >> > > > 1.
>> >> >> > > >
>> >> >> > > > Fortran is btw very handy in that it allows anything to be a
>> >> >> > > > base
>> >> >> > > > --
>> >> >> > > > and I've used 0 based arrays in Fortran a few times, most
>> >> >> > > > commonly
>> >> >> > > > when translating C code. If this is not too much work, I guess
>> >> >> > > > it
>> >> >> > > > could be a nice addition -- but definitely non-critical :)
>> >> >> > > >
>> >> >> > > > --aj
>> >> >>
>> >>
>> >>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: Indexing

Isaiah Norton
In fact, this is an area of deep and ongoing discussion because indexing is such a critical issue for scientific computing. See, for starters, these issues -- and the many others linked therein:


There are still some steps remaining to get *everything* we want for abstract interfaces, multidimensional indexing, etc. But significant efforts have clearly been made so far. Do have a look at the kinds of things people are building Julia, and the performance they are achieving.

In comparison to the above discussions, dealing with a -1 here or there is a fairly trivial compiler optimization (can usually be lifted out of a loop). However, we actually end up lowering indexes to LLVM with 0-based offsets, because that is what the LLVM API expects (look up "getElementPtr" in LLVM, and `jl_field_indexes` in the Julia C code, for example).

In general, we are happy to engage with good-faith technical questions, but keep in mind that the answers you receive may be quite brief, as some people answer on a smartphone on the train, and some questions are simply missed (this is the nature of mailing lists). For "developer level" questions, in particular, you are expected to put in the effort to work out details yourself.

You may be interested in the "developer documentation", which is the recommended starting point for questions about Julia internals:

 

On Mon, Aug 31, 2015 at 9:34 AM, Alexander Pyattaev <[hidden email]> wrote:
That's true, but it only works nicely like this for 1-D arrays. The issues begin when you have something iterating over a 2D array. Then, inner index calculation becomes messy. Also vectorization will be disabled by default there. A good example could be computing a sum of a 2D array.

On Mon, Aug 31, 2015 at 3:42 PM, Yichao Yu <[hidden email]> wrote:
On Mon, Aug 31, 2015 at 8:17 AM, Alexander Pyattaev
<[hidden email]> wrote:
> Hm. I see your point. Basically, the result of this would not be quite
> accurate, since some array access is may be translated into vector
> instructions on x86... I think the way to go about this is just C code
> compiled with LLVM, and see the difference. But that would not be a Julia
> benchmark anymore... Either way, I will see what can be done about it and
> let you know.

For the following code (make sure to use floating point type to
disable auto vectorization)
```jl
          @inbounds for i in eachindex(a)
              s += a[i]
          end
```

The assembly loop is
```asm
L32:    vaddsd  (%rcx), %xmm0, %xmm0
       addq    $8, %rcx
       addq    $-1, %rax
       jne     L32
```

AFAICT, there's no "extra operation to every single array operation".
The load address is directly calculated by incrementing a pointer and
the %rax decrement is the loop condition.

>
> On Aug 31, 2015 2:36 PM, "Mauro" <[hidden email]> wrote:
>>
>> > What do you mean by modifying with 0-based indexing? I have looked at
>> > native, and it does not seem to optimize anything at all. Maybe the
>> > actual
>> > machine code is somehow optimized further, but native does not show that
>>
>> I meant that you translate the assembly code to 0-based *by hand*.  Then
>> put both into some kind of harness to make them callable.  Finally run
>> benchmarks using the two implementations.  I suspect you'll need to be
>> crafty with assembler to be able to do this.  (I couldn't do it)
>>
>> > AFAIK.
>> > Alex.
>> > On Aug 31, 2015 1:00 PM, "Mauro" <[hidden email]> wrote:
>> >
>> >> To answer your question, you could look at the machine code with
>> >> @code_native, modify that to use 0-based indexing and do benchmarks.
>> >>
>> >> On Mon, 2015-08-31 at 11:46, Alexander Pyattaev
>> >> <[hidden email]>
>> >> wrote:
>> >> > Guys, guys, no disrepect meant. But clearly there must have been
>> >> something
>> >> > to compensate the overhead in index calculations. While I do not
>> >> > prefer
>> >> the
>> >> > 1 indexing, I program in both styles just fine. I am not suggesting
>> >> anyone
>> >> > goes ahead and changes it) What I do wonder about is how you got
>> >> > around
>> >> > optimizing array access. After all, index origin is a matter of
>> >> > taste,
>> >> > pointer arithmetic efficient is not.
>> >> > On Aug 31, 2015 11:52 AM, "Milan Bouchet-Valat" <[hidden email]>
>> >> wrote:
>> >> >
>> >> >> Le lundi 31 août 2015 à 10:34 +0200, John Myles White a écrit :
>> >> >> > Please do not revive such old threads. It is very disrespectul to
>> >> >> > the
>> >> >> > actively contributing members of our community to attempt to
>> >> >> > initiate
>> >> >> > a debate about design decisions that were finalized years ago.
>> >> >> Also, criticizing these decisions and ending your post with "it is
>> >> >> not
>> >> >> quite clear how this is done" isn't really the right approach. If
>> >> >> you
>> >> >> have questions about how this works, ask politely, and people will
>> >> >> give
>> >> >> you pointers.
>> >> >>
>> >> >> Finally, I suspect you'll have a hard time creating a benchmark
>> >> >> where
>> >> >> the -1 operation really matters as much as you think it does. That's
>> >> >> where you should have started.
>> >> >>
>> >> >>
>> >> >> My two cents
>> >> >>
>> >> >> >  -- John
>> >> >> >
>> >> >> > > On Aug 26, 2015, at 3:27 PM, Alexander Pyattaev <
>> >> >> > > [hidden email]> wrote:
>> >> >> > >
>> >> >> > > It is curious how this discussion was quietly closed with nobody
>> >> >> > > paying any attention to the issue. Sad but true - most generic
>> >> >> > > programming things have 0-based indexes. The reason for this is
>> >> >> > > simple - performance=) With 0-based, most pointer arithmetics
>> >> >> > > are
>> >> >> > > way easier than with 1-based, this is why it is used in C.
>> >> >> > > Further,
>> >> >> > > in assembly everything will be zero-based for years to come
>> >> >> > > unless
>> >> >> > > someone reinvents the processor instruction set. Essentially,
>> >> >> > > you
>> >> >> > > are deliberately adding an extra operation to every single array
>> >> >> > > operation...
>> >> >> > > This gets worse for multi-dimensional arrays. Let us consider a
>> >> >> > > 20x20 matrix.
>> >> >> > >  The memory offset for 5'th element in 2'nd  row is 4 + 20*1  in
>> >> >> > > zero-based and 5-1+20*(2-1) for 1-based. The more dimensions,
>> >> >> > > the
>> >> >> > > more extra operations need to be done for every access. For
>> >> >> > > things
>> >> >> > > like loops over every element of a matrix this is REALLY
>> >> >> > > painful.
>> >> >> > > For example, I want to make a stupid loop to sum all elements in
>> >> >> > > array (C--style):
>> >> >> > > int sum =0;
>> >> >> > > for (i=0; i<N; i++)
>> >> >> > > {
>> >> >> > > sum = sum + array[i];
>> >> >> > > }
>> >> >> > > In C, the index i can be immediately used for memory access.
>> >> >> > > However, in 1-based indexing the code actually has one extra
>> >> >> > > addition for every loop iteration.
>> >> >> > > Naturally, the compiler will optimize some of the overhead away,
>> >> >> > > but what I wonder is how exactly are those optimizations done in
>> >> >> > > Julia. Based on what I have seen so far it is not quite clear
>> >> >> > > how
>> >> >> > > this is done.
>> >> >> > >
>> >> >> > >
>> >> >> > > On Wednesday, February 29, 2012 at 2:29:28 AM UTC+2, Andrei
>> >> >> > > Jirnyi
>> >> >> > > wrote:
>> >> >> > > > IMO, Python model is rather unique and unnatural -- I think
>> >> >> > > > there
>> >> >> > > > is a
>> >> >> > > > very good reason no other language implements indexing in that
>> >> >> > > > particular manner. For me personally, the
>> >> >> > > > "non-inclusive-at-the
>> >> >> > > > -right"
>> >> >> > > > indexing is probably the biggest issue why all my attempts at
>> >> >> > > > getting
>> >> >> > > > into the language (I try about once per year, about 6 times so
>> >> >> > > > far...)
>> >> >> > > > have so far failed :) (there are a couple others, e.g. weird
>> >> >> > > > default
>> >> >> > > > parameters, but I think this one takes the cake.)
>> >> >> > > >
>> >> >> > > > C-style 0-based indexes (i.e. with x[0:2] standing for the
>> >> >> > > > first
>> >> >> > > > three
>> >> >> > > > elements) could work -- Ox is a very good example of that; but
>> >> >> > > > really,
>> >> >> > > > the similarity to Matlab (and Fortran, and R, and Gauss, and
>> >> >> > > > SAS/IML,
>> >> >> > > > and even Mata I think) is a *very* good thing, which is a huge
>> >> >> > > > selling
>> >> >> > > > point :) so my pick would be definitely to stay with the base
>> >> >> > > > of
>> >> >> > > > 1.
>> >> >> > > >
>> >> >> > > > Fortran is btw very handy in that it allows anything to be a
>> >> >> > > > base
>> >> >> > > > --
>> >> >> > > > and I've used 0 based arrays in Fortran a few times, most
>> >> >> > > > commonly
>> >> >> > > > when translating C code. If this is not too much work, I guess
>> >> >> > > > it
>> >> >> > > > could be a nice addition -- but definitely non-critical :)
>> >> >> > > >
>> >> >> > > > --aj
>> >> >>
>> >>
>> >>
>>
>