Reducing complexity of OffsetArrays

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Reducing complexity of OffsetArrays

Bob Portmann
TL;DR I think the type system should take care of converting between different types of arrays and thus free the user to code against the type of array they want (OneTo, ZeroTo, Offset, etc).

Maybe I have a deep mis-understanding of how the new generalized offset arrays are suppose to work in Julia. If so I'd be happy to be set straight. If not then I think the present system will lead to unnecessary complexity and bug-ridden code. In the present system one can define array types that are not one-based but can have arbitrary offsets. This is a great idea but to make it work a very general system for indexing relative to the ends of the array has been devised. If one writes a function that accepts an `AbstractArray` then one MUST use this more general system or produce bugs when e.g. a `ZeroToArray` is passed in. This puts a large burden on writers of library code that works on arrays and I think is an unnecessary complexity.

Since Fortran arrays are a nice example let look at the situation in Fortran. If I define an array in a function `real arr(-10:10)` then I would index it using indices that range from -10 to 10. If I pass this into another function that declared the array using its total size (usually passed in separately or in a parameter statement) `real arr(21)` then in that subroutine one would index the array using "normal" indices that range from 1 to 21. I.E., you state in the function how you want to treat the array and are not forced to work with in offset form unless you want to. In my opinion, this is what makes using offset arrays in Fortran sane and is the key thing that Julia should try to emulate.

One way to get this behavior in Julia would be to use the type system and `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I think it is wise that it remain that way so that all old code will work unchanged (even with the new array types). Thus, I propose adding a new top-level type above AbstractArray type to capture all arrays something like:

```
Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <: AbstractAllArray{T,N} <: Any
```

And the offset arrays would be subtypes of `AbstractAllArrays` on a different branch

```
OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N} <: Any
```

And similarly for `ZeroToArray`.

Then, if one declares an Array as

```
function func(arr::AbstractArray)
```

one can safely assume it is a 1-based Array inside `func`. If an offset array is passed into `func` then it is automatically converted to a `OneTo` array inside `func`. This is the key point of this proposal and is similar to the auto-conversion of, e.g., Int to Floats in a function call. Similarly if one declares an array as a `ZeroToArray` in a function and passes in an `Array` then it will be converted to a `ZeroToArray` in the function. Some conversions would need more information and thus would be disallowed (e.g., `Array` to `OffsetArray`). I think this system would go a long way towards making `ZeroToArray`s and `OffsetArray`s simple to use in Julia w/o using the generalized indexing techniques introduced in Julia 0.5. Note that it would still be possible to use the `AbstractAllArray` type and accept all arrays w/o conversion and use the generalized indexing techniques.

I'm curious what people think about this.

Bob

ps Paste into github to see in formatted form. I'm not sure how to get that in an email.

Reply | Threaded
Open this post in threaded view
|

Re: Reducing complexity of OffsetArrays

Tim Holy
At least with the non-1 arrays I know of now, you can always get the OneTo
array form with parent(A). However, that's not something I feel comfortable
saying will be guaranteed forever.

In terms of the "automatic conversion" part of your proposal, I'm a bit
uncomfortable because in my view the indices mean something: how should
    copy!(A, B)
work if B's indices are not contained within A's indices? To me the indices
*are* the identity of a location in the array, so making it too easy to rename
things (esp. automatically) seems like a formula for serious bugs: if I wanted
to copy my data to indices -5:5, it's a bug if it gets stuck into the slots
1:11 instead because of some silent index-renaming.

If I understand correctly, regarding complexity, your biggest concern is about
the range types "affiliated" with a non-1 array type. First, those are optional,
in the sense that many algorithms can be written without them. What makes them
seemingly necessary in some circumstances is `similar` and the fact that Julia
(unlike Fortran) has a type system with significant performance implications.

For example, suppose you have defined a "sparse" AbstractArray type with non-1
indices, and a matching dense array type. Let's say you pass in one of these
sparse arrays S to some *generic* algorithm (perhaps one that someone else
wrote, without your use case in mind)  which needs to allocate some kind of
dense output. Then what you need to say is "allocate me an array similar to S,
but make it dense, with the same indices as S." `similar(S)` might not work,
because that might create a sparse output; nor would `Array{eltype(S)}
(size(S))` because `S` might have non-1 indices and `Array` doesn't support
that.

So the adopted solution is to allow you to say

    similar(dims->Array{eltype(S)}(dims), indices(S))

and your package needs to specialize this method to do what you want it to do.
Notice here that you have only one mechanism to control *which* type of array
gets created: the type returned by `indices(S)`. If it's a OneTo-tuple this
will just create an Array, but if it's something else then something else will
be created. The author of a non-1 array package gets to customize this if s/he
takes advantage of the opportunity to define a new index type; alternatively,
you could just make sure that the `OffsetArrays` package has been loaded, and
then use UnitRange for the return value of `indices`, and it will create an
`OffsetArray` for you. If you're happy with that, you don't have to define a
custom indices type.

Now, we could get around this by declaring "one true non-1 array type," but
this might be constraining. For example, https://github.com/JuliaArrays/
CatIndices.jl defines a BidirectionalVector array type that causes `shift!` and
`unshift!` to adjust the indices of the first element, complementary to how
`push!` and `pop!` adjust the indices of the last element; I don't think this
is something you could assume was desired in general, but it's very useful in
particular contexts.

I'm all in favor of simplifying this where possible, and would love ideas for
how to do it. But I'm not sure that Fortran is a perfect model, because the
flexibility of Julia's type system allows useful behaviors that make Fortran
look so...static.

Best,
--Tim

On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:

> TL;DR I think the type system should take care of converting between
> different types of arrays and thus free the user to code against the type
> of array they want (OneTo, ZeroTo, Offset, etc).
>
> Maybe I have a deep mis-understanding of how the new generalized offset
> arrays are suppose to work in Julia. If so I'd be happy to be set straight.
> If not then I think the present system will lead to unnecessary complexity
> and bug-ridden code. In the present system one can define array types that
> are not one-based but can have arbitrary offsets. This is a great idea but
> to make it work a very general system for indexing relative to the ends of
> the array has been devised. If one writes a function that accepts an
> `AbstractArray` then one MUST use this more general system or produce bugs
> when e.g. a `ZeroToArray` is passed in. This puts a large burden on writers
> of library code that works on arrays and I think is an unnecessary
> complexity.
>
> Since Fortran arrays are a nice example let look at the situation in
> Fortran. If I define an array in a function `real arr(-10:10)` then I would
> index it using indices that range from -10 to 10. If I pass this into
> another function that declared the array using its total size (usually
> passed in separately or in a parameter statement) `real arr(21)` then in
> that subroutine one would index the array using "normal" indices that range
> from 1 to 21. I.E., you state in the function how you want to treat the
> array and are not forced to work with in offset form unless you want to. In
> my opinion, this is what makes using offset arrays in Fortran sane and is
> the key thing that Julia should try to emulate.
>
> One way to get this behavior in Julia would be to use the type system and
> `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
> think it is wise that it remain that way so that all old code will work
> unchanged (even with the new array types). Thus, I propose adding a new
> top-level type above AbstractArray type to capture all arrays something
> like:
>
> ```
> Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
> AbstractAllArray{T,N} <: Any
> ```
>
> And the offset arrays would be subtypes of `AbstractAllArrays` on a
> different branch
>
> ```
> OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N} <: Any
> ```
>
> And similarly for `ZeroToArray`.
>
> Then, if one declares an Array as
>
> ```
> function func(arr::AbstractArray)
> ```
>
> one can safely assume it is a 1-based Array inside `func`. If an offset
> array is passed into `func` then it is automatically converted to a `OneTo`
> array inside `func`. This is the key point of this proposal and is similar
> to the auto-conversion of, e.g., Int to Floats in a function call.
> Similarly if one declares an array as a `ZeroToArray` in a function and
> passes in an `Array` then it will be converted to a `ZeroToArray` in the
> function. Some conversions would need more information and thus would be
> disallowed (e.g., `Array` to `OffsetArray`). I think this system would go a
> long way towards making `ZeroToArray`s and `OffsetArray`s simple to use in
> Julia w/o using the generalized indexing techniques introduced in Julia
> 0.5. Note that it would still be possible to use the `AbstractAllArray`
> type and accept all arrays w/o conversion and use the generalized indexing
> techniques.
>
> I'm curious what people think about this.
>
> Bob
>
> ps Paste into github to see in formatted form. I'm not sure how to get that
> in an email.


Reply | Threaded
Open this post in threaded view
|

Re: Reducing complexity of OffsetArrays

Tim Holy
In reply to this post by Bob Portmann
I should add, abstract discussions are all well and good, but I fully
recognize you may have something very specific in mind and my answer simply
fails to understand your concern. If you can give one or more concrete
examples of where the current system is either bug-prone or a pain in the
neck, I'd love to see what you mean.

Best,
--Tim

On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:

> TL;DR I think the type system should take care of converting between
> different types of arrays and thus free the user to code against the type
> of array they want (OneTo, ZeroTo, Offset, etc).
>
> Maybe I have a deep mis-understanding of how the new generalized offset
> arrays are suppose to work in Julia. If so I'd be happy to be set straight.
> If not then I think the present system will lead to unnecessary complexity
> and bug-ridden code. In the present system one can define array types that
> are not one-based but can have arbitrary offsets. This is a great idea but
> to make it work a very general system for indexing relative to the ends of
> the array has been devised. If one writes a function that accepts an
> `AbstractArray` then one MUST use this more general system or produce bugs
> when e.g. a `ZeroToArray` is passed in. This puts a large burden on writers
> of library code that works on arrays and I think is an unnecessary
> complexity.
>
> Since Fortran arrays are a nice example let look at the situation in
> Fortran. If I define an array in a function `real arr(-10:10)` then I would
> index it using indices that range from -10 to 10. If I pass this into
> another function that declared the array using its total size (usually
> passed in separately or in a parameter statement) `real arr(21)` then in
> that subroutine one would index the array using "normal" indices that range
> from 1 to 21. I.E., you state in the function how you want to treat the
> array and are not forced to work with in offset form unless you want to. In
> my opinion, this is what makes using offset arrays in Fortran sane and is
> the key thing that Julia should try to emulate.
>
> One way to get this behavior in Julia would be to use the type system and
> `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
> think it is wise that it remain that way so that all old code will work
> unchanged (even with the new array types). Thus, I propose adding a new
> top-level type above AbstractArray type to capture all arrays something
> like:
>
> ```
> Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
> AbstractAllArray{T,N} <: Any
> ```
>
> And the offset arrays would be subtypes of `AbstractAllArrays` on a
> different branch
>
> ```
> OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N} <: Any
> ```
>
> And similarly for `ZeroToArray`.
>
> Then, if one declares an Array as
>
> ```
> function func(arr::AbstractArray)
> ```
>
> one can safely assume it is a 1-based Array inside `func`. If an offset
> array is passed into `func` then it is automatically converted to a `OneTo`
> array inside `func`. This is the key point of this proposal and is similar
> to the auto-conversion of, e.g., Int to Floats in a function call.
> Similarly if one declares an array as a `ZeroToArray` in a function and
> passes in an `Array` then it will be converted to a `ZeroToArray` in the
> function. Some conversions would need more information and thus would be
> disallowed (e.g., `Array` to `OffsetArray`). I think this system would go a
> long way towards making `ZeroToArray`s and `OffsetArray`s simple to use in
> Julia w/o using the generalized indexing techniques introduced in Julia
> 0.5. Note that it would still be possible to use the `AbstractAllArray`
> type and accept all arrays w/o conversion and use the generalized indexing
> techniques.
>
> I'm curious what people think about this.
>
> Bob
>
> ps Paste into github to see in formatted form. I'm not sure how to get that
> in an email.


Reply | Threaded
Open this post in threaded view
|

Re: Reducing complexity of OffsetArrays

Bob Portmann
Thanks for the thoughtful response. I hope you'll tolerate one reply in the "abstract".

I am resisting the big change that occurred in 0.5. In 0.4 and earlier if one declares an array as an `AbstractArray` in a function then one knew that the indices were one based (a nice simple model, even if it is hated by many). In 0.5, if one wants to write general code, then one has to assume that arrays can have ANY indices. And one needs to write code using more abstract tools. This seems to me to be a large cost in complexity for the small subset of cases where offset indices are helpful. This is the core issue to me.

One way around this, it would seem, is to declare arrays as `::Array` and not `::AbstractArray` in functions (if one want to be sure they are `OneTo`). But then one gives up accepting many types of arrays that would pose no problem (e.g, DistributedArrays with OneTo indices). Thus, I'm proposing to have a high level abstract type that would capture all arrays types that can be assumed to be `OneTo`. Then one can write library functions against that type. This alone would help I think.

The auto-conversion is an extra step that I thought might work since it is (I think) low cost to convert an `OffsetArray` to a `OneTo` array . Thus if you passed an OffsetArray to a function that takes the abstract OneTo type (I kept its name AbstractArray above but that need not be its name) you expect that if it returned a `similar` array it would be of `OneTo` type. You would then have to convert it back to an OffsetArray type. It would probably be easy to write one line functions to do this for you. If it was sparse I assume it would remain so. It would be up to the writer of the new OffsetArray type to declare what type it would convert to in the OneTo array abstract tree.

Does this help?

Bob

ps I don't have any concrete examples yet because OffsetArrays are so scarce in the julia ecosystem. I'm just looking ahead to a world where it will no longer be possible to assume AbstractArrays are OneTo. This seems more painful than telling the 0-based crowd to code against 1-based arrays.
 

On Sat, Oct 29, 2016 at 10:40 AM, Tim Holy <[hidden email]> wrote:
I should add, abstract discussions are all well and good, but I fully
recognize you may have something very specific in mind and my answer simply
fails to understand your concern. If you can give one or more concrete
examples of where the current system is either bug-prone or a pain in the
neck, I'd love to see what you mean.

Best,
--Tim

On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:
> TL;DR I think the type system should take care of converting between
> different types of arrays and thus free the user to code against the type
> of array they want (OneTo, ZeroTo, Offset, etc).
>
> Maybe I have a deep mis-understanding of how the new generalized offset
> arrays are suppose to work in Julia. If so I'd be happy to be set straight.
> If not then I think the present system will lead to unnecessary complexity
> and bug-ridden code. In the present system one can define array types that
> are not one-based but can have arbitrary offsets. This is a great idea but
> to make it work a very general system for indexing relative to the ends of
> the array has been devised. If one writes a function that accepts an
> `AbstractArray` then one MUST use this more general system or produce bugs
> when e.g. a `ZeroToArray` is passed in. This puts a large burden on writers
> of library code that works on arrays and I think is an unnecessary
> complexity.
>
> Since Fortran arrays are a nice example let look at the situation in
> Fortran. If I define an array in a function `real arr(-10:10)` then I would
> index it using indices that range from -10 to 10. If I pass this into
> another function that declared the array using its total size (usually
> passed in separately or in a parameter statement) `real arr(21)` then in
> that subroutine one would index the array using "normal" indices that range
> from 1 to 21. I.E., you state in the function how you want to treat the
> array and are not forced to work with in offset form unless you want to. In
> my opinion, this is what makes using offset arrays in Fortran sane and is
> the key thing that Julia should try to emulate.
>
> One way to get this behavior in Julia would be to use the type system and
> `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
> think it is wise that it remain that way so that all old code will work
> unchanged (even with the new array types). Thus, I propose adding a new
> top-level type above AbstractArray type to capture all arrays something
> like:
>
> ```
> Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
> AbstractAllArray{T,N} <: Any
> ```
>
> And the offset arrays would be subtypes of `AbstractAllArrays` on a
> different branch
>
> ```
> OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N} <: Any
> ```
>
> And similarly for `ZeroToArray`.
>
> Then, if one declares an Array as
>
> ```
> function func(arr::AbstractArray)
> ```
>
> one can safely assume it is a 1-based Array inside `func`. If an offset
> array is passed into `func` then it is automatically converted to a `OneTo`
> array inside `func`. This is the key point of this proposal and is similar
> to the auto-conversion of, e.g., Int to Floats in a function call.
> Similarly if one declares an array as a `ZeroToArray` in a function and
> passes in an `Array` then it will be converted to a `ZeroToArray` in the
> function. Some conversions would need more information and thus would be
> disallowed (e.g., `Array` to `OffsetArray`). I think this system would go a
> long way towards making `ZeroToArray`s and `OffsetArray`s simple to use in
> Julia w/o using the generalized indexing techniques introduced in Julia
> 0.5. Note that it would still be possible to use the `AbstractAllArray`
> type and accept all arrays w/o conversion and use the generalized indexing
> techniques.
>
> I'm curious what people think about this.
>
> Bob
>
> ps Paste into github to see in formatted form. I'm not sure how to get that
> in an email.



Reply | Threaded
Open this post in threaded view
|

Re: Reducing complexity of OffsetArrays

Tim Holy
I'm afraid I still don't understand the claimed big increment in complexity. First, let's distinguish "generic offset arrays" from the OffsetArrays.jl package. If you're happy using OffsetArrays, you don't have to write your own offset-array type. Being able to use an established & tested package reduces your burden a lot, and you can ignore the second half of the devdocs page entirely.

If you just want to *use* OffsetArrays.jl, the basic changes in coding style for writing indices-aware code are:

- any place you used to call `size`, you probably want to call `indices` instead (and likely make minor adjustments elsewhere, since `incides` returns a tuple-of-ranges---but such changes tend to be very obvious);
- check all uses of `similar`; some will stay as-is, other will migrate to `similar(f, inds)` style.

In my experience, that's just about it. The devdocs goes into quite a lot of detail to explain the rationale, but really the actual changes are quite small. While you can't quite do it via `grep` and `sed`, to me that just doesn't seem complicated.

Where the pain comes is that if you're converting old code, you sometimes have to think your way through it again---"hmm, what do I really mean by this index"? If your code had complicated indexing the first time you wrote it, unfortunately you're going to have to think about it carefully again; so in some cases, "porting" code is almost as bad as writing it the first time. However, if you write indices-aware code in the first place, in my experience the added burden is almost negligible, and in quite a few cases the ability to offset array indices makes things *easier* (e.g., "padding" an array on its edges is oh-so-much-clearer than it used to be, it's like a different world). That's the whole reason I implemented this facility in julia-0.5: to make life easier, not to make it harder. (Personally I think the whole argument over 0-based and 1-based indexing is stupid; it's the ability to use arbitrary indices that I find interesting & useful, and it makes most of my code prettier.)

For examples of packages that use OffsetArrays, check the following:
- CatIndices
- FFTViews
- ImageFiltering

ImageFiltering is a mixed bag: there's a small performance penalty in a few cases (even if you use @unsafe) because that LLVM doesn't always optimize code as well as it could in principle (maybe @polly will help someday...). Because image filtering is an extraordinarily performance-sensitive operation, there are a few places where I had to make some yucky hand optimizations.

Again, I'm very happy to continue this conversation---I'd really like to understand your concern, but without a concrete example I confess I'm lost.

Best,
--Tim


On Sat, Oct 29, 2016 at 8:44 PM, Bob Portmann <[hidden email]> wrote:
Thanks for the thoughtful response. I hope you'll tolerate one reply in the "abstract".

I am resisting the big change that occurred in 0.5. In 0.4 and earlier if one declares an array as an `AbstractArray` in a function then one knew that the indices were one based (a nice simple model, even if it is hated by many). In 0.5, if one wants to write general code, then one has to assume that arrays can have ANY indices. And one needs to write code using more abstract tools. This seems to me to be a large cost in complexity for the small subset of cases where offset indices are helpful. This is the core issue to me.

One way around this, it would seem, is to declare arrays as `::Array` and not `::AbstractArray` in functions (if one want to be sure they are `OneTo`). But then one gives up accepting many types of arrays that would pose no problem (e.g, DistributedArrays with OneTo indices). Thus, I'm proposing to have a high level abstract type that would capture all arrays types that can be assumed to be `OneTo`. Then one can write library functions against that type. This alone would help I think.

The auto-conversion is an extra step that I thought might work since it is (I think) low cost to convert an `OffsetArray` to a `OneTo` array . Thus if you passed an OffsetArray to a function that takes the abstract OneTo type (I kept its name AbstractArray above but that need not be its name) you expect that if it returned a `similar` array it would be of `OneTo` type. You would then have to convert it back to an OffsetArray type. It would probably be easy to write one line functions to do this for you. If it was sparse I assume it would remain so. It would be up to the writer of the new OffsetArray type to declare what type it would convert to in the OneTo array abstract tree.

Does this help?

Bob

ps I don't have any concrete examples yet because OffsetArrays are so scarce in the julia ecosystem. I'm just looking ahead to a world where it will no longer be possible to assume AbstractArrays are OneTo. This seems more painful than telling the 0-based crowd to code against 1-based arrays.
 

On Sat, Oct 29, 2016 at 10:40 AM, Tim Holy <[hidden email]> wrote:
I should add, abstract discussions are all well and good, but I fully
recognize you may have something very specific in mind and my answer simply
fails to understand your concern. If you can give one or more concrete
examples of where the current system is either bug-prone or a pain in the
neck, I'd love to see what you mean.

Best,
--Tim

On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:
> TL;DR I think the type system should take care of converting between
> different types of arrays and thus free the user to code against the type
> of array they want (OneTo, ZeroTo, Offset, etc).
>
> Maybe I have a deep mis-understanding of how the new generalized offset
> arrays are suppose to work in Julia. If so I'd be happy to be set straight.
> If not then I think the present system will lead to unnecessary complexity
> and bug-ridden code. In the present system one can define array types that
> are not one-based but can have arbitrary offsets. This is a great idea but
> to make it work a very general system for indexing relative to the ends of
> the array has been devised. If one writes a function that accepts an
> `AbstractArray` then one MUST use this more general system or produce bugs
> when e.g. a `ZeroToArray` is passed in. This puts a large burden on writers
> of library code that works on arrays and I think is an unnecessary
> complexity.
>
> Since Fortran arrays are a nice example let look at the situation in
> Fortran. If I define an array in a function `real arr(-10:10)` then I would
> index it using indices that range from -10 to 10. If I pass this into
> another function that declared the array using its total size (usually
> passed in separately or in a parameter statement) `real arr(21)` then in
> that subroutine one would index the array using "normal" indices that range
> from 1 to 21. I.E., you state in the function how you want to treat the
> array and are not forced to work with in offset form unless you want to. In
> my opinion, this is what makes using offset arrays in Fortran sane and is
> the key thing that Julia should try to emulate.
>
> One way to get this behavior in Julia would be to use the type system and
> `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
> think it is wise that it remain that way so that all old code will work
> unchanged (even with the new array types). Thus, I propose adding a new
> top-level type above AbstractArray type to capture all arrays something
> like:
>
> ```
> Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
> AbstractAllArray{T,N} <: Any
> ```
>
> And the offset arrays would be subtypes of `AbstractAllArrays` on a
> different branch
>
> ```
> OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N} <: Any
> ```
>
> And similarly for `ZeroToArray`.
>
> Then, if one declares an Array as
>
> ```
> function func(arr::AbstractArray)
> ```
>
> one can safely assume it is a 1-based Array inside `func`. If an offset
> array is passed into `func` then it is automatically converted to a `OneTo`
> array inside `func`. This is the key point of this proposal and is similar
> to the auto-conversion of, e.g., Int to Floats in a function call.
> Similarly if one declares an array as a `ZeroToArray` in a function and
> passes in an `Array` then it will be converted to a `ZeroToArray` in the
> function. Some conversions would need more information and thus would be
> disallowed (e.g., `Array` to `OffsetArray`). I think this system would go a
> long way towards making `ZeroToArray`s and `OffsetArray`s simple to use in
> Julia w/o using the generalized indexing techniques introduced in Julia
> 0.5. Note that it would still be possible to use the `AbstractAllArray`
> type and accept all arrays w/o conversion and use the generalized indexing
> techniques.
>
> I'm curious what people think about this.
>
> Bob
>
> ps Paste into github to see in formatted form. I'm not sure how to get that
> in an email.




Reply | Threaded
Open this post in threaded view
|

Re: Reducing complexity of OffsetArrays

Bob Portmann
Like I said, no real practical experience yet. The increase in complexity that I fear is the loss of, e.g., writing arr[2,3] and having it not be the element in the 2nd row and third column (i.e., the loss of a simple model of how things are laid out). Maybe my fears are unfounded. Others don't seem concerned it would seem.

I'll check out those packages that you mention.

Thanks,
Bob 

On Sun, Oct 30, 2016 at 2:29 PM, Tim Holy <[hidden email]> wrote:
I'm afraid I still don't understand the claimed big increment in complexity. First, let's distinguish "generic offset arrays" from the OffsetArrays.jl package. If you're happy using OffsetArrays, you don't have to write your own offset-array type. Being able to use an established & tested package reduces your burden a lot, and you can ignore the second half of the devdocs page entirely.

If you just want to *use* OffsetArrays.jl, the basic changes in coding style for writing indices-aware code are:

- any place you used to call `size`, you probably want to call `indices` instead (and likely make minor adjustments elsewhere, since `incides` returns a tuple-of-ranges---but such changes tend to be very obvious);
- check all uses of `similar`; some will stay as-is, other will migrate to `similar(f, inds)` style.

In my experience, that's just about it. The devdocs goes into quite a lot of detail to explain the rationale, but really the actual changes are quite small. While you can't quite do it via `grep` and `sed`, to me that just doesn't seem complicated.

Where the pain comes is that if you're converting old code, you sometimes have to think your way through it again---"hmm, what do I really mean by this index"? If your code had complicated indexing the first time you wrote it, unfortunately you're going to have to think about it carefully again; so in some cases, "porting" code is almost as bad as writing it the first time. However, if you write indices-aware code in the first place, in my experience the added burden is almost negligible, and in quite a few cases the ability to offset array indices makes things *easier* (e.g., "padding" an array on its edges is oh-so-much-clearer than it used to be, it's like a different world). That's the whole reason I implemented this facility in julia-0.5: to make life easier, not to make it harder. (Personally I think the whole argument over 0-based and 1-based indexing is stupid; it's the ability to use arbitrary indices that I find interesting & useful, and it makes most of my code prettier.)

For examples of packages that use OffsetArrays, check the following:
- CatIndices
- FFTViews
- ImageFiltering

ImageFiltering is a mixed bag: there's a small performance penalty in a few cases (even if you use @unsafe) because that LLVM doesn't always optimize code as well as it could in principle (maybe @polly will help someday...). Because image filtering is an extraordinarily performance-sensitive operation, there are a few places where I had to make some yucky hand optimizations.

Again, I'm very happy to continue this conversation---I'd really like to understand your concern, but without a concrete example I confess I'm lost.

Best,
--Tim


On Sat, Oct 29, 2016 at 8:44 PM, Bob Portmann <[hidden email]> wrote:
Thanks for the thoughtful response. I hope you'll tolerate one reply in the "abstract".

I am resisting the big change that occurred in 0.5. In 0.4 and earlier if one declares an array as an `AbstractArray` in a function then one knew that the indices were one based (a nice simple model, even if it is hated by many). In 0.5, if one wants to write general code, then one has to assume that arrays can have ANY indices. And one needs to write code using more abstract tools. This seems to me to be a large cost in complexity for the small subset of cases where offset indices are helpful. This is the core issue to me.

One way around this, it would seem, is to declare arrays as `::Array` and not `::AbstractArray` in functions (if one want to be sure they are `OneTo`). But then one gives up accepting many types of arrays that would pose no problem (e.g, DistributedArrays with OneTo indices). Thus, I'm proposing to have a high level abstract type that would capture all arrays types that can be assumed to be `OneTo`. Then one can write library functions against that type. This alone would help I think.

The auto-conversion is an extra step that I thought might work since it is (I think) low cost to convert an `OffsetArray` to a `OneTo` array . Thus if you passed an OffsetArray to a function that takes the abstract OneTo type (I kept its name AbstractArray above but that need not be its name) you expect that if it returned a `similar` array it would be of `OneTo` type. You would then have to convert it back to an OffsetArray type. It would probably be easy to write one line functions to do this for you. If it was sparse I assume it would remain so. It would be up to the writer of the new OffsetArray type to declare what type it would convert to in the OneTo array abstract tree.

Does this help?

Bob

ps I don't have any concrete examples yet because OffsetArrays are so scarce in the julia ecosystem. I'm just looking ahead to a world where it will no longer be possible to assume AbstractArrays are OneTo. This seems more painful than telling the 0-based crowd to code against 1-based arrays.
 

On Sat, Oct 29, 2016 at 10:40 AM, Tim Holy <[hidden email]> wrote:
I should add, abstract discussions are all well and good, but I fully
recognize you may have something very specific in mind and my answer simply
fails to understand your concern. If you can give one or more concrete
examples of where the current system is either bug-prone or a pain in the
neck, I'd love to see what you mean.

Best,
--Tim

On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:
> TL;DR I think the type system should take care of converting between
> different types of arrays and thus free the user to code against the type
> of array they want (OneTo, ZeroTo, Offset, etc).
>
> Maybe I have a deep mis-understanding of how the new generalized offset
> arrays are suppose to work in Julia. If so I'd be happy to be set straight.
> If not then I think the present system will lead to unnecessary complexity
> and bug-ridden code. In the present system one can define array types that
> are not one-based but can have arbitrary offsets. This is a great idea but
> to make it work a very general system for indexing relative to the ends of
> the array has been devised. If one writes a function that accepts an
> `AbstractArray` then one MUST use this more general system or produce bugs
> when e.g. a `ZeroToArray` is passed in. This puts a large burden on writers
> of library code that works on arrays and I think is an unnecessary
> complexity.
>
> Since Fortran arrays are a nice example let look at the situation in
> Fortran. If I define an array in a function `real arr(-10:10)` then I would
> index it using indices that range from -10 to 10. If I pass this into
> another function that declared the array using its total size (usually
> passed in separately or in a parameter statement) `real arr(21)` then in
> that subroutine one would index the array using "normal" indices that range
> from 1 to 21. I.E., you state in the function how you want to treat the
> array and are not forced to work with in offset form unless you want to. In
> my opinion, this is what makes using offset arrays in Fortran sane and is
> the key thing that Julia should try to emulate.
>
> One way to get this behavior in Julia would be to use the type system and
> `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
> think it is wise that it remain that way so that all old code will work
> unchanged (even with the new array types). Thus, I propose adding a new
> top-level type above AbstractArray type to capture all arrays something
> like:
>
> ```
> Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
> AbstractAllArray{T,N} <: Any
> ```
>
> And the offset arrays would be subtypes of `AbstractAllArrays` on a
> different branch
>
> ```
> OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N} <: Any
> ```
>
> And similarly for `ZeroToArray`.
>
> Then, if one declares an Array as
>
> ```
> function func(arr::AbstractArray)
> ```
>
> one can safely assume it is a 1-based Array inside `func`. If an offset
> array is passed into `func` then it is automatically converted to a `OneTo`
> array inside `func`. This is the key point of this proposal and is similar
> to the auto-conversion of, e.g., Int to Floats in a function call.
> Similarly if one declares an array as a `ZeroToArray` in a function and
> passes in an `Array` then it will be converted to a `ZeroToArray` in the
> function. Some conversions would need more information and thus would be
> disallowed (e.g., `Array` to `OffsetArray`). I think this system would go a
> long way towards making `ZeroToArray`s and `OffsetArray`s simple to use in
> Julia w/o using the generalized indexing techniques introduced in Julia
> 0.5. Note that it would still be possible to use the `AbstractAllArray`
> type and accept all arrays w/o conversion and use the generalized indexing
> techniques.
>
> I'm curious what people think about this.
>
> Bob
>
> ps Paste into github to see in formatted form. I'm not sure how to get that
> in an email.





Reply | Threaded
Open this post in threaded view
|

Re: Reducing complexity of OffsetArrays

Tim Holy
There's still a simple model: the indices are the *key* of the entry. Think about an array as a very special Dict. You could create a Dict with integer keys, let's say from -2:5. Presumably you wouldn't be exactly happy if there were some magical way that the number you originally stored as `d[-1] = 3.2` were alternatively accessible as `d[2]`, simply because the smallest index was -2 and therefore 3.2 is the "second" entry?

Like a Dict, for an array the value always goes with the key (the indices). Perhaps this will help:
```
julia> using OffsetArrays

julia> a = OffsetArray(rand(11), -5:5)
OffsetArrays.OffsetArray{Float64,1,Array{Float64,1}} with indices -5:5:
 0.815289 
 0.0043941
 0.00403153
 0.478065 
 0.150709 
 0.256156 
 0.934703 
 0.672495 
 0.428721 
 0.242469 
 0.43742  

julia> idx = OffsetArray(-1:1, -1:1)
OffsetArrays.OffsetArray{Int64,1,UnitRange{Int64}} with indices -1:1:
 -1
  0
  1

julia> b = a[idx]
OffsetArrays.OffsetArray{Float64,1,Array{Float64,1}} with indices -1:1:
 0.150709
 0.256156
 0.934703

julia> a[-1]
0.15070935766983662

julia> b[-1]
0.15070935766983662
```
So indexing `b = a[idx]` means that `b[j] = a[idx[j]]`. Does that help?

Best,
--Tim

On Tue, Nov 1, 2016 at 3:36 PM, Bob Portmann <[hidden email]> wrote:
Like I said, no real practical experience yet. The increase in complexity that I fear is the loss of, e.g., writing arr[2,3] and having it not be the element in the 2nd row and third column (i.e., the loss of a simple model of how things are laid out). Maybe my fears are unfounded. Others don't seem concerned it would seem.

I'll check out those packages that you mention.

Thanks,
Bob 

On Sun, Oct 30, 2016 at 2:29 PM, Tim Holy <[hidden email]> wrote:
I'm afraid I still don't understand the claimed big increment in complexity. First, let's distinguish "generic offset arrays" from the OffsetArrays.jl package. If you're happy using OffsetArrays, you don't have to write your own offset-array type. Being able to use an established & tested package reduces your burden a lot, and you can ignore the second half of the devdocs page entirely.

If you just want to *use* OffsetArrays.jl, the basic changes in coding style for writing indices-aware code are:

- any place you used to call `size`, you probably want to call `indices` instead (and likely make minor adjustments elsewhere, since `incides` returns a tuple-of-ranges---but such changes tend to be very obvious);
- check all uses of `similar`; some will stay as-is, other will migrate to `similar(f, inds)` style.

In my experience, that's just about it. The devdocs goes into quite a lot of detail to explain the rationale, but really the actual changes are quite small. While you can't quite do it via `grep` and `sed`, to me that just doesn't seem complicated.

Where the pain comes is that if you're converting old code, you sometimes have to think your way through it again---"hmm, what do I really mean by this index"? If your code had complicated indexing the first time you wrote it, unfortunately you're going to have to think about it carefully again; so in some cases, "porting" code is almost as bad as writing it the first time. However, if you write indices-aware code in the first place, in my experience the added burden is almost negligible, and in quite a few cases the ability to offset array indices makes things *easier* (e.g., "padding" an array on its edges is oh-so-much-clearer than it used to be, it's like a different world). That's the whole reason I implemented this facility in julia-0.5: to make life easier, not to make it harder. (Personally I think the whole argument over 0-based and 1-based indexing is stupid; it's the ability to use arbitrary indices that I find interesting & useful, and it makes most of my code prettier.)

For examples of packages that use OffsetArrays, check the following:
- CatIndices
- FFTViews
- ImageFiltering

ImageFiltering is a mixed bag: there's a small performance penalty in a few cases (even if you use @unsafe) because that LLVM doesn't always optimize code as well as it could in principle (maybe @polly will help someday...). Because image filtering is an extraordinarily performance-sensitive operation, there are a few places where I had to make some yucky hand optimizations.

Again, I'm very happy to continue this conversation---I'd really like to understand your concern, but without a concrete example I confess I'm lost.

Best,
--Tim


On Sat, Oct 29, 2016 at 8:44 PM, Bob Portmann <[hidden email]> wrote:
Thanks for the thoughtful response. I hope you'll tolerate one reply in the "abstract".

I am resisting the big change that occurred in 0.5. In 0.4 and earlier if one declares an array as an `AbstractArray` in a function then one knew that the indices were one based (a nice simple model, even if it is hated by many). In 0.5, if one wants to write general code, then one has to assume that arrays can have ANY indices. And one needs to write code using more abstract tools. This seems to me to be a large cost in complexity for the small subset of cases where offset indices are helpful. This is the core issue to me.

One way around this, it would seem, is to declare arrays as `::Array` and not `::AbstractArray` in functions (if one want to be sure they are `OneTo`). But then one gives up accepting many types of arrays that would pose no problem (e.g, DistributedArrays with OneTo indices). Thus, I'm proposing to have a high level abstract type that would capture all arrays types that can be assumed to be `OneTo`. Then one can write library functions against that type. This alone would help I think.

The auto-conversion is an extra step that I thought might work since it is (I think) low cost to convert an `OffsetArray` to a `OneTo` array . Thus if you passed an OffsetArray to a function that takes the abstract OneTo type (I kept its name AbstractArray above but that need not be its name) you expect that if it returned a `similar` array it would be of `OneTo` type. You would then have to convert it back to an OffsetArray type. It would probably be easy to write one line functions to do this for you. If it was sparse I assume it would remain so. It would be up to the writer of the new OffsetArray type to declare what type it would convert to in the OneTo array abstract tree.

Does this help?

Bob

ps I don't have any concrete examples yet because OffsetArrays are so scarce in the julia ecosystem. I'm just looking ahead to a world where it will no longer be possible to assume AbstractArrays are OneTo. This seems more painful than telling the 0-based crowd to code against 1-based arrays.
 

On Sat, Oct 29, 2016 at 10:40 AM, Tim Holy <[hidden email]> wrote:
I should add, abstract discussions are all well and good, but I fully
recognize you may have something very specific in mind and my answer simply
fails to understand your concern. If you can give one or more concrete
examples of where the current system is either bug-prone or a pain in the
neck, I'd love to see what you mean.

Best,
--Tim

On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:
> TL;DR I think the type system should take care of converting between
> different types of arrays and thus free the user to code against the type
> of array they want (OneTo, ZeroTo, Offset, etc).
>
> Maybe I have a deep mis-understanding of how the new generalized offset
> arrays are suppose to work in Julia. If so I'd be happy to be set straight.
> If not then I think the present system will lead to unnecessary complexity
> and bug-ridden code. In the present system one can define array types that
> are not one-based but can have arbitrary offsets. This is a great idea but
> to make it work a very general system for indexing relative to the ends of
> the array has been devised. If one writes a function that accepts an
> `AbstractArray` then one MUST use this more general system or produce bugs
> when e.g. a `ZeroToArray` is passed in. This puts a large burden on writers
> of library code that works on arrays and I think is an unnecessary
> complexity.
>
> Since Fortran arrays are a nice example let look at the situation in
> Fortran. If I define an array in a function `real arr(-10:10)` then I would
> index it using indices that range from -10 to 10. If I pass this into
> another function that declared the array using its total size (usually
> passed in separately or in a parameter statement) `real arr(21)` then in
> that subroutine one would index the array using "normal" indices that range
> from 1 to 21. I.E., you state in the function how you want to treat the
> array and are not forced to work with in offset form unless you want to. In
> my opinion, this is what makes using offset arrays in Fortran sane and is
> the key thing that Julia should try to emulate.
>
> One way to get this behavior in Julia would be to use the type system and
> `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
> think it is wise that it remain that way so that all old code will work
> unchanged (even with the new array types). Thus, I propose adding a new
> top-level type above AbstractArray type to capture all arrays something
> like:
>
> ```
> Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
> AbstractAllArray{T,N} <: Any
> ```
>
> And the offset arrays would be subtypes of `AbstractAllArrays` on a
> different branch
>
> ```
> OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N} <: Any
> ```
>
> And similarly for `ZeroToArray`.
>
> Then, if one declares an Array as
>
> ```
> function func(arr::AbstractArray)
> ```
>
> one can safely assume it is a 1-based Array inside `func`. If an offset
> array is passed into `func` then it is automatically converted to a `OneTo`
> array inside `func`. This is the key point of this proposal and is similar
> to the auto-conversion of, e.g., Int to Floats in a function call.
> Similarly if one declares an array as a `ZeroToArray` in a function and
> passes in an `Array` then it will be converted to a `ZeroToArray` in the
> function. Some conversions would need more information and thus would be
> disallowed (e.g., `Array` to `OffsetArray`). I think this system would go a
> long way towards making `ZeroToArray`s and `OffsetArray`s simple to use in
> Julia w/o using the generalized indexing techniques introduced in Julia
> 0.5. Note that it would still be possible to use the `AbstractAllArray`
> type and accept all arrays w/o conversion and use the generalized indexing
> techniques.
>
> I'm curious what people think about this.
>
> Bob
>
> ps Paste into github to see in formatted form. I'm not sure how to get that
> in an email.






Reply | Threaded
Open this post in threaded view
|

Re: Reducing complexity of OffsetArrays

Bob Portmann
Yes. There is no question that there is tremendous potential in the system you have developed.

One last thing. I am surprised this does not error in 0.5:
```
julia> function t(a::AbstractArray)
    size(a)
end

julia> t(rand(5))
```
It seems to me if AbstractArray is going to represent all arrays then this needs to error on all arrays and not just OffsetArrays and there ilk. If people don't code against the abstract model then bug prone code will be the result. If that means that they have to change the function to:
```
julia> function t(a::Array)
    size(a)
end
```
until they are ready to fix the code then that is the price that must be paid. The alternative is a new AbstractArray type as I outline above.

Cheers,
Bob

On Tue, Nov 1, 2016 at 3:41 PM, Tim Holy <[hidden email]> wrote:
There's still a simple model: the indices are the *key* of the entry. Think about an array as a very special Dict. You could create a Dict with integer keys, let's say from -2:5. Presumably you wouldn't be exactly happy if there were some magical way that the number you originally stored as `d[-1] = 3.2` were alternatively accessible as `d[2]`, simply because the smallest index was -2 and therefore 3.2 is the "second" entry?

Like a Dict, for an array the value always goes with the key (the indices). Perhaps this will help:
```
julia> using OffsetArrays

julia> a = OffsetArray(rand(11), -5:5)
OffsetArrays.OffsetArray{Float64,1,Array{Float64,1}} with indices -5:5:
 0.815289 
 0.0043941
 0.00403153
 0.478065 
 0.150709 
 0.256156 
 0.934703 
 0.672495 
 0.428721 
 0.242469 
 0.43742  

julia> idx = OffsetArray(-1:1, -1:1)
OffsetArrays.OffsetArray{Int64,1,UnitRange{Int64}} with indices -1:1:
 -1
  0
  1

julia> b = a[idx]
OffsetArrays.OffsetArray{Float64,1,Array{Float64,1}} with indices -1:1:
 0.150709
 0.256156
 0.934703

julia> a[-1]
0.15070935766983662

julia> b[-1]
0.15070935766983662
```
So indexing `b = a[idx]` means that `b[j] = a[idx[j]]`. Does that help?

Best,
--Tim

On Tue, Nov 1, 2016 at 3:36 PM, Bob Portmann <[hidden email]> wrote:
Like I said, no real practical experience yet. The increase in complexity that I fear is the loss of, e.g., writing arr[2,3] and having it not be the element in the 2nd row and third column (i.e., the loss of a simple model of how things are laid out). Maybe my fears are unfounded. Others don't seem concerned it would seem.

I'll check out those packages that you mention.

Thanks,
Bob 

On Sun, Oct 30, 2016 at 2:29 PM, Tim Holy <[hidden email]> wrote:
I'm afraid I still don't understand the claimed big increment in complexity. First, let's distinguish "generic offset arrays" from the OffsetArrays.jl package. If you're happy using OffsetArrays, you don't have to write your own offset-array type. Being able to use an established & tested package reduces your burden a lot, and you can ignore the second half of the devdocs page entirely.

If you just want to *use* OffsetArrays.jl, the basic changes in coding style for writing indices-aware code are:

- any place you used to call `size`, you probably want to call `indices` instead (and likely make minor adjustments elsewhere, since `incides` returns a tuple-of-ranges---but such changes tend to be very obvious);
- check all uses of `similar`; some will stay as-is, other will migrate to `similar(f, inds)` style.

In my experience, that's just about it. The devdocs goes into quite a lot of detail to explain the rationale, but really the actual changes are quite small. While you can't quite do it via `grep` and `sed`, to me that just doesn't seem complicated.

Where the pain comes is that if you're converting old code, you sometimes have to think your way through it again---"hmm, what do I really mean by this index"? If your code had complicated indexing the first time you wrote it, unfortunately you're going to have to think about it carefully again; so in some cases, "porting" code is almost as bad as writing it the first time. However, if you write indices-aware code in the first place, in my experience the added burden is almost negligible, and in quite a few cases the ability to offset array indices makes things *easier* (e.g., "padding" an array on its edges is oh-so-much-clearer than it used to be, it's like a different world). That's the whole reason I implemented this facility in julia-0.5: to make life easier, not to make it harder. (Personally I think the whole argument over 0-based and 1-based indexing is stupid; it's the ability to use arbitrary indices that I find interesting & useful, and it makes most of my code prettier.)

For examples of packages that use OffsetArrays, check the following:
- CatIndices
- FFTViews
- ImageFiltering

ImageFiltering is a mixed bag: there's a small performance penalty in a few cases (even if you use @unsafe) because that LLVM doesn't always optimize code as well as it could in principle (maybe @polly will help someday...). Because image filtering is an extraordinarily performance-sensitive operation, there are a few places where I had to make some yucky hand optimizations.

Again, I'm very happy to continue this conversation---I'd really like to understand your concern, but without a concrete example I confess I'm lost.

Best,
--Tim


On Sat, Oct 29, 2016 at 8:44 PM, Bob Portmann <[hidden email]> wrote:
Thanks for the thoughtful response. I hope you'll tolerate one reply in the "abstract".

I am resisting the big change that occurred in 0.5. In 0.4 and earlier if one declares an array as an `AbstractArray` in a function then one knew that the indices were one based (a nice simple model, even if it is hated by many). In 0.5, if one wants to write general code, then one has to assume that arrays can have ANY indices. And one needs to write code using more abstract tools. This seems to me to be a large cost in complexity for the small subset of cases where offset indices are helpful. This is the core issue to me.

One way around this, it would seem, is to declare arrays as `::Array` and not `::AbstractArray` in functions (if one want to be sure they are `OneTo`). But then one gives up accepting many types of arrays that would pose no problem (e.g, DistributedArrays with OneTo indices). Thus, I'm proposing to have a high level abstract type that would capture all arrays types that can be assumed to be `OneTo`. Then one can write library functions against that type. This alone would help I think.

The auto-conversion is an extra step that I thought might work since it is (I think) low cost to convert an `OffsetArray` to a `OneTo` array . Thus if you passed an OffsetArray to a function that takes the abstract OneTo type (I kept its name AbstractArray above but that need not be its name) you expect that if it returned a `similar` array it would be of `OneTo` type. You would then have to convert it back to an OffsetArray type. It would probably be easy to write one line functions to do this for you. If it was sparse I assume it would remain so. It would be up to the writer of the new OffsetArray type to declare what type it would convert to in the OneTo array abstract tree.

Does this help?

Bob

ps I don't have any concrete examples yet because OffsetArrays are so scarce in the julia ecosystem. I'm just looking ahead to a world where it will no longer be possible to assume AbstractArrays are OneTo. This seems more painful than telling the 0-based crowd to code against 1-based arrays.
 

On Sat, Oct 29, 2016 at 10:40 AM, Tim Holy <[hidden email]> wrote:
I should add, abstract discussions are all well and good, but I fully
recognize you may have something very specific in mind and my answer simply
fails to understand your concern. If you can give one or more concrete
examples of where the current system is either bug-prone or a pain in the
neck, I'd love to see what you mean.

Best,
--Tim

On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:
> TL;DR I think the type system should take care of converting between
> different types of arrays and thus free the user to code against the type
> of array they want (OneTo, ZeroTo, Offset, etc).
>
> Maybe I have a deep mis-understanding of how the new generalized offset
> arrays are suppose to work in Julia. If so I'd be happy to be set straight.
> If not then I think the present system will lead to unnecessary complexity
> and bug-ridden code. In the present system one can define array types that
> are not one-based but can have arbitrary offsets. This is a great idea but
> to make it work a very general system for indexing relative to the ends of
> the array has been devised. If one writes a function that accepts an
> `AbstractArray` then one MUST use this more general system or produce bugs
> when e.g. a `ZeroToArray` is passed in. This puts a large burden on writers
> of library code that works on arrays and I think is an unnecessary
> complexity.
>
> Since Fortran arrays are a nice example let look at the situation in
> Fortran. If I define an array in a function `real arr(-10:10)` then I would
> index it using indices that range from -10 to 10. If I pass this into
> another function that declared the array using its total size (usually
> passed in separately or in a parameter statement) `real arr(21)` then in
> that subroutine one would index the array using "normal" indices that range
> from 1 to 21. I.E., you state in the function how you want to treat the
> array and are not forced to work with in offset form unless you want to. In
> my opinion, this is what makes using offset arrays in Fortran sane and is
> the key thing that Julia should try to emulate.
>
> One way to get this behavior in Julia would be to use the type system and
> `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
> think it is wise that it remain that way so that all old code will work
> unchanged (even with the new array types). Thus, I propose adding a new
> top-level type above AbstractArray type to capture all arrays something
> like:
>
> ```
> Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
> AbstractAllArray{T,N} <: Any
> ```
>
> And the offset arrays would be subtypes of `AbstractAllArrays` on a
> different branch
>
> ```
> OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N} <: Any
> ```
>
> And similarly for `ZeroToArray`.
>
> Then, if one declares an Array as
>
> ```
> function func(arr::AbstractArray)
> ```
>
> one can safely assume it is a 1-based Array inside `func`. If an offset
> array is passed into `func` then it is automatically converted to a `OneTo`
> array inside `func`. This is the key point of this proposal and is similar
> to the auto-conversion of, e.g., Int to Floats in a function call.
> Similarly if one declares an array as a `ZeroToArray` in a function and
> passes in an `Array` then it will be converted to a `ZeroToArray` in the
> function. Some conversions would need more information and thus would be
> disallowed (e.g., `Array` to `OffsetArray`). I think this system would go a
> long way towards making `ZeroToArray`s and `OffsetArray`s simple to use in
> Julia w/o using the generalized indexing techniques introduced in Julia
> 0.5. Note that it would still be possible to use the `AbstractAllArray`
> type and accept all arrays w/o conversion and use the generalized indexing
> techniques.
>
> I'm curious what people think about this.
>
> Bob
>
> ps Paste into github to see in formatted form. I'm not sure how to get that
> in an email.







Reply | Threaded
Open this post in threaded view
|

Re: Reducing complexity of OffsetArrays

Tamas Papp
In reply to this post by Tim Holy
Is there a general recommendation for dealing with the case when a
function encounters two arrays with different index sets? Eg

OffsetArray(1:5,1:5) + OffsetArray(1:5, -2:2)

currently gives a DimensionMismatch exception via promote_shape, so
perhaps one should simply rely on what the latter does. But one can
imagine other options too (eg all unrepresented indices as 0, etc). If
there is a blog post or issue summarizing the options, I would be
interested in reading it (could not find anything).

On Tue, Nov 01 2016, Tim Holy wrote:

> There's still a simple model: the indices are the *key* of the entry. Think
> about an array as a very special Dict. You could create a Dict with integer
> keys, let's say from -2:5. Presumably you wouldn't be exactly happy if
> there were some magical way that the number you originally stored as `d[-1]
> = 3.2` were alternatively accessible as `d[2]`, simply because the smallest
> index was -2 and therefore 3.2 is the "second" entry?
>
> Like a Dict, for an array the value always goes with the key (the indices).
> Perhaps this will help:
> ```
> julia> using OffsetArrays
>
> julia> a = OffsetArray(rand(11), -5:5)
> OffsetArrays.OffsetArray{Float64,1,Array{Float64,1}} with indices -5:5:
>  0.815289
>  0.0043941
>  0.00403153
>  0.478065
>  0.150709
>  0.256156
>  0.934703
>  0.672495
>  0.428721
>  0.242469
>  0.43742
>
> julia> idx = OffsetArray(-1:1, -1:1)
> OffsetArrays.OffsetArray{Int64,1,UnitRange{Int64}} with indices -1:1:
>  -1
>   0
>   1
>
> julia> b = a[idx]
> OffsetArrays.OffsetArray{Float64,1,Array{Float64,1}} with indices -1:1:
>  0.150709
>  0.256156
>  0.934703
>
> julia> a[-1]
> 0.15070935766983662
>
> julia> b[-1]
> 0.15070935766983662
> ```
> So indexing `b = a[idx]` means that `b[j] = a[idx[j]]`. Does that help?
>
> Best,
> --Tim
>
> On Tue, Nov 1, 2016 at 3:36 PM, Bob Portmann <[hidden email]> wrote:
>
>> Like I said, no real practical experience yet. The increase in complexity
>> that I fear is the loss of, e.g., writing arr[2,3] and having it not be the
>> element in the 2nd row and third column (i.e., the loss of a simple model
>> of how things are laid out). Maybe my fears are unfounded. Others don't
>> seem concerned it would seem.
>>
>> I'll check out those packages that you mention.
>>
>> Thanks,
>> Bob
>>
>> On Sun, Oct 30, 2016 at 2:29 PM, Tim Holy <[hidden email]> wrote:
>>
>>> I'm afraid I still don't understand the claimed big increment in
>>> complexity. First, let's distinguish "generic offset arrays" from the
>>> OffsetArrays.jl package. If you're happy using OffsetArrays, you don't have
>>> to write your own offset-array type. Being able to use an established &
>>> tested package reduces your burden a lot, and you can ignore the second
>>> half of the devdocs page entirely.
>>>
>>> If you just want to *use* OffsetArrays.jl, the basic changes in coding
>>> style for writing indices-aware code are:
>>>
>>> - any place you used to call `size`, you probably want to call `indices`
>>> instead (and likely make minor adjustments elsewhere, since `incides`
>>> returns a tuple-of-ranges---but such changes tend to be very obvious);
>>> - check all uses of `similar`; some will stay as-is, other will migrate
>>> to `similar(f, inds)` style.
>>>
>>> In my experience, that's just about it. The devdocs goes into quite a lot
>>> of detail to explain the rationale, but really the actual changes are quite
>>> small. While you can't quite do it via `grep` and `sed`, to me that just
>>> doesn't seem complicated.
>>>
>>> Where the pain comes is that if you're converting old code, you sometimes
>>> have to think your way through it again---"hmm, what do I really mean by
>>> this index"? If your code had complicated indexing the first time you wrote
>>> it, unfortunately you're going to have to think about it carefully again;
>>> so in some cases, "porting" code is almost as bad as writing it the first
>>> time. However, if you write indices-aware code in the first place, in my
>>> experience the added burden is almost negligible, and in quite a few cases
>>> the ability to offset array indices makes things *easier* (e.g., "padding"
>>> an array on its edges is oh-so-much-clearer than it used to be, it's like a
>>> different world). That's the whole reason I implemented this facility in
>>> julia-0.5: to make life easier, not to make it harder. (Personally I think
>>> the whole argument over 0-based and 1-based indexing is stupid; it's the
>>> ability to use arbitrary indices that I find interesting & useful, and it
>>> makes most of my code prettier.)
>>>
>>> For examples of packages that use OffsetArrays, check the following:
>>> - CatIndices
>>> - FFTViews
>>> - ImageFiltering
>>>
>>> ImageFiltering is a mixed bag: there's a small performance penalty in a
>>> few cases (even if you use @unsafe) because that LLVM doesn't always
>>> optimize code as well as it could in principle (maybe @polly will help
>>> someday...). Because image filtering is an extraordinarily
>>> performance-sensitive operation, there are a few places where I had to make
>>> some yucky hand optimizations.
>>>
>>> Again, I'm very happy to continue this conversation---I'd really like to
>>> understand your concern, but without a concrete example I confess I'm lost.
>>>
>>> Best,
>>> --Tim
>>>
>>>
>>> On Sat, Oct 29, 2016 at 8:44 PM, Bob Portmann <[hidden email]>
>>> wrote:
>>>
>>>> Thanks for the thoughtful response. I hope you'll tolerate one reply in
>>>> the "abstract".
>>>>
>>>> I am resisting the big change that occurred in 0.5. In 0.4 and earlier
>>>> if one declares an array as an `AbstractArray` in a function then one knew
>>>> that the indices were one based (a nice simple model, even if it is hated
>>>> by many). In 0.5, if one wants to write general code, then one has to
>>>> assume that arrays can have ANY indices. And one needs to write code using
>>>> more abstract tools. This seems to me to be a large cost in complexity for
>>>> the small subset of cases where offset indices are helpful. This is the
>>>> core issue to me.
>>>>
>>>> One way around this, it would seem, is to declare arrays as `::Array`
>>>> and not `::AbstractArray` in functions (if one want to be sure they are
>>>> `OneTo`). But then one gives up accepting many types of arrays that would
>>>> pose no problem (e.g, DistributedArrays with OneTo indices). Thus, I'm
>>>> proposing to have a high level abstract type that would capture all arrays
>>>> types that can be assumed to be `OneTo`. Then one can write library
>>>> functions against that type. This alone would help I think.
>>>>
>>>> The auto-conversion is an extra step that I thought might work since it
>>>> is (I think) low cost to convert an `OffsetArray` to a `OneTo` array . Thus
>>>> if you passed an OffsetArray to a function that takes the abstract OneTo
>>>> type (I kept its name AbstractArray above but that need not be its name)
>>>> you expect that if it returned a `similar` array it would be of `OneTo`
>>>> type. You would then have to convert it back to an OffsetArray type. It
>>>> would probably be easy to write one line functions to do this for you. If
>>>> it was sparse I assume it would remain so. It would be up to the writer of
>>>> the new OffsetArray type to declare what type it would convert to in the
>>>> OneTo array abstract tree.
>>>>
>>>> Does this help?
>>>>
>>>> Bob
>>>>
>>>> ps I don't have any concrete examples yet because OffsetArrays are so
>>>> scarce in the julia ecosystem. I'm just looking ahead to a world where it
>>>> will no longer be possible to assume AbstractArrays are OneTo. This seems
>>>> more painful than telling the 0-based crowd to code against 1-based arrays.
>>>>
>>>>
>>>> On Sat, Oct 29, 2016 at 10:40 AM, Tim Holy <[hidden email]> wrote:
>>>>
>>>>> I should add, abstract discussions are all well and good, but I fully
>>>>> recognize you may have something very specific in mind and my answer
>>>>> simply
>>>>> fails to understand your concern. If you can give one or more concrete
>>>>> examples of where the current system is either bug-prone or a pain in
>>>>> the
>>>>> neck, I'd love to see what you mean.
>>>>>
>>>>> Best,
>>>>> --Tim
>>>>>
>>>>> On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:
>>>>> > TL;DR I think the type system should take care of converting between
>>>>> > different types of arrays and thus free the user to code against the
>>>>> type
>>>>> > of array they want (OneTo, ZeroTo, Offset, etc).
>>>>> >
>>>>> > Maybe I have a deep mis-understanding of how the new generalized
>>>>> offset
>>>>> > arrays are suppose to work in Julia. If so I'd be happy to be set
>>>>> straight.
>>>>> > If not then I think the present system will lead to unnecessary
>>>>> complexity
>>>>> > and bug-ridden code. In the present system one can define array types
>>>>> that
>>>>> > are not one-based but can have arbitrary offsets. This is a great
>>>>> idea but
>>>>> > to make it work a very general system for indexing relative to the
>>>>> ends of
>>>>> > the array has been devised. If one writes a function that accepts an
>>>>> > `AbstractArray` then one MUST use this more general system or produce
>>>>> bugs
>>>>> > when e.g. a `ZeroToArray` is passed in. This puts a large burden on
>>>>> writers
>>>>> > of library code that works on arrays and I think is an unnecessary
>>>>> > complexity.
>>>>> >
>>>>> > Since Fortran arrays are a nice example let look at the situation in
>>>>> > Fortran. If I define an array in a function `real arr(-10:10)` then I
>>>>> would
>>>>> > index it using indices that range from -10 to 10. If I pass this into
>>>>> > another function that declared the array using its total size (usually
>>>>> > passed in separately or in a parameter statement) `real arr(21)` then
>>>>> in
>>>>> > that subroutine one would index the array using "normal" indices that
>>>>> range
>>>>> > from 1 to 21. I.E., you state in the function how you want to treat
>>>>> the
>>>>> > array and are not forced to work with in offset form unless you want
>>>>> to. In
>>>>> > my opinion, this is what makes using offset arrays in Fortran sane
>>>>> and is
>>>>> > the key thing that Julia should try to emulate.
>>>>> >
>>>>> > One way to get this behavior in Julia would be to use the type system
>>>>> and
>>>>> > `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
>>>>> > think it is wise that it remain that way so that all old code will
>>>>> work
>>>>> > unchanged (even with the new array types). Thus, I propose adding a
>>>>> new
>>>>> > top-level type above AbstractArray type to capture all arrays
>>>>> something
>>>>> > like:
>>>>> >
>>>>> > ```
>>>>> > Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
>>>>> > AbstractAllArray{T,N} <: Any
>>>>> > ```
>>>>> >
>>>>> > And the offset arrays would be subtypes of `AbstractAllArrays` on a
>>>>> > different branch
>>>>> >
>>>>> > ```
>>>>> > OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N}
>>>>> <: Any
>>>>> > ```
>>>>> >
>>>>> > And similarly for `ZeroToArray`.
>>>>> >
>>>>> > Then, if one declares an Array as
>>>>> >
>>>>> > ```
>>>>> > function func(arr::AbstractArray)
>>>>> > ```
>>>>> >
>>>>> > one can safely assume it is a 1-based Array inside `func`. If an
>>>>> offset
>>>>> > array is passed into `func` then it is automatically converted to a
>>>>> `OneTo`
>>>>> > array inside `func`. This is the key point of this proposal and is
>>>>> similar
>>>>> > to the auto-conversion of, e.g., Int to Floats in a function call.
>>>>> > Similarly if one declares an array as a `ZeroToArray` in a function
>>>>> and
>>>>> > passes in an `Array` then it will be converted to a `ZeroToArray` in
>>>>> the
>>>>> > function. Some conversions would need more information and thus would
>>>>> be
>>>>> > disallowed (e.g., `Array` to `OffsetArray`). I think this system
>>>>> would go a
>>>>> > long way towards making `ZeroToArray`s and `OffsetArray`s simple to
>>>>> use in
>>>>> > Julia w/o using the generalized indexing techniques introduced in
>>>>> Julia
>>>>> > 0.5. Note that it would still be possible to use the
>>>>> `AbstractAllArray`
>>>>> > type and accept all arrays w/o conversion and use the generalized
>>>>> indexing
>>>>> > techniques.
>>>>> >
>>>>> > I'm curious what people think about this.
>>>>> >
>>>>> > Bob
>>>>> >
>>>>> > ps Paste into github to see in formatted form. I'm not sure how to
>>>>> get that
>>>>> > in an email.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>