STL set algorithms such as set_union and set_intersection pose a potential memory safety issue under GPU acceleration. These operations accept iterators for the start and end of both input sets and for the start of the output set. In the CPU version, the output iterator is simply incremented each time a new member is added to the output set, so the algorithm never needs to know the size of the output container (though an undersized container can still cause an out-of-bounds write).
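A minimal CPU-side sketch of this point: std::set_union only advances the output iterator once per element written and never checks the destination's capacity, so the caller alone is responsible for sizing the output.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a{1, 2, 3, 5};
    std::vector<int> b{2, 3, 4, 6};

    // Worst case for a union: the inputs share no members, so the result
    // can hold up to a.size() + b.size() elements. If out were smaller and
    // the union exceeded it, set_union would write out of bounds.
    std::vector<int> out(a.size() + b.size());
    auto end = std::set_union(a.begin(), a.end(),
                              b.begin(), b.end(), out.begin());
    out.resize(end - out.begin());  // trim to the actual union size (here, 6)

    for (int x : out) std::cout << x << ' ';  // 1 2 3 4 5 6
    std::cout << '\n';
}
```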
In the GPU version of these algorithms, the output set device structure and the final copy from the GPU output to the CPU output must be sized to cover the maximum possible output size. However, there is no guarantee that the programmer has sized the CPU output container to fit the maximum output size. For example, the programmer might know that the input sets to a union computation share many common members and therefore use a smaller output container than is theoretically necessary. If the GPU version of set_union then copies the maximum possible number of elements to the output container, an out-of-bounds memory write can occur that would not have occurred in the CPU version. Therefore, the specification for a GPU set_union would have to require that the output container can fit the maximum number of output elements. If the output container is not sized properly, the behavior would be undefined, as is true in many cases in the C++ specification.
In my implementation of GPU-accelerated set operations, I assume that the programmer has properly sized the output containers for the maximum number of output elements.
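A hedged sketch of that caller-side contract, using Thrust's set_union as a stand-in for the GPU routine (my implementation may differ): both inputs must be sorted, and the output is preallocated for the worst case so the final device-to-host copy cannot overrun the host buffer.

```cpp
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/set_operations.h>
#include <vector>

int main() {
    std::vector<int> ha{1, 2, 3, 5};
    std::vector<int> hb{2, 3, 4, 6};
    thrust::device_vector<int> a(ha.begin(), ha.end());
    thrust::device_vector<int> b(hb.begin(), hb.end());

    // Device output sized for the maximum possible union: a.size() + b.size().
    thrust::device_vector<int> d_out(a.size() + b.size());
    auto end = thrust::set_union(a.begin(), a.end(),
                                 b.begin(), b.end(), d_out.begin());
    d_out.resize(end - d_out.begin());

    // The host container receiving the copy must also hold the result;
    // here it is sized from the actual union size computed on the device.
    thrust::host_vector<int> h_out = d_out;  // 1 2 3 4 5 6
    return 0;
}
```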