OSU Micro Benchmarks v7.4 (4/26/2024)

* New Features & Enhancements
    - Add support for new benchmarks to measure network congestion
        * osu_bw_fan_in, osu_bw_fan_out
    - Add new collective benchmarks
        * osu_reduce_scatter_block, osu_ireduce_scatter_block
    - Add support for custom percentile values to evaluate benchmark performance
    - Add support to log validation failures to following benchmarks
        * Point-to-point benchmarks
            * osu_bibw, osu_bw, osu_mbw_mr, osu_latency, osu_multi_lat,
            * osu_latency_mp, osu_latency_mt, osu_bw_persistent,
            * osu_bibw_persistent, osu_latency_persistent
        * Blocking collective benchmarks
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_bcast, osu_barrier, osu_gather,
            * osu_gatherv, osu_reduce, osu_reduce_scatter, osu_scatter,
            * osu_scatterv, osu_reduce_scatter_block
        * Non-Blocking collective benchmarks
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_ibcast, osu_ibarrier,
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv, osu_ireduce_scatter, osu_ireduce_scatter_block
        * Neighborhood collective benchmarks
            * osu_neighbor_allgather, osu_neighbor_allgatherv,
            * osu_neighbor_alltoall,osu_neighbor_alltoallv,
            * osu_neighbor_alltoallw, osu_ineighbor_allgatherv,
            * osu_ineighbor_allgatherv, osu_ineighbor_alltoall,
            * osu_ineighbor_alltoallv, osu_ineighbor_alltoallw
        * Persistent collective benchmarks
            * osu_allgather_persistent, osu_allgatherv_persistent,
            * osu_allreduce_persistent, osu_alltoall_persistent,
            * osu_alltoallv_persistent, osu_alltoallw_persistent,
            * osu_barrier_persistent, osu_bcast_persistent,
            * osu_gather_persistent, osu_gatherv_persistent,
            * osu_reduce_persistent, osu_reduce_scatter_persistent,
            * osu_scatter_persistent, osu_scatterv_persistent,

* Bug Fixes
    - Fixed time measurement in osu_mbw_mr and osu_multi_lat benchmarks to
      consider execution time across all sender ranks.
    - Fixed validation failure when using managed device buffers.
        * Thanks to Van Man Nguyen @Eviden for report.
    - Fixed time measurement in osu_mbw_mr and osu_multi_lat benchmarks to
      consider execution time across all sender ranks.
    - Fixed multiple definition error when building with NCCL support.
        * Thanks to Klaus Noelp @FU Hagen for report.
    - Fixed bugs causing display of multiple/incorrect headers.
    - Fixed hang issue in benchmarks using multiple buffers when validation
      is enabled.
        * Thanks to Van Man Nguyen @Eviden for the report.
    - Fixed issue in osu_cas_latency benchmark that led to segmentation fault.
        * Thanks to Julien Duprat @Eviden for the report.
    - Fixed configure time failure while building with latest ROCm.
        * Thanks to Mahdieh Ghazimirsaeed @AMD for the report and patch.
    - Fixed device and managed memory allocation in one-sided benchmarks.
        * Thanks to Mikhail Brinskii @NVIDIA for the report.
    - Fixed bug in Reduce-Scatter benchmarks that led to data validation of
      only a part of the receive buffer.
    - Fixed NULL pointer error in device based one-sided benchmarks.
    - Fixed data validation issue with Reduce, Allreduce, Reduce_scatter
      benchmarks.
    - Updated configure help for "--enable-mpi4" to state that the MPI-4
      features are auto enabled.
        * Thanks to Antoine Morvan @Eviden for the report.
    - Fixed a bug in benchmarks that led to MPI_IN_PLACE support getting enabled
      only when validation is enabled.

OSU Micro Benchmarks v7.3 (10/30/2023)

* New Features & Enhancements
    - Add support for RCCL benchmarks
        * Thanks to Marcel Koch @KIT for the initial patch
        * Point-to-point benchmarks supported
            * osu_xccl_bibw, osu_xccl_bw, osu_xccl_latency
        * Collective benchmarks supported
            * osu_xccl_allgather, osu_xccl_allreduce, osu_xccl_alltoall,
            * osu_xccl_bcast, osu_xccl_reduce, osu_xccl_reduce_scatter
    - Add new benchmarks for persistent collectives
        * osu_allgather_persistent, osu_allgatherv_persistent,
        * osu_allreduce_persistent, osu_alltoall_persistent,
        * osu_alltoallv_persistent, osu_alltoallw_persistent,
        * osu_barrier_persistent, osu_bcast_persistent,
        * osu_gather_persistent, osu_gatherv_persistent,
        * osu_reduce_persistent, osu_reduce_scatter_persistent,
        * osu_scatter_persistent, osu_scatterv_persistent
    - Support new metrics to evaluate benchmark performance
        * 50th percentile tail latency/bandwidth
        * 95th percentile tail latency/bandwidth
        * 99th percentile tail latency/bandwidth

* Bug Fixes
    - Fixed acknowledgement buffer memory allocation issue in bandwidth related
      benchmarks
        * Thanks to Emmanuel BRELLE @Eviden for report and patch.
    - Fixed validation issue in osu_fop_latency.
        * Thanks to Coey Minear @HPE for report and patch.
    - Added support to managed buffers in one-sided collectives and
      one-sided point-to-point benchmarks

OSU Micro Benchmarks v7.2 (07/10/2023)

* New Features & Enhancements
    - Add MPI-4 session based initialization support to following benchmarks
        * Point-to-point benchmarks supported
            * osu_bibw, osu_bw, osu_mbw_mr, osu_latency, osu_multi_lat,
            * osu_latency_mp, osu_latency_mt, osu_bw_persistent,
            * osu_bibw_persistent, osu_latency_persistent
        * Blocking benchmarks supported
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_bcast, osu_barrier, osu_gather,
            * osu_gatherv, osu_reduce, osu_reduce_scatter, osu_scatter,
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_ibcast, osu_ibarrier,
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv, osu_ireduce_scatter
        * Neighborhood benchmarks
            * osu_neighbor_allgather, osu_neighbor_allgatherv,
            * osu_neighbor_alltoall,osu_neighbor_alltoallv,
            * osu_neighbor_alltoallw, osu_ineighbor_allgatherv,
            * osu_ineighbor_allgatherv, osu_ineighbor_alltoall,
            * osu_ineighbor_alltoallv, osu_ineighbor_alltoallw
        * Startup benchmarks
            * osu_init
    - Add MPI_IN_PLACE support for following blocking and non-blocking
      collectives
        * Blocking benchmarks supported
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_gather, osu_gatherv, osu_reduce,
            * osu_reduce_scatter, osu_scatter, osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_igather, osu_igatherv,
            * osu_ireduce, osu_iscatter, osu_iscatterv, osu_ireduce_scatter
    - Add an option to set root rank for rooted blocking and non-blocking
      collectives
        * Blocking benchmarks supported
            * osu_gather, osu_gatherv, osu_reduce, osu_scatter,
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv

* Bug Fixes
    - Fixed memory leak in point-to-point benchmarks when validation is enabled.
        * Thanks to Shi Jin @Amazon for report and patch.
    - Fixed missing '#' formatting bug in osu_ibarrier header.
        * Thanks to Nick Hagerty @ORNL for report.

OSU Micro Benchmarks v7.1 (04/06/2023)

* New Features & Enhancements
    - Add new blocking and non-blocking neighborhood collective benchmarks with
      support for CPU buffers.
        * Blocking benchmarks
          - osu_neighbor_allgather, osu_neighbor_allgatherv,
          - osu_neighbor_alltoall,osu_neighbor_alltoallv,
          - osu_neighbor_alltoallw
        * Non-Blocking benchmarks
          - osu_ineighbor_allgatherv, osu_ineighbor_allgatherv,
          - osu_ineighbor_alltoall, osu_ineighbor_alltoallv,
          - osu_ineighbor_alltoallw
    - Add new benchmarks for point-to-point persistent communication primitives.
        * osu_bw_persistent, osu_bibw_persistent, osu_latency_persistent
    - Extend collective and point-to-point benchmarks to support different MPI
      datatypes.
        * New data types supported - MPI_INT, MPI_FLOAT, MPI_CHAR
    - Add validation and MPI types support to one-sided benchmarks.
        * osu_acc_latency, osu_cas_latency, osu_fop_latency

* Bug Fixes (since v7.0)
    - Reverse order of calling MPI_Reduce to find out MIN and MAX comm times
        * Thanks to Devendar Bureddy @NVIDIA for the report and patch
    - Fix data validation issue when running wiht ROCm support on AMD GPUs
        * Thanks to Damon McDougall @AMD for the report and patch
    - Thanks to Luke Robison @Amazon for initial patch for one sided
      validation and MPI_TYPEs support.

OSU Micro Benchmarks v7.0 (11/10/2022)

* New Features & Enhancements
    - Add PAPI support for mpi pt2pt benchmarks
        * Enabled with the `-P` option
        * Benchmarks supported
            * osu_bibw, osu_bw, osu_mbw_mr, osu_latency, osu_multi_lat,
            * osu_latency_mp
    - Add PAPI support for mpi blocking and non-blocking collective
      benchmarks
        * Enabled with the `-P` option
        * Blocking benchmarks supported
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_bcast, osu_barrier, osu_gather,
            * osu_gatherv, osu_reduce, osu_reduce_scatter, osu_scatter,
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_ibcast, osu_ibarrier,
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv, osu_ireduce_scatter
    - Add PAPI support for mpi one-sided benchmarks
        * Enabled with the `-P` option
        * Benchmarks supported
            * osu_acc_latency, osu_cas_latency, osu_fop_latency,
            * osu_get_acc_latency, osu_get_bw, osu_get_latency, osu_put_bibw,
            * osu_put_bw, osu_put_latency
    - Add support for benchmarking non-blocking Reduce-Scatter.
        * osu_ireduce_scatter

* Bug Fixes (since v6.2)
    - Fixed a bug causing incorrect value for size in graph titles.

OSU Micro Benchmarks v6.2 (10/25/2022)

* New Features & Enhancements
    - Add graphing support for mpi pt2pt benchmarks
        * Enabled with the `-G` option
        * Supports tty, png, and pdf graph ouputs.
        * Benchmarks supported
            * osu_bibw, osu_bw, osu_mbw_mr, osu_latency, osu_multi_lat,
            * osu_latency_mp, osu_latency_mt
    - Add graphing support for mpi blocking and non-blocking collective
      benchmarks
        * Enabled with the `-G` option
        * Supports tty, png, and pdf graph ouputs.
        * Blocking benchmarks supported
            * osu_allgather, osu_allgatherv, osu_alltoall, osu_allreduce,
            * osu_alltoallv, osu_alltoallw, osu_bcast, osu_barrier, osu_gather,
            * osu_gatherv, osu_reduce, osu_reduce_scatter, osu_scatter,
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather, osu_iallgatherv, osu_iallreduce, osu_ialltoall,
            * osu_ialltoallv, osu_ialltoallw, osu_ibcast, osu_ibarrier,
            * osu_igather, osu_igatherv, osu_ireduce, osu_iscatter,
            * osu_iscatterv
    - Add graphing support for mpi one-sided benchmarks
        * Enabled with the `-G` option
        * Supports tty, png, and pdf graph ouputs.
        * Benchmarks supported
            * osu_acc_latency, osu_cas_latency, osu_fop_latency,
            * osu_get_acc_latency, osu_get_bw, osu_get_latency, osu_put_bibw,
            * osu_put_bw, osu_put_latency
    - Add support for benchmarking blocking Alltoallw.
        * osu_alltoallw

* Bug Fixes (since v6.1)
    - Fixed inconsistency in max_message_size between some blocking and
      non-blocking collectives.
        * Thanks to Krishna Kandalla @HPE for the report.
    - Correct skip value for small message sizes in osu_ialltoallw benchmark

OSU Micro Benchmarks v6.1 (09/16/2022)

* New Features & Enhancements
    - Add support for derived data types in mpi pt2pt benchmarks
        * Enabled with the `-D` option
        * Supports contiguous, vector, and indexed data types
        * Benchmarks supported
            * osu_bibw
            * osu_bw
            * osu_mbw_mr
            * osu_latency
            * osu_multi_lat
    - Add support for derived data types in mpi blocking and non-blocking
      collective benchmarks
        * Enabled with the `-D` option
        * Supports contiguous, vector, and indexed data types
        * Blocking benchmarks supported
            * osu_allgather
            * osu_allgatherv
            * osu_alltoall
            * osu_alltoallv
            * osu_bcast
            * osu_gather
            * osu_gatherv
            * osu_scatter
            * osu_scatterv
        * Non-Blocking benchmarks supported
            * osu_iallgather
            * osu_iallgatherv
            * osu_ialltoall
            * osu_ialltoallv
            * osu_ialltoallw
            * osu_ibcast
            * osu_igather
            * osu_igatherv
            * osu_iscatter
            * osu_iscatterv

* Bug Fixes (since v6.0)
    - Add support for data validation for ROCm benchmarks
        * Thanks to Amir Shehata @ORNL for the report and initial patch
    - Correct the version number in the released version
        * Thanks to Adam Goldman @Intel for the report

OSU Micro Benchmarks v6.0 (08/16/2022)

* New Features & Enhancements
    - Add support for Java pt2pt benchmarks
        * OSULatency - Latency Test
        * OSUBandwidth - Bandwidth Test
          - OSUBandwidthOMPI - Bandwidth Test for Open MPI Java Bindings
        * OSUBiBandwidth - Bidirectional Bandwidth Test
          - OSUBiBandwidthOMPI - BiBandwidth Test for Open MPI Java Bindings
    - Add support for Java collective benchmarks
        * OSUAllgather - MPI_Allgather Latency Test
        * OSUAllgatherv - MPI_Allgatherv Latency Test
        * OSUAllReduce - MPI_Allreduce Latency Test
        * OSUAlltoall - MPI_Alltoall Latency Test
        * OSUAlltoallv - MPI_Alltoallv Latency Test
        * OSUBarrier - MPI_Barrier Latency Test
        * OSUBcast - MPI_Bcast Latency Test
        * OSUGather - MPI_Gather Latency Test
        * OSUGatherv - MPI_Gatherv Latency Test
        * OSUReduce - MPI_Reduce Latency Test
        * OSUReduceScatter - MPI_Reduce_scatter Latency Test
        * OSUScatter - MPI_Scatter Latency Test
        * OSUScatterv - MPI_Scatterv Latency Test
    - Add support for Python pt2pt benchmarks
        * osu_latency - Latency Test
        * osu_bw - Bandwidth Test
        * osu_bibw - Bidirectional Bandwidth Test
        * osu_multi_lat - Multi-pair Latency Test
    - Add support for Python collective benchmarks
        * osu_allgather - MPI_Allgather Latency Test
        * osu_allgatherv - MPI_Allgatherv Latency Test
        * osu_allreduce - MPI_Allreduce Latency Test
        * osu_alltoall - MPI_Alltoall Latency Test
        * osu_alltoallv - MPI_Alltoallv Latency Test
        * osu_barrier - MPI_Barrier Latency Test
        * osu_bcast - MPI_Bcast Latency Test
        * osu_gather - MPI_Gather Latency Test
        * osu_gatherv - MPI_Gatherv Latency Test
        * osu_reduce - MPI_Reduce Latency Test
        * osu_reduce_scatter - MPI_Reduce_scatter Latency Test
        * osu_scatter - MPI_Scatter Latency Test
        * osu_scatterv - MPI_Scatterv Latency Test

* Bug Fixes (since v5.9)
    - Fix bug in data validation support for CUDA managed memory benchmarks
        - Thanks to Chris Chambreau @LLNL for the report

OSU Micro Benchmarks v5.9 (03/02/2022)

* New Features & Enhancements
    - Add support for data validation in mpi pt2pt benchmarks
        * Enabled with the `-c` option
    - Add support for data validation in mpi collective benchmarks
        * Enabled with the `-c` option
    - Add support for NCCL collective benchmarks
        * osu_nccl_alltoall

* Bug Fixes (since v5.8)
    - Remove osu_latency_mp & osu_latency_mt from list of benchmarks that
      have GPU support
        * Thanks to Carl Ponder @NVIDIA for reporting the issue.
    - Ignore the `--accelerator` option when using pt2pt benchmarks with GPU
      enabled builds.
        * Thanks to Adamn Goldman @Intel for the report

OSU Micro Benchmarks v5.8 (08/12/2021)

* New Features & Enhancements
    - Add support for NCCL pt2pt benchmarks
        * osu_nccl_bibw
        * osu_nccl_bw
        * osu_nccl_latency
    - Add support for NCCL collective benchmarks
        * osu_nccl_allgather
        * osu_nccl_allreduce
        * osu_nccl_bcast
        * osu_nccl_reduce
        * osu_nccl_reduce_scatter
    - Add data validation support for
        * osu_allreduce
        * osu_nccl_allreduce
        * osu_reduce
        * osu_nccl_reduce
        * osu_alltoall

* Bug Fixes (since v5.7.1)
    - Fix bug in support for CUDA managed memory benchmarks
        - Thanks to Adam Goldman @Intel for the report and the initial patch
    - Protect managed memory functionality with appropriate compile time flag

OSU Micro Benchmarks v5.7.1 (05/11/2021)

* New Features & Enhancements
    - Add support to send and receive data from different buffers for
      osu_latency, osu_bw, osu_bibw, and osu_mbw_mr
    - Enhance support for CUDA managed memory benchmarks
        - Thanks to Ian Karlin and Nathan Hanford @LLNL for the feedback
    - Add support to print minimum and maximum communication times for
      non-blocking benchmarks

* Bug Fixes (since v5.7)
    - Update README file with updated description for osu_latency_mp
        - Thanks to Honggang Li @RedHat for the suggestion
    - Fix error in setting benchmark name in osu_allgatherv.c and osu_allgatherv.c
        - Thanks to Brandon Cook @LBL for the report

OSU Micro Benchmarks v5.7 (12/11/2020)

* New Features & Enhancements
    - Add support to OMB to evaluate the performance of various primitives with
      AMD GPU device and ROCm support
        - This functionality is exposed when configured with --enable-rocm
          option
        - Thanks to AMD for the initial patch

* Bug Fixes (since v5.6.3)
    - Enhance one-sided window creation and fix a potential issue when using
       MPI_Win_allocate
        * Thanks to Bert Wesarg and George Katevenis for reporting the issue
          and providing initial patch
    - Remove additional '-M' option that gets printed with help message for
      osu_latency_mt and osu_latency_mp
        * Thanks to Nick Papior for the report
    - Added missing '-W' option support for one-sided bandwidth tests

OSU Micro Benchmarks v5.6.3 (06/01/2020)

* New Features & Enhancements
    - Add support for benchmarking applications that use 'fork' system call
        * osu_latency_mp

* Bug Fixes (since v5.6.2)
    - Allow passing flags to nvcc compiler
    - Fix compilation issue with IBM XLC++ compilers and CUDA 10.2
    - Fix issues in window creation with host-to-device and device-to-host
      transfers for one-sided tests

OSU Micro Benchmarks v5.6.2 (08/08/2019)

* New Features & Enhancements
    - Add support for benchmarking GPU-Aware multi-threaded point-to-point
      operations
        * osu_latency_mt

* Bug Fixes (since v5.6.1)
    - Fix issue with freeing in osu_get_bw benchmark
    - Fix issues with out of tree builds
        - Thanks to Joseph Schuchart@HLRS for reporting the issue
    - Fix incorrect header in osu_mbw_mr benchmark
    - Fix memory alignment for non-heap allocations in openshmem message rate
      benchmarks
        - Thanks to Yossi@Mellanox for pointing out the issue

OSU Micro Benchmarks v5.6.1 (03/15/2019)

* Bug Fixes (since v5.6)
    - Fix issue with latency computation in osu_latency_mt benchmark.

OSU Micro Benchmarks v5.6 (03/01/2019)

* New Features & Enhancements
    - Add support for benchmarking GPU-Aware multi-pair point-to-point
      operations
        * osu_mbw_mr
        * osu_multi_lat
    - Add support to specify different number of sender and receiver threads for
      osu_latency_mt benchmark

* Bug Fixes
    - Remove -t option for blocking collectives
    - Fix issue when only building OpenSHMEM benchmarks
    - Improve error reporting for one-sided benchmarks
    - Fix compilation issues with PGI compilers for GPU-Aware benchmarks
    - Fix incorrect handling of configure options
        - Thanks to Tony Curtis @StonyBrook for the report
    - Fix compilation warnings

OSU Micro Benchmarks v5.5 (11/10/18)
* New Features & Enhancements
    - Introduce new MPI non-blocking collective benchmarks with support to
      measure overlap of computation and communication for CPUs and GPUs
        - osu_ireduce
        - osu_iallreduce

OSU Micro Benchmarks v5.4.4 (09/21/18)
* Bug Fixes
    - Deprecate the need to specify get_local_rank when running GPU-based
      benchmarks built with MVAPICH2 and launched with mpirun_rsh
    - Enhance error checking for GPU-based benchmarks
    - Fix issues in GPU-based OpenACC benchmarks
    - Fix race condition with OpenSHMEM non-blocking and overlap benchmarks

OSU Micro Benchmarks v5.4.3 (07/23/18)
* Bug Fixes
    - Fix buffer overflow in osu_reduce_scatter
        - Thanks to Matias A Cabral @Intel for reporting the issue and patch
        - Thanks to Gilles Gouaillardet for creating the patch
    - Fix buffer overflow in one sided tests
        - Thanks to John Byrne @HPE for reporting this issue
    - Fix buffer overflow in multi threaded latency test
    - Fix issues with freeing buffers for one-sided tests
    - Fix issues with freeing buffers for CUDA-enabled tests
    - Fix warning messages for benchmarks that do not support CUDA and/or
      Managed memory
        - Thanks to Carl Ponder@NVIDIA for reporting this issue
    - Fix compilation warnings

OSU Micro Benchmarks v5.4.2 (04/30/18)
* New Features & Enhancements
    - Add "-W --window-size" option to osu_bw and osu_bibw

* Bug Fixes
    - Fix issues with out of tree builds
        - Thanks to Adam Moody @LLNL for reporting the issue
    - Fix PGI and XLC builds by using the correct generated cpp files
    - Fix crash with osu_mbw_mr for large messages
    - Fix minor error in Makefile
    - Fix compilation warnings

OSU Micro Benchmarks v5.4.1 (02/19/18)
* New Features & Enhancements
    - Enhanced help messages and runtime parameters

* Bug Fixes
    - Fix compile and runtime issues in PGAS benchmarks (OpenSHMEM, UPC, and
      UPC++) exposed by PGI compiler
    - Added warning message to display memory limitation when running benchmarks
      with very large messages
    - Fix memory leaks for device buffers
    - Fix issues with type overflows
    - Fix an issue with pWork symmetric heap allocation in oshm_reduce benchmark
        - Thanks to Naveen Ravichandrasekaran@Cray for the report

OSU Micro Benchmarks v5.4.0 (10/30/17)
* New Features & Enhancements
    - Introduce new OpenSHMEM Non-blocking Benchmarks
        * osu_oshm_get_mr_nb
        * osu_oshm_get_nb
        * osu_oshm_put_mr_nb
        * osu_oshm_put_nb
        * osu_oshm_put_overlap
    - Automatically build OpenSHMEM 1.3 benchmarks when library support
      is detected
    - Add ability to specify min and max message size for point-to-point
      and one-sided benchmarks
    - Enhanced error handling for MPI benchmarks
    - Code clean-ups and unification of utility functions across benchmarks
    - Enhanced help messages and runtime parameters

* Bug Fixes
    - Fix compile-time warnings
    - Fix peer calculation formula in UPC/UPC++ benchmarks
    - Fix correct number of warmup iterations in osu_barrier benchmark

OSU Micro Benchmarks v5.3.2 (09/08/16)
* New Features & Enhancements
    - Allow specifying very large message sizes (>2GB) for collective benchmarks

* Bug Fixes
    - Fix compilation errors due to missing type casting

OSU Micro Benchmarks v5.3.1 (08/08/16)
* New Features & Enhancements
    - Add option to control whether CUDA kernels are built
    - Add runtime option to specify number of threads for osu_latency_mt

* Bug Fixes
    - Check if -lrt or -lpthread is needed
    - Fix compilation warnings
    - Fix non-blocking collective memory leak
    - Correct documentation for osu_multi_lat

OSU Micro Benchmarks v5.3 (03/25/16)
* New Features & Enhancements
    - Introduce new UPC++ Benchmarks
        * osu_upcxx_allgather
        * osu_upcxx_alltoall
        * osu_upcxx_async_copy_get
        * osu_upcxx_async_copy_put
        * osu_upcxx_bcast
        * osu_upcxx_gather
        * osu_upcxx_reduce
        * osu_upcxx_scatter

* Bug Fixes
    - Determine page size at runtime in OpenSHMEM benchmarks (fixes issue seen
      on OpenPower machines)

OSU Micro Benchmarks v5.2 (02/05/16)
* New Features & Enhancements
    - Support for CUDA-Aware Managed memory
        * osu_bibw
        * osu_bw
        * osu_latency
        * osu_allgather
        * osu_allgatherv
        * osu_allreduce
        * osu_alltoall
        * osu_alltoallv
        * osu_bcast
        * osu_gather
        * osu_gatherv
        * osu_reduce
        * osu_reduce_scatter
        * osu_scatter
        * osu_scatterv
    - Add ability to specify minimum message size in addition to maximum
      message size for all collective benchmarks

OSU Micro Benchmarks v5.1 (11/10/15)
* New Features & Enhancements
    - Introduce non-blocking collective v-variants as well as ialltoallw
        * osu_iallgatherv
        * osu_ialltoallv
        * osu_igatherv
        * osu_iscatterv
        * osu_ialltoallw
    - Add support for benchmarking GPU-Aware non-blocking collectives.  Overlap
      can be computed using either CPU or GPU kernels
        * osu_iallgather
        * osu_iallgatherv
        * osu_ialltoall
        * osu_ialltoallv
        * osu_ialltoallw
        * osu_ibcast
        * osu_igather
        * osu_igatherv
        * osu_iscatter
        * osu_iscatterv
    - Allow users the ability to specify zero warmup iterations

* Bug Fixes
    - fix openacc pragma

OSU Micro Benchmarks v5.0 (08/17/15)
* New Features & Enhancements
    - Support for a set of non-blocking collectives. The benchmarks can display
      both the amount of time spent in the collectives and the amount of
      overlap achievable
        * osu_iallgather
        * osu_ialltoall
        * osu_ibarrier
        * osu_ibcast
        * osu_igather
        * osu_iscatter
    - Add startup benchmarks to facilitate the ability to measure the amount of
      time it takes for an MPI library to complete MPI_Init
        * osu_init
        * osu_hello
    - Allocate and align data dynamically
        - Thanks to Devendar Bureddy from Mellanox for the suggestion
    - Add options for number of warmup iterations [-x] and number of iterations
      used per message size [-i] to MPI benchmarks
        - Thanks to Devendar Bureddy from Mellanox for the suggestion

* Bug Fixes
    - Do not truncate user specified max memory limits
        - Thanks to Devendar Bureddy from Mellanox for the report and patch

OSU Micro Benchmarks v4.4.1 (10/30/14)
* Bug Fixes
    - adding missing MPI3 guard for WIN_ALLOCATE
    - capture getopt return value in an int instead of char

OSU Micro Benchmarks v4.4 (8/23/14)
* New Features & Enhancements
    - Support for MPI-3 RMA (one-sided) and atomic operations using GPU buffers
        * osu_acc_latency
        * osu_cas_latency
        * osu_fop_latency
        * osu_get_bw
        * osu_get_latency
        * osu_put_bibw
        * osu_put_bw
        * osu_put_latency

* Bug Fixes
    - remove use of AC_FUNC_MALLOC to avoid undefined rpl_malloc reference
    - add missing upc benchmarks for make dist rule

OSU Micro Benchmarks v4.3.1 (6/20/14)
* Bug Fixes
    - Fix typo in MPI collective benchmark help message
    - Explicitly mention that -m and -M parameters are specified in bytes

OSU Micro Benchmarks v4.3 (3/24/14)
* New Features & Enhancements
    - This new suite includes several new (or updated) benchmarks to measure
      performance of MPI-3 RMA communication operations with options to select
      different window creation (WIN_CREATE, WIN_DYNAMIC, and WIN_ALLOCATE) and
      synchronization functions (LOCK, PSCW, FENCE, FLUSH, FLUSH_LOCAL, and
      LOCK_ALL) in each benchmark
        * osu_acc_latency
        * osu_cas_latency
        * osu_fop_latency
        * osu_get_acc_latency
        * osu_get_bw
        * osu_get_latency
        * osu_put_bibw
        * osu_put_bw
        * osu_put_latency
    - New UPC Collective Benchmarks
        * osu_upc_all_barrier
        * osu_upc_all_broadcast
        * osu_upc_all_exchange
        * osu_upc_all_gather
        * osu_upc_all_gather_all
        * osu_upc_all_reduce
        * osu_upc_all_scatter
    - Build MPI3 benchmarks when MPI library support is detected

* Bug Fixes
    - Add shmem_quiet() in OpenSHMEM Message Rate benchmark to ensure all
      previously issued operations are completed
    - Allocate pWrk from symmetric heap in OpenSHMEM Reduce benchmark

OSU Micro Benchmarks v4.2 (11/08/13)
* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_fcollect
    - Enable handling of GPU device buffers in all MPI collective benchmarks
    - Add device binding for OpenACC benchmarks

* Bug Fixes
    - Add upc_fence after memput in osu_upc_memput benchmark
    - Correct CUDA configuration example in README
    - Fix several warnings

OSU Micro Benchmarks v4.1 (8/24/13)
* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_barrier
        * osu_oshm_broadcast
        * osu_oshm_collect
        * osu_oshm_reduce
    - New MPI-3 RMA Atomics benchmarks
        * osu_cas_flush
        * osu_fop_flush

OSU Micro Benchmarks v4.0.1 (5/06/13)
* Bug Fixes
    - Fix several warnings

OSU Micro Benchmarks v4.0 (4/16/13)
* New Features & Enhancements
    - Support buffer allocation using OpenACC and CUDA in osu_alltoall,
      osu_gather, and osu_scatter benchmarks
    - Limit amount of memory allocated by collective benchmarks dynamically
      based on number of processes
        - Memory limit can also be explicitly set by the user through the -m
          option
    - Support for 64-bit atomic operations in osu_oshm_atomics

* Bug Fixes
    - Fix numerical overflow error with reporting bandwidth in osu_mbw_mr

OSU Micro Benchmarks v3.9 (2/28/13)
* New Features & Enhancements
    - Support buffer allocation using OpenACC in GPU benchmarks
    - Use average time instead of max time for calculating the bandwidth and
      message rate in osu_mbw_mr
        - Thanks to Alex Mikheev from Mellanox for the patch
* Bug Fixes
    - Properly initialize host buffers for DH and HD transfers in GPU
      benchmarks

OSU Micro Benchmarks v3.8 (11/07/12)
* New Features & Enhancements
    - New UPC benchmarks
        * osu_upc_memput
        * osu_upc_memget

OSU Micro Benchmarks v3.7 (9/07/12)
* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_get
        * osu_oshm_put_mr
        * osu_oshm_atomics
        * osu_oshm_put
    - Organize installation directory according to benchmark type
* Bug Fixes
    - Destroy cuda context before exiting

OSU Micro Benchmarks v3.6 (4/30/12)
* New Features & Enhancements
    - New collective benchmarks
        * osu_allgather
        * osu_allgatherv
        * osu_allreduce
        * osu_alltoall
        * osu_alltoallv
        * osu_barrier
        * osu_bcast
        * osu_gather
        * osu_gatherv
        * osu_reduce
        * osu_reduce_scatter
        * osu_scatter
        * osu_scatterv
* Bug Fixes
    - Fix GPU binding issue when running with HH mode

OSU Micro Benchmarks v3.5.2 (3/22/12)
* Bug Fixes
    - Fix typo which led to use of incorrect buffers

OSU Micro Benchmarks v3.5.1 (2/02/12)
* New Features & Enhancements
    - Provide script to set GPU affinity for MPI processes
* Bug Fixes
    - Removed GPU binding after MPI_Init to avoid switching context

OSU Micro Benchmarks v3.5 (11/09/11)
* New Features & Enhancements
    - Extension of osu_latency, osu_bw, and osu_bibw benchmarks to evaluate the
      performance of MPI_Send/MPI_Recv operation with NVIDIA GPU device and
      CUDA support
        - This functionality is exposed when configured with --enable-cuda
          option
    - Flexibility for using buffers in NVIDIA GPU device (D) and host memory (H)
    - Flexibility for selecting data movement between D->D, D->H and H->D

OSU Micro Benchmarks v3.4 (09/13/11)
* New Features & Enhancements
    - Add passive one-sided communication benchmarks
    - Update one-sided communication benchmarks to provide shared memory hint
      in MPI_Alloc_mem calls
    - Update one-sided communication benchmarks to use MPI_Alloc_mem for buffer
      allocation
    - Give default values to configure definitions (can now build directly with
      mpicc)
    - Update latency benchmarks to begin from 0 byte message
* Bug Fixes
    - Remove memory leaks in one-sided communication benchmarks
    - Update benchmarks to touch buffers before using them for communication
    - Fix osu_get_bw test to use different buffers for concurrent communication
      operations
    - Fix compilation warnings
