• OSU Micro-Benchmarks 5.7.1 (5/11/21) [Tarball]
    • Please see CHANGES for the full changelog.
    • You may also refer to the README for additional information.
    • The benchmarks are available under the BSD license.
  • This web page includes descriptions of the following MPI, OpenSHMEM, UPC and
    UPC++ tests included in the OMB package:

    • Point-to-Point MPI Benchmarks: Latency, multi-threaded latency, multi-pair latency, multiple bandwidth /
      message rate test, bandwidth, bidirectional bandwidth
    • Collective MPI Benchmarks: Collective latency tests for various MPI collective operations such as
      MPI_Allgather, MPI_Alltoall, MPI_Allreduce, MPI_Barrier, MPI_Bcast,
      MPI_Gather, MPI_Reduce, MPI_Reduce_Scatter, MPI_Scatter and vector
      collectives.
    • Non-Blocking Collective (NBC) MPI Benchmarks: Collective latency and overlap tests for various MPI collective operations such as
      MPI_Iallgather, MPI_Iallreduce, MPI_Ialltoall, MPI_Ibarrier, MPI_Ibcast,
      MPI_Igather, MPI_Ireduce, MPI_Iscatter and vector
      collectives.
    • One-sided MPI Benchmarks: one-sided put latency, one-sided put bandwidth,
      one-sided put bidirectional bandwidth, one-sided get latency, one-sided get bandwidth, one-sided
      accumulate latency, compare and swap latency, fetch and op, and get_accumulate
      latency for MVAPICH2 (MPI-2 and MPI-3).
    • Point-to-Point OpenSHMEM Benchmarks:
      put latency, get latency, message rate, and atomics
    • Collective OpenSHMEM Benchmarks:
      collect latency, broadcast latency, reduce latency, and barrier latency
    • Point-to-Point UPC Benchmarks: put latency, get latency
    • Collective UPC Benchmarks:
      broadcast latency, scatter latency, gather latency, all_gather latency, and exchange latency
    • Point-to-Point UPC++ Benchmarks: async copy put latency, async copy get latency
    • Collective UPC++ Benchmarks:
      broadcast latency, scatter latency, gather latency, reduce latency, all_gather latency, and all_to_all latency
    • Startup Benchmarks:
      osu_init, osu_hello
  • CUDA, ROCm, and OpenACC Extensions to OMB
    • The following benchmarks have been extended to evaluate the performance of
      MPI communication from and to buffers on NVIDIA and AMD GPU devices.

      • osu_bibw – Bidirectional Bandwidth Test
      • osu_bw – Bandwidth Test
      • osu_latency – Latency Test
      • osu_latency_mt – Multi-threaded Latency Test
      • osu_mbw_mr – Multiple Bandwidth / Message Rate Test
      • osu_multi_lat – Multi-pair Latency Test
      • osu_put_latency – Latency Test for Put
      • osu_get_latency – Latency Test for Get
      • osu_put_bw – Bandwidth Test for Put
      • osu_get_bw – Bandwidth Test for Get
      • osu_put_bibw – Bidirectional Bandwidth Test for Put
      • osu_acc_latency – Latency Test for Accumulate
      • osu_cas_latency – Latency Test for Compare and Swap
      • osu_fop_latency – Latency Test for Fetch and Op
      • osu_allgather – MPI_Allgather Latency Test
      • osu_allgatherv – MPI_Allgatherv Latency Test
      • osu_allreduce – MPI_Allreduce Latency Test
      • osu_alltoall – MPI_Alltoall Latency Test
      • osu_alltoallv – MPI_Alltoallv Latency Test
      • osu_bcast – MPI_Bcast Latency Test
      • osu_gather – MPI_Gather Latency Test
      • osu_gatherv – MPI_Gatherv Latency Test
      • osu_reduce – MPI_Reduce Latency Test
      • osu_reduce_scatter – MPI_Reduce_scatter Latency Test
      • osu_scatter – MPI_Scatter Latency Test
      • osu_scatterv – MPI_Scatterv Latency Test
      • osu_iallgather – MPI_Iallgather Latency Test
      • osu_iallreduce – MPI_Iallreduce Latency Test
      • osu_ialltoall – MPI_Ialltoall Latency Test
      • osu_ibcast – MPI_Ibcast Latency Test
      • osu_igather – MPI_Igather Latency Test
      • osu_ireduce – MPI_Ireduce Latency Test
      • osu_iscatter – MPI_Iscatter Latency Test
  • osu_latency – Latency Test
  • The latency tests are carried out in a ping-pong fashion. The sender
    sends a message with a certain data size to the receiver and waits for a
    reply from the receiver. The receiver receives the message from the sender
    and sends back a reply with the same data size. Many iterations of this
    ping-pong test are carried out and average one-way latency numbers are
    obtained. Blocking versions of the MPI functions (MPI_Send and MPI_Recv) are
    used in the tests.
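
    To make the pattern concrete, here is a minimal sketch of such a ping-pong
    loop (illustrative only, not the OMB source; the message size, iteration
    count, and warm-up count are arbitrary choices):

      /* Minimal MPI ping-pong latency sketch; build with mpicc, run with 2 ranks. */
      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(int argc, char **argv)
      {
          const int size = 8;        /* message size in bytes (illustrative) */
          const int skip = 100;      /* warm-up iterations, not timed */
          const int iters = 10000;   /* timed iterations */
          char *buf;
          int rank;
          double t_start = 0.0, t_end;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          buf = malloc(size);

          for (int i = 0; i < skip + iters; i++) {
              if (i == skip)
                  t_start = MPI_Wtime();          /* start timing after warm-up */
              if (rank == 0) {
                  MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                  MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              } else if (rank == 1) {
                  MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                  MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
              }
          }
          t_end = MPI_Wtime();

          if (rank == 0)   /* one-way latency = round trip / 2, averaged over iterations */
              printf("%d bytes: %.2f us one-way\n", size,
                     (t_end - t_start) * 1e6 / (2.0 * iters));

          free(buf);
          MPI_Finalize();
          return 0;
      }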

  • osu_latency_mt – Multi-threaded Latency Test
  • The multi-threaded latency test performs a ping-pong test with a single
    sender process and multiple threads on the receiving process. In this test
    the sending process sends a message of a given data size to the receiver
    and waits for a reply from the receiver process. The receiving process has
    a variable number of receiving threads (set by default to 2), where each
    thread calls MPI_Recv and upon receiving a message sends back a response
    of equal size. Many iterations are performed and the average one-way
    latency numbers are reported. Users can modify the number of communicating
    threads with the “-t” runtime option. Examples:
    -t 4 // receiver threads = 4 and sender threads = 1
    -t 4:6 // sender threads = 4 and receiver threads = 6
    -t 2: // not defined

  • osu_latency_mp – Multi-process Latency Test
  • The multi-process latency test performs a ping-pong test with a single
    sender process and a single receiver process, both having multiple
    child processes that are spawned using the fork() system call. In this test
    the sending (parent) process sends a message of a given data size to the
    receiving (parent) process and waits for a reply from the receiver process.
    Both the sending and receiving processes have a variable number of child
    processes (set by default to 1 child process), where each child process
    sleeps for 2 seconds after the fork call and exits. The parent processes
    carry out the ping-pong test, where many iterations are performed and the
    average one-way latency numbers are reported. The
    “-t” option can be used to set the number of sender and receiver processes,
    including the parent processes, to be used in a benchmark.

    Examples:
    -t 4 // receiver processes = 4 and sender processes = 1
    -t 4:6 // sender processes = 4 and receiver processes = 6
    -t 2: // not defined

    The purpose of this test is to check whether the underlying MPI communication
    runtime has taken care of fork safety even if the application has not.

    A new environment variable “MV2_SUPPORT_FORK_SAFETY” was introduced with
    MVAPICH2 2.3.4 to make MVAPICH2 take care of fork safety for
    applications that require it.

    Support for fork safety is disabled by default in MVAPICH2 for
    performance reasons. When running osu_latency_mp with MVAPICH2, set
    the environment variable MV2_SUPPORT_FORK_SAFETY to 1. When running
    osu_latency_mp with other MPI libraries that do not support fork safety,
    set the environment variables RDMAV_FORK_SAFE or IBV_FORK_SAFE to 1.

  • osu_bw – Bandwidth Test
  • The bandwidth tests are carried out by having the sender send out a
    fixed number (equal to the window size) of back-to-back messages to the
    receiver and then wait for a reply from the receiver. The receiver
    sends the reply only after receiving all of these messages. This process is
    repeated for several iterations and the bandwidth is calculated based on
    the elapsed time (from the time the sender sends the first message until the
    time it receives the reply back from the receiver) and the number of bytes
    sent by the sender. The objective of this bandwidth test is to determine
    the maximum sustained data rate that can be achieved at the network level.
    Thus, non-blocking versions of the MPI functions (MPI_Isend and MPI_Irecv) are
    used in the test.
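
    The windowed send pattern can be sketched as follows (a simplified
    illustration under an assumed window size, message size, and iteration
    count, not the OMB implementation):

      /* Minimal MPI bandwidth sketch with a send window; run with exactly 2 ranks. */
      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define WINDOW 64            /* back-to-back messages per iteration (assumed) */

      int main(int argc, char **argv)
      {
          const int size = 1 << 20;    /* 1 MB messages (illustrative) */
          const int iters = 100;
          char *buf, ack;
          int rank;
          MPI_Request req[WINDOW];

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          buf = malloc(size);

          MPI_Barrier(MPI_COMM_WORLD);
          double t_start = MPI_Wtime();

          for (int i = 0; i < iters; i++) {
              if (rank == 0) {
                  /* sender: post WINDOW back-to-back non-blocking sends, then wait for the reply */
                  for (int w = 0; w < WINDOW; w++)
                      MPI_Isend(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[w]);
                  MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                  MPI_Recv(&ack, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              } else if (rank == 1) {
                  /* receiver: post WINDOW receives, then send the reply after all complete */
                  for (int w = 0; w < WINDOW; w++)
                      MPI_Irecv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[w]);
                  MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                  MPI_Send(&ack, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
              }
          }

          double t_end = MPI_Wtime();
          if (rank == 0) {
              double bytes = (double)size * WINDOW * iters;
              printf("%d bytes: %.2f MB/s\n", size, bytes / (t_end - t_start) / 1e6);
          }
          free(buf);
          MPI_Finalize();
          return 0;
      }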

  • osu_bibw – Bidirectional Bandwidth Test
  • The bidirectional bandwidth test is similar to the bandwidth test, except
    that both nodes involved send out a fixed number of back-to-back
    messages and wait for the reply. This test measures the maximum
    sustainable aggregate bandwidth between two nodes.

  • osu_mbw_mr – Multiple Bandwidth / Message Rate Test
  • The multi-pair bandwidth and message rate test evaluates the aggregate
    uni-directional bandwidth and message rate between multiple pairs of
    processes. Each of the sending processes sends a fixed number of messages
    (the window size) back-to-back to the paired receiving process before
    waiting for a reply from the receiver. This process is repeated for
    several iterations. The objective of this benchmark is to determine the
    achieved bandwidth and message rate from one node to another node with a
    configurable number of processes running on each node.

  • osu_multi_lat – Multi-pair Latency Test
  • This test is very similar to the latency test. However, at the same
    instant multiple pairs are performing the same test simultaneously.
    In order to perform the test across just two nodes the hostnames must
    be specified in block fashion.

  • osu_allgather – MPI_Allgather Latency Test
  • osu_allgatherv – MPI_Allgatherv Latency Test
  • osu_allreduce – MPI_Allreduce Latency Test
  • osu_alltoall – MPI_Alltoall Latency Test
  • osu_alltoallv – MPI_Alltoallv Latency Test
  • osu_barrier – MPI_Barrier Latency Test
  • osu_bcast – MPI_Bcast Latency Test
  • osu_gather – MPI_Gather Latency Test
  • osu_gatherv – MPI_Gatherv Latency Test
  • osu_reduce – MPI_Reduce Latency Test
  • osu_reduce_scatter – MPI_Reduce_scatter Latency Test
  • osu_scatter – MPI_Scatter Latency Test
  • osu_scatterv – MPI_Scatterv Latency Test
    The latest OMB version includes benchmarks for various MPI blocking
    collective operations (MPI_Allgather, MPI_Alltoall, MPI_Allreduce,
    MPI_Barrier, MPI_Bcast, MPI_Gather, MPI_Reduce, MPI_Reduce_Scatter,
    MPI_Scatter and vector collectives). These benchmarks work in the
    following manner. Suppose users run the osu_bcast benchmark with N
    processes; the benchmark measures the min, max and average latency of
    the MPI_Bcast collective operation across N processes, for various
    message lengths, over a large number of iterations. In the default
    version, these benchmarks report the average latency for each message
    length. Additionally, the benchmarks offer the following options:
    “-f” can be used to report additional statistics of the benchmark,
    such as min and max latencies and the number of iterations.
    “-m” option can be used to set the minimum and maximum message length
    to be used in a benchmark. In the default version, the benchmarks
    report the latencies for up to 1MB message lengths. Examples:
    -m 128 // min = default, max = 128
    -m 2:128 // min = 2, max = 128
    -m 2: // min = 2, max = default
    “-x” can be used to set the number of warmup iterations to skip for each
    message length.
    “-i” can be used to set the number of iterations to run for each message
    length.
    “-M” can be used to set the per-process maximum memory consumption. By
    default, the benchmarks are limited to 512MB allocations.
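
    For reference, a bare-bones timing loop for one blocking collective
    (MPI_Bcast) might look like the sketch below; the message size, iteration
    counts, and the simple cross-rank averaging are assumptions for
    illustration, not the OMB code:

      /* Sketch of timing a blocking collective (MPI_Bcast), in the spirit of osu_bcast. */
      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(int argc, char **argv)
      {
          const int size = 1024, skip = 10, iters = 1000;
          int rank, nprocs;
          char *buf;
          double t_start = 0.0, local, avg;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
          buf = malloc(size);

          for (int i = 0; i < skip + iters; i++) {
              if (i == skip) {
                  MPI_Barrier(MPI_COMM_WORLD);   /* synchronize before timing */
                  t_start = MPI_Wtime();
              }
              MPI_Bcast(buf, size, MPI_CHAR, 0, MPI_COMM_WORLD);
          }
          local = (MPI_Wtime() - t_start) * 1e6 / iters;   /* per-call latency in us */

          /* average latency across all N processes (min/max can be reduced similarly) */
          MPI_Reduce(&local, &avg, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0)
              printf("%d bytes: avg %.2f us over %d ranks\n", size, avg / nprocs, nprocs);

          free(buf);
          MPI_Finalize();
          return 0;
      }
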
  • osu_iallgather – MPI_Iallgather Latency Test
  • osu_iallgatherv – MPI_Iallgatherv Latency Test
  • osu_iallreduce – MPI_Iallreduce Latency Test
  • osu_ialltoall – MPI_Ialltoall Latency Test
  • osu_ialltoallv – MPI_Ialltoallv Latency Test
  • osu_ialltoallw – MPI_Ialltoallw Latency Test
  • osu_ibarrier – MPI_Ibarrier Latency Test
  • osu_ibcast – MPI_Ibcast Latency Test
  • osu_igather – MPI_Igather Latency Test
  • osu_igatherv – MPI_Igatherv Latency Test
  • osu_ireduce – MPI_Ireduce Latency Test
  • osu_iscatter – MPI_Iscatter Latency Test
  • osu_iscatterv – MPI_Iscatterv Latency Test
    In addition to the blocking collective latency tests mentioned above, we
    provide several non-blocking collectives (NBC): MPI_Iallgather, MPI_Iallgatherv,
    MPI_Iallreduce, MPI_Ialltoall, MPI_Ialltoallv, MPI_Ialltoallw, MPI_Ibarrier, MPI_Ibcast,
    MPI_Igather, MPI_Igatherv, MPI_Ireduce, MPI_Iscatter, and MPI_Iscatterv.
    These evaluate the same metrics as the blocking operations, as well as the
    additional metric `overlap', which is defined as the amount of computation that can be
    carried out while the communication progresses in the background.
    These benchmarks have the additional options:
    “-t” sets the number of MPI_Test() calls during the dummy computation; set
    CALLS to 100, 1000, or any number > 0.
    “-r” sets the target for the dummy computation that imitates the effect of useful
    computation that can be overlapped with the communication. As we provide CUDA-aware support for NBC as well, this option can be set to CPU, GPU, or
    BOTH.
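
    The overlap measurement idea can be sketched as follows (a simplified
    illustration using MPI_Ibcast; the dummy-compute model, the single
    MPI_Test call, and the overlap formula are simplifications relative to
    OMB):

      /* Sketch of an NBC overlap measurement with MPI_Ibcast (simplified). */
      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* dummy computation that stands in for useful work the application could overlap */
      static double dummy_compute(double seconds)
      {
          double t0 = MPI_Wtime(), x = 0.0;
          while (MPI_Wtime() - t0 < seconds)
              x += 1.0;
          return x;
      }

      int main(int argc, char **argv)
      {
          const int size = 1 << 16, iters = 100;
          char *buf;
          int rank;
          MPI_Request req;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          buf = malloc(size);

          /* 1. pure communication time of the non-blocking collective */
          double t0 = MPI_Wtime();
          for (int i = 0; i < iters; i++) {
              MPI_Ibcast(buf, size, MPI_CHAR, 0, MPI_COMM_WORLD, &req);
              MPI_Wait(&req, MPI_STATUS_IGNORE);
          }
          double t_pure = (MPI_Wtime() - t0) / iters;

          /* 2. time when dummy computation (of length t_pure) is interleaved,
                with an MPI_Test call to let the collective progress */
          t0 = MPI_Wtime();
          for (int i = 0; i < iters; i++) {
              int done = 0;
              MPI_Ibcast(buf, size, MPI_CHAR, 0, MPI_COMM_WORLD, &req);
              dummy_compute(t_pure);
              MPI_Test(&req, &done, MPI_STATUS_IGNORE);
              MPI_Wait(&req, MPI_STATUS_IGNORE);
          }
          double t_overlap = (MPI_Wtime() - t0) / iters;

          /* overlap(%) ~ how much of the communication hid behind the computation */
          if (rank == 0)
              printf("overlap ~= %.1f %%\n",
                     100.0 * (1.0 - (t_overlap - t_pure) / t_pure));

          free(buf);
          MPI_Finalize();
          return 0;
      }
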
    The following benchmarks have been extended to evaluate the performance of
    MPI communication from and to buffers allocated using CUDA Managed Memory.

    • osu_bibw – Bidirectional Bandwidth Test
    • osu_bw – Bandwidth Test
    • osu_latency – Latency Test
    • osu_mbw_mr – Multiple Bandwidth / Message Rate Test
    • osu_multi_lat – Multi-pair Latency Test
    • osu_allgather – MPI_Allgather Latency Test
    • osu_allgatherv – MPI_Allgatherv Latency Test
    • osu_allreduce – MPI_Allreduce Latency Test
    • osu_alltoall – MPI_Alltoall Latency Test
    • osu_alltoallv – MPI_Alltoallv Latency Test
    • osu_bcast – MPI_Bcast Latency Test
    • osu_gather – MPI_Gather Latency Test
    • osu_gatherv – MPI_Gatherv Latency Test
    • osu_reduce – MPI_Reduce Latency Test
    • osu_reduce_scatter – MPI_Reduce_scatter Latency Test
    • osu_scatter – MPI_Scatter Latency Test
    • osu_scatterv – MPI_Scatterv Latency Test
    In addition to support for communication to and from GPU memory allocated
    using CUDA or OpenACC, we now provide the additional capability of performing
    communication to and from buffers allocated using the CUDA Managed Memory
    concept. CUDA Managed (or Unified) Memory allows applications to allocate
    memory on either CPU or GPU memory using the cudaMallocManaged() call. This
    allows user-oblivious transfer of the memory buffer between the CPU and GPU.
    Currently, we offer benchmarking with CUDA Managed Memory using the tests
    mentioned above. These benchmarks have additional options:
    “M” allocates a send or receive buffer as managed for point-to-point communication.
    “-d managed” uses managed memory buffers to perform collective communications.
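
    As an illustration, a managed-memory ping-pong could look like the sketch
    below (it assumes a CUDA-aware MPI library such as MVAPICH2-GDR; sizes and
    iteration counts are arbitrary):

      /* Sketch: ping-pong over cudaMallocManaged() buffers with a CUDA-aware MPI.
       * Build: mpicc -I$CUDA_HOME/include managed.c -L$CUDA_HOME/lib64 -lcudart */
      #include <mpi.h>
      #include <cuda_runtime.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          const int size = 1 << 20, skip = 10, iters = 100;
          char *buf;
          int rank;
          double t0 = 0.0;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          /* managed (unified) memory: migrates between host and device transparently */
          cudaMallocManaged((void **)&buf, size, cudaMemAttachGlobal);

          for (int i = 0; i < skip + iters; i++) {
              if (i == skip) t0 = MPI_Wtime();
              if (rank == 0) {
                  MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                  MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              } else {
                  MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                  MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
              }
          }
          if (rank == 0)
              printf("%d bytes: %.2f us one-way\n", size,
                     (MPI_Wtime() - t0) * 1e6 / (2.0 * iters));

          cudaFree(buf);
          MPI_Finalize();
          return 0;
      }
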
  • osu_put_latency – Latency Test for Put with Active/Passive Synchronization
  • The put latency benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and
    synchronization operations (MPI_Win_lock/unlock, MPI_Win_flush,
    MPI_Win_flush_local, MPI_Win_lock_all/unlock_all,
    MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence). For active synchronization,
    suppose users run with MPI_Win_Post/Start/Complete/Wait: the origin process calls
    MPI_Put to directly place data of a certain size in the remote process's window
    and then waits on a synchronization call (MPI_Win_complete) for completion. The remote
    process participates in synchronization with MPI_Win_post and
    MPI_Win_wait calls. Several iterations of this test are carried
    out and the average put latency numbers are reported. The latency includes
    the synchronization time as well. For passive synchronization, suppose users run with
    MPI_Win_lock/unlock: the origin process calls MPI_Win_lock to lock the
    target process's window and calls MPI_Put to directly place data of a certain
    size in the window. Then it calls MPI_Win_unlock to ensure completion of the
    Put and release the lock on the window. This is carried out for several iterations and the
    average time for MPI_Lock + MPI_Put + MPI_Unlock calls is measured. The default
    window initialization and synchronization operations are MPI_Win_allocate and
    MPI_Win_flush. The benchmark offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s lock” use MPI_Win_lock/unlock synchronization calls.
    “-s flush” use MPI_Win_flush synchronization call.
    “-s flush_local” use MPI_Win_flush_local synchronization call.
    “-s lock_all” use MPI_Win_lock_all/unlock_all synchronization calls.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.
    “-x” can be used to set the number of warmup iterations to
    skip for each message length.
    “-i” can be used to set the number of iterations to run for
    each message length.
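
    A stripped-down version of the passive (lock/unlock) path described above
    might look like this sketch (illustrative only; the message size and
    iteration counts are assumptions):

      /* Sketch: MPI_Put latency with passive synchronization; run with exactly 2 ranks. */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          const int size = 8, skip = 100, iters = 10000;
          int rank;
          char *win_buf;
          char origin_buf[8];
          MPI_Win win;
          double t0 = 0.0;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          /* the default window initialization in OMB is MPI_Win_allocate */
          MPI_Win_allocate(size, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win_buf, &win);

          if (rank == 0) {
              for (int i = 0; i < skip + iters; i++) {
                  if (i == skip) t0 = MPI_Wtime();
                  MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);        /* lock the target window */
                  MPI_Put(origin_buf, size, MPI_CHAR, 1, 0, size, MPI_CHAR, win);
                  MPI_Win_unlock(1, win);                          /* completes the Put */
              }
              printf("%d bytes: %.2f us per lock+put+unlock\n",
                     size, (MPI_Wtime() - t0) * 1e6 / iters);
          }

          MPI_Barrier(MPI_COMM_WORLD);   /* keep rank 1 alive until rank 0 finishes */
          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }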

  • osu_get_latency – Latency Test for Get with Active/Passive Synchronization
  • The get latency benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and
    synchronization operations (MPI_Win_lock/unlock, MPI_Win_flush,
    MPI_Win_flush_local, MPI_Win_lock_all/unlock_all,
    MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence). For active synchronization,
    suppose users run with MPI_Win_Post/Start/Complete/Wait: the origin
    process calls MPI_Get to directly fetch data of a certain size from the
    target process's window into a local buffer. It then waits on a
    synchronization call (MPI_Win_complete) for local completion of the Gets.
    The remote process participates in synchronization with MPI_Win_post and
    MPI_Win_wait calls. Several iterations of this test are carried
    out and the average get latency numbers are reported. The latency includes
    the synchronization time as well. For passive synchronization, suppose users run
    with MPI_Win_lock/unlock: the origin process calls MPI_Win_lock to lock the
    target process's window and calls MPI_Get to directly read data of a certain
    size from the window. Then it calls MPI_Win_unlock to ensure completion of the
    Get and release the lock on the remote window. This is carried out for several iterations and the
    average time for MPI_Lock + MPI_Get + MPI_Unlock calls is measured.
    The default window initialization and synchronization operations are
    MPI_Win_allocate and MPI_Win_flush. The benchmark offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s lock” use MPI_Win_lock/unlock synchronization calls.
    “-s flush” use MPI_Win_flush synchronization call.
    “-s flush_local” use MPI_Win_flush_local synchronization call.
    “-s lock_all” use MPI_Win_lock_all/unlock_all synchronization calls.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.

  • osu_put_bw – Bandwidth Test for Put with Active/Passive Synchronization
  • The put bandwidth benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and
    synchronization operations (MPI_Win_lock/unlock, MPI_Win_flush,
    MPI_Win_flush_local, MPI_Win_lock_all/unlock_all,
    MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence). For active synchronization,
    suppose users run with MPI_Win_Post/Start/Complete/Wait: the test
    is carried out by the origin process calling a fixed number of
    back-to-back MPI_Puts on the remote window and then waiting on a
    synchronization call (MPI_Win_complete) for their completion. The remote
    process participates in synchronization with MPI_Win_post and
    MPI_Win_wait calls. This process is repeated for several iterations and
    the bandwidth is calculated based on the elapsed time and the number of
    bytes put by the origin process. For passive synchronization, suppose users run
    with MPI_Win_lock/unlock: the origin process calls MPI_Win_lock to lock the
    target process's window and calls a fixed number of back-to-back MPI_Puts to
    directly place data in the window. Then it calls MPI_Win_unlock to ensure
    completion of the Puts and release the lock on the remote window. This process is repeated for
    several iterations and the bandwidth is calculated based on the elapsed
    time and the number of bytes put by the origin process. The default window
    initialization and synchronization operations are MPI_Win_allocate and MPI_Win_flush.
    The benchmark offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s lock” use MPI_Win_lock/unlock synchronization calls.
    “-s flush” use MPI_Win_flush synchronization call.
    “-s flush_local” use MPI_Win_flush_local synchronization call.
    “-s lock_all” use MPI_Win_lock_all/unlock_all synchronization calls.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.

  • osu_get_bw – Bandwidth Test for Get with Active/Passive Synchronization
  • The get bandwidth benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and
    synchronization operations (MPI_Win_lock/unlock, MPI_Win_flush,
    MPI_Win_flush_local, MPI_Win_lock_all/unlock_all,
    MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence). For active synchronization,
    suppose users run with MPI_Win_Post/Start/Complete/Wait: the test
    is carried out by the origin process calling a fixed number of back-to-back
    MPI_Gets and then waiting on a synchronization call (MPI_Win_complete)
    for their completion. The remote process participates in synchronization
    with MPI_Win_post and MPI_Win_wait calls. This process is repeated for
    several iterations and the bandwidth is calculated based on the elapsed
    time and the number of bytes received by the origin process. For passive
    synchronization, suppose users run with MPI_Win_lock/unlock: the origin
    process calls MPI_Win_lock to lock the target
    process's window and calls a fixed number of back-to-back MPI_Gets to
    directly get data from the window. Then it calls MPI_Win_unlock to ensure
    completion of the Gets and release the lock on the window. This process is
    repeated for several iterations and the bandwidth is calculated based on
    the elapsed time and the number of bytes read by the origin process.
    The default window initialization and synchronization operations are
    MPI_Win_allocate and MPI_Win_flush. The benchmark offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s lock” use MPI_Win_lock/unlock synchronization calls.
    “-s flush” use MPI_Win_flush synchronization call.
    “-s flush_local” use MPI_Win_flush_local synchronization call.
    “-s lock_all” use MPI_Win_lock_all/unlock_all synchronization calls.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.

  • osu_put_bibw – Bi-directional Bandwidth Test for Put with Active Synchronization
  • The put bi-directional bandwidth benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and synchronization
    operations (MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence).
    This test is similar to the bandwidth test, except that both processes
    involved send out a fixed number of back-to-back MPI_Puts and wait for their
    completion. This test measures the maximum sustainable aggregate
    bandwidth between two processes. The default window initialization and synchronization
    operations are MPI_Win_allocate and MPI_Win_Post/Start/Complete/Wait. The benchmark
    offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.

  • osu_acc_latency – Latency Test for Accumulate with Active/Passive Synchronization
  • The accumulate latency benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and
    synchronization operations (MPI_Win_lock/unlock, MPI_Win_flush,
    MPI_Win_flush_local, MPI_Win_lock_all/unlock_all,
    MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence). For active synchronization,
    suppose users run with MPI_Win_Post/Start/Complete/Wait: the origin
    process calls MPI_Accumulate to combine data from the local buffer with
    the data in the remote window and store it in the remote window. The
    combining operation used in the test is MPI_SUM. The origin process then
    waits on a synchronization call (MPI_Win_complete) for completion
    of the operations. The remote process waits on an MPI_Win_wait call. Several
    iterations of this test are carried out and the average accumulate latency
    number is obtained. The latency includes the synchronization time as well.
    For passive synchronization, suppose users run with
    MPI_Win_lock/unlock: the origin process calls MPI_Win_lock to lock the
    target process's window and calls MPI_Accumulate to combine data from a
    local buffer with the data in the remote window and store it in the remote window.
    Then it calls MPI_Win_unlock to ensure completion of the Accumulate and release
    the lock on the window. This is carried out for several iterations and the
    average time for MPI_Lock + MPI_Accumulate + MPI_Unlock calls is
    measured. The default window initialization and synchronization operations are
    MPI_Win_allocate and MPI_Win_flush. The benchmark offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s lock” use MPI_Win_lock/unlock synchronization calls.
    “-s flush” use MPI_Win_flush synchronization call.
    “-s flush_local” use MPI_Win_flush_local synchronization call.
    “-s lock_all” use MPI_Win_lock_all/unlock_all synchronization calls.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.

  • osu_cas_latency – Latency Test for Compare and Swap with Active/Passive Synchronization
  • The Compare_and_swap latency benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and
    synchronization operations (MPI_Win_lock/unlock, MPI_Win_flush,
    MPI_Win_flush_local, MPI_Win_lock_all/unlock_all,
    MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence). For active synchronization,
    suppose users run with MPI_Win_Post/Start/Complete/Wait: the origin process
    calls MPI_Compare_and_swap to place one element from the origin buffer into the target buffer.
    The initial value in the target buffer is returned to the calling process. The origin process then
    waits on a synchronization call (MPI_Win_complete) for local completion
    of the operations. The remote process waits on an MPI_Win_wait call. Several
    iterations of this test are carried out and the average Compare_and_swap latency
    number is obtained. The latency includes the synchronization time as well.
    For passive synchronization, suppose users run with
    MPI_Win_lock/unlock: the origin process calls MPI_Win_lock to lock the
    target process's window and calls MPI_Compare_and_swap to place one element
    from the origin buffer into the target buffer. The initial value in the target buffer
    is returned to the calling process. Then it calls MPI_Win_flush to ensure completion of
    the Compare_and_swap. At the end, it calls MPI_Win_unlock to release the lock
    on the window. This is carried out for several iterations and the average
    time for MPI_Compare_and_swap + MPI_Win_flush calls is measured. The default
    window initialization and synchronization operations are MPI_Win_allocate
    and MPI_Win_flush. The benchmark offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s lock” use MPI_Win_lock/unlock synchronization calls.
    “-s flush” use MPI_Win_flush synchronization call.
    “-s flush_local” use MPI_Win_flush_local synchronization call.
    “-s lock_all” use MPI_Win_lock_all/unlock_all synchronization calls.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.
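
    A sketch of the passive-synchronization path described above (illustrative
    only; it assumes 64-bit integer elements and holds the lock across the
    timed loop):

      /* Sketch: MPI_Compare_and_swap latency with passive synchronization; run with 2 ranks. */
      #include <mpi.h>
      #include <stdio.h>
      #include <stdint.h>

      int main(int argc, char **argv)
      {
          const int skip = 100, iters = 10000;
          int rank;
          int64_t *win_buf, origin = 1, compare = 0, result = 0;
          MPI_Win win;
          double t0 = 0.0;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Win_allocate(sizeof(int64_t), sizeof(int64_t), MPI_INFO_NULL,
                           MPI_COMM_WORLD, &win_buf, &win);

          if (rank == 0) {
              MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);   /* lock the target window once */
              for (int i = 0; i < skip + iters; i++) {
                  if (i == skip) t0 = MPI_Wtime();
                  /* swap 'origin' into the target element if it equals 'compare';
                     the previous value at the target is returned in 'result' */
                  MPI_Compare_and_swap(&origin, &compare, &result, MPI_INT64_T, 1, 0, win);
                  MPI_Win_flush(1, win);                  /* complete the operation */
              }
              double lat = (MPI_Wtime() - t0) * 1e6 / iters;
              MPI_Win_unlock(1, win);
              printf("compare-and-swap: %.2f us\n", lat);
          }

          MPI_Barrier(MPI_COMM_WORLD);
          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }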

  • osu_fop_latency – Latency Test for Fetch and Op with Active/Passive Synchronization
  • The Fetch_and_op latency benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and
    synchronization operations (MPI_Win_lock/unlock, MPI_Win_flush,
    MPI_Win_flush_local, MPI_Win_lock_all/unlock_all,
    MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence). For active synchronization,
    suppose users run with MPI_Win_Post/Start/Complete/Wait: the origin process calls
    MPI_Fetch_and_op to increase the element in the target buffer by 1. The initial value
    from the target buffer is returned to the calling process. The origin process
    waits on a synchronization call (MPI_Win_complete) for completion of the
    operations. The remote process waits on an MPI_Win_wait call. Several
    iterations of this test are carried out and the average Fetch_and_op latency
    number is obtained. The latency includes the synchronization time as well.
    For passive synchronization, suppose users run with
    MPI_Win_lock/unlock: the origin process calls MPI_Win_lock to lock the
    target process's window and calls MPI_Fetch_and_op to increase the element
    in the target buffer by 1. The initial value from the target buffer
    is returned to the calling process. Then it calls MPI_Win_flush to ensure completion of
    the Fetch_and_op. At the end, it calls MPI_Win_unlock to release the lock
    on the window. This is carried out for several iterations and the average
    time for MPI_Fetch_and_op + MPI_Win_flush calls is measured. The default
    window initialization and synchronization operations are MPI_Win_allocate
    and MPI_Win_flush. The benchmark offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s lock” use MPI_Win_lock/unlock synchronization calls.
    “-s flush” use MPI_Win_flush synchronization call.
    “-s flush_local” use MPI_Win_flush_local synchronization call.
    “-s lock_all” use MPI_Win_lock_all/unlock_all synchronization calls.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.

  • osu_get_acc_latency – Latency Test for Get_accumulate with Active/Passive Synchronization
  • The Get_accumulate latency benchmark includes window initialization operations
    (MPI_Win_create, MPI_Win_allocate and MPI_Win_create_dynamic) and
    synchronization operations (MPI_Win_lock/unlock, MPI_Win_flush,
    MPI_Win_flush_local, MPI_Win_lock_all/unlock_all,
    MPI_Win_Post/Start/Complete/Wait and MPI_Win_fence). For active synchronization,
    suppose users run with MPI_Win_Post/Start/Complete/Wait: the origin process
    calls MPI_Get_accumulate to combine data from the local buffer with
    the data in the remote window and store it in the remote window. The
    combining operation used in the test is MPI_SUM. The initial value from the
    target buffer is returned to the calling process. The origin process
    waits on a synchronization call (MPI_Win_complete) for local completion
    of the operations. The remote process waits on an MPI_Win_wait call. Several
    iterations of this test are carried out and the average get_accumulate latency
    number is obtained. The latency includes the synchronization time as well.
    For passive synchronization, suppose users run with
    MPI_Win_lock/unlock: the origin process calls MPI_Win_lock to lock the
    target process's window and calls MPI_Get_accumulate to combine data from a
    local buffer with the data in the remote window and store it in the remote window.
    The initial value from the target buffer is returned to the calling process.
    Then it calls MPI_Win_unlock to ensure completion of the Get_accumulate and release
    the lock on the window. This is carried out for several iterations and the
    average time for MPI_Lock + MPI_Get_accumulate + MPI_Unlock calls is
    measured. The default window initialization and synchronization operations are
    MPI_Win_allocate and MPI_Win_flush. The benchmark offers the following options:
    “-w create” use MPI_Win_create to create an MPI Window object.
    “-w allocate” use MPI_Win_allocate to create an MPI Window object.
    “-w dynamic” use MPI_Win_create_dynamic to create an MPI Window object.
    “-s lock” use MPI_Win_lock/unlock synchronization calls.
    “-s flush” use MPI_Win_flush synchronization call.
    “-s flush_local” use MPI_Win_flush_local synchronization call.
    “-s lock_all” use MPI_Win_lock_all/unlock_all synchronization calls.
    “-s pscw” use Post/Start/Complete/Wait synchronization calls.
    “-s fence” use MPI_Win_fence synchronization call.

  • The following benchmarks have been extended to evaluate the performance of
    MPI communication from and to buffers on NVIDIA and AMD GPU devices.

    • osu_bibw – Bidirectional Bandwidth Test
    • osu_bw – Bandwidth Test
    • osu_latency – Latency Test
    • osu_mbw_mr – Multiple Bandwidth / Message Rate Test
    • osu_multi_lat – Multi-pair Latency Test
    • osu_put_latency – Latency Test for Put
    • osu_get_latency – Latency Test for Get
    • osu_put_bw – Bandwidth Test for Put
    • osu_get_bw – Bandwidth Test for Get
    • osu_put_bibw – Bidirectional Bandwidth Test for Put
    • osu_acc_latency – Latency Test for Accumulate
    • osu_cas_latency – Latency Test for Compare and Swap
    • osu_fop_latency – Latency Test for Fetch and Op
    • osu_allgather – MPI_Allgather Latency Test
    • osu_allgatherv – MPI_Allgatherv Latency Test
    • osu_allreduce – MPI_Allreduce Latency Test
    • osu_alltoall – MPI_Alltoall Latency Test
    • osu_alltoallv – MPI_Alltoallv Latency Test
    • osu_bcast – MPI_Bcast Latency Test
    • osu_gather – MPI_Gather Latency Test
    • osu_gatherv – MPI_Gatherv Latency Test
    • osu_reduce – MPI_Reduce Latency Test
    • osu_reduce_scatter – MPI_Reduce_scatter Latency Test
    • osu_scatter – MPI_Scatter Latency Test
    • osu_scatterv – MPI_Scatterv Latency Test
    • osu_iallgather – MPI_Iallgather Latency Test
    • osu_iallgatherv – MPI_Iallgatherv Latency Test
    • osu_iallreduce – MPI_Iallreduce Latency Test
    • osu_ialltoall – MPI_Ialltoall Latency Test
    • osu_ialltoallv – MPI_Ialltoallv Latency Test
    • osu_ialltoallw – MPI_Ialltoallw Latency Test
    • osu_ibcast – MPI_Ibcast Latency Test
    • osu_igather – MPI_Igather Latency Test
    • osu_igatherv – MPI_Igatherv Latency Test
    • osu_ireduce – MPI_Ireduce Latency Test
    • osu_iscatter – MPI_Iscatter Latency Test
    • osu_iscatterv – MPI_Iscatterv Latency Test
  • The CUDA extensions are enabled when the benchmark suite is configured
    with the --enable-cuda option. The OpenACC extensions are enabled when
    --enable-openacc is specified. Whether a process allocates its
    communication buffers on the GPU device or on the host can be controlled at
    run time.
  • Each of the pt2pt benchmarks takes two input parameters. The first
    parameter indicates the location of the buffers at rank 0 and the second
    parameter indicates the location of the buffers at rank 1. The value of
    each of these parameters can be either ‘H’ or ‘D’ to indicate whether the
    buffers are to be on the host or on the device, respectively. When no
    parameters are specified, the buffers are allocated on the host.
  • The collective benchmarks will use buffers allocated on the device if
    the -d option is used; otherwise the buffers will be allocated on the host.
  • The non-blocking collective benchmarks can also use -t for MPI_Test()
    calls and the -r option for setting the target of the dummy computation.
  • osu_oshm_put – Latency Test for OpenSHMEM Put Routine
  • This benchmark measures the latency of a shmem putmem operation for different
    data sizes. The user is required to select whether the communication
    buffers should be allocated in global memory or heap memory, through a
    parameter. The test requires exactly two PEs. PE 0 issues shmem putmem to
    write data at PE 1 and then calls shmem quiet. This is repeated for a
    fixed number of iterations, depending on the data size. The average
    latency per iteration is reported. A few warm-up iterations are run
    without timing to ignore any start-up overheads. Both PEs call shmem
    barrier all after the test for each message size.
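
    A minimal sketch of this put/quiet pattern (illustrative, not the OMB
    source; it uses a symmetric-heap buffer and arbitrary sizes and iteration
    counts):

      /* Sketch: OpenSHMEM put latency in the spirit of osu_oshm_put; run with exactly 2 PEs. */
      #include <shmem.h>
      #include <stdio.h>
      #include <sys/time.h>

      static double now_us(void)
      {
          struct timeval tv;
          gettimeofday(&tv, NULL);
          return tv.tv_sec * 1e6 + tv.tv_usec;
      }

      int main(void)
      {
          const int size = 8, skip = 100, iters = 10000;
          double t0 = 0.0;

          shmem_init();
          int me = shmem_my_pe();
          char *buf = shmem_malloc(size);   /* symmetric heap allocation */

          if (me == 0) {
              for (int i = 0; i < skip + iters; i++) {
                  if (i == skip) t0 = now_us();
                  shmem_putmem(buf, buf, size, 1);   /* write 'size' bytes into PE 1 */
                  shmem_quiet();                     /* wait for completion of the put */
              }
              printf("%d bytes: %.2f us\n", size, (now_us() - t0) / iters);
          }

          shmem_barrier_all();   /* both PEs synchronize after the test */
          shmem_free(buf);
          shmem_finalize();
          return 0;
      }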

  • osu_oshm_put_nb – Latency Test for OpenSHMEM Non-blocking Put Routine
  • This benchmark measures the non-blocking latency of a shmem putmem_nbi
    operation for different data sizes. The user is required to select
    whether the communication buffers should be allocated in global
    memory or heap memory, through a parameter. The test requires exactly
    two PEs. PE 0 issues shmem putmem_nbi to write data at PE 1 and then calls
    shmem quiet. This is repeated for a fixed number of iterations, depending
    on the data size. The average latency per iteration is reported.
    A few warm-up iterations are run without timing to ignore any start-up
    overheads. Both PEs call shmem barrier all after the test for each message size.

  • osu_oshm_get – Latency Test for OpenSHMEM Get Routine
  • This benchmark is similar to the one above except that PE 0 does a shmem
    getmem operation to read data from PE 1 in each iteration. The average
    latency per iteration is reported.

  • osu_oshm_get_nb – Latency Test for OpenSHMEM Non-blocking Get Routine
  • This benchmark is similar to the one above except that PE 0 does a shmem
    getmem_nbi operation to read data from PE 1 in each iteration. The average
    latency per iteration is reported.

  • osu_oshm_put_mr – Message Rate Test for OpenSHMEM Put Routine
  • This benchmark measures the aggregate uni-directional operation rate of
    OpenSHMEM Put between pairs of PEs, for different data sizes. The user
    should select whether the communication buffers are in global memory or heap
    memory, as with the earlier benchmarks. This test requires the number of PEs
    to be even. The PEs are paired, with PE 0 pairing with PE n/2 and so on,
    where n is the total number of PEs. The first PE in each pair issues
    back-to-back shmem putmem operations to its peer PE. The total time for
    the put operations is measured and the operation rate per second is reported.
    All PEs call shmem barrier all after the test for each message size.

  • osu_oshm_put_mr_nb – Message Rate Test for Non-blocking OpenSHMEM Put Routine
  • This benchmark measures the aggregate uni-directional operation rate of
    OpenSHMEM non-blocking Put between pairs of PEs, for different data sizes.
    The user should select whether the communication buffers are in global memory
    or heap memory, as with the earlier benchmarks. This test requires the number
    of PEs to be even. The PEs are paired, with PE 0 pairing with PE n/2 and so on,
    where n is the total number of PEs. The first PE in each pair issues
    back-to-back shmem putmem_nbi operations to its peer PE up to the window
    size. A call to shmem_quiet is placed after the window loop to ensure
    completion of the issued operations. The total time for the non-blocking
    put operations is measured and the operation rate per second is reported.
    All PEs call shmem barrier all after the test for each message size.

  • osu_oshm_get_mr_nb – Message Rate Test for Non-blocking OpenSHMEM Get Routine
  • This benchmark measures the aggregate uni-directional operation rate of
    OpenSHMEM non-blocking Get between pairs of PEs, for different data sizes.
    The user should select whether the communication buffers are in global memory
    or heap memory, as with the earlier benchmarks. This test requires the number
    of PEs to be even. The PEs are paired, with PE 0 pairing with PE n/2 and so on,
    where n is the total number of PEs. The first PE in each pair issues
    back-to-back shmem getmem_nbi operations to its peer PE up to the window
    size. A call to shmem_quiet is placed after the window loop to ensure
    completion of the issued operations. The total time for the non-blocking
    get operations is measured and the operation rate per second is reported.
    All PEs call shmem barrier all after the test for each message size.

  • osu_oshm_put_overlap – Non-blocking Message Rate Overlap Test
    This benchmark measures the aggregate uni-directional operation rate
    overlap for OpenSHMEM Put between pairs of PEs, for different data sizes.
    The user should select whether the communication buffers are in global memory
    or heap memory, as with the earlier benchmarks. This test requires the number
    of PEs to be even. The benchmark prints statistics for the different phases of
    communication, computation and overlap at the end.

  • osu_oshm_atomics – Latency and Operation Rate Test for OpenSHMEM Atomics Routines
    This benchmark measures the performance of the atomic fetch-and-operate and
    atomic operate routines supported in OpenSHMEM for the integer
    and long datatypes. The buffers can be selected to be in heap memory or global
    memory. The PEs are paired as in the Put Operation Rate
    benchmark, and the first PE in each pair issues back-to-back atomic
    operations of a given type to its peer PE. The average latency per atomic
    operation and the aggregate operation rate are reported. This is
    repeated for each of the fadd, finc, add, inc, cswap, swap, set, and fetch
    routines.
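
    For example, timing a single atomic routine per iteration might look like
    the sketch below (it assumes an OpenSHMEM 1.4 library for the
    shmem_int_atomic_fetch_add name; older implementations expose
    shmem_int_fadd instead, and the pairing and iteration count are
    illustrative):

      /* Sketch: timing one OpenSHMEM atomic (fetch-and-add); run with an even number of PEs. */
      #include <shmem.h>
      #include <stdio.h>
      #include <sys/time.h>

      int main(void)
      {
          const int iters = 10000;
          struct timeval t0, t1;

          shmem_init();
          int me = shmem_my_pe();
          int npes = shmem_n_pes();
          int *target = shmem_malloc(sizeof(int));   /* symmetric target element */
          *target = 0;
          shmem_barrier_all();

          if (me < npes / 2) {
              int peer = me + npes / 2;              /* pair PE i with PE i + n/2 */
              gettimeofday(&t0, NULL);
              for (int i = 0; i < iters; i++)
                  (void)shmem_int_atomic_fetch_add(target, 1, peer);
              gettimeofday(&t1, NULL);
              double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
              printf("PE %d: fetch-add latency %.2f us\n", me, us / iters);
          }

          shmem_barrier_all();
          shmem_free(target);
          shmem_finalize();
          return 0;
      }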

  • osu_oshm_collect – OpenSHMEM Collect Latency Test
  • osu_oshm_fcollect – OpenSHMEM FCollect Latency Test
  • osu_oshm_broadcast – OpenSHMEM Broadcast Latency Test
  • osu_oshm_reduce – OpenSHMEM Reduce Latency Test
  • osu_oshm_barrier – OpenSHMEM Barrier Latency Test
  • Collective Latency Tests
  • The latest OMB version includes benchmarks for various OpenSHMEM
    collective operations (shmem_collect, shmem_fcollect, shmem_broadcast,
    shmem_reduce and shmem_barrier). These benchmarks work in the following
    manner. Suppose users run the osu_oshm_broadcast benchmark with N
    processes; the benchmark measures the min, max and average latency of
    the shmem_broadcast collective operation across N processes, for various
    message lengths, over a large number of iterations. In the default
    version, these benchmarks report the average latency for each message
    length. Additionally, the benchmarks offer the following options:
    “-f” can be used to report additional statistics of the benchmark,
    such as min and max latencies and the number of iterations.
    “-m” option can be used to set the maximum message length to be used in a
    benchmark. In the default version, the benchmarks report the
    latencies for up to 1MB message lengths.
    “-i” can be used to set the number of iterations to run for each message
    length.

  • osu_upc_memput – Latency Test for UPC Put Routine
  • This benchmark measures the latency of the UPC put operation between multiple UPC
    threads. In this benchmark, UPC threads with ranks less than (THREADS/2)
    issue UPC memput operations to peer UPC threads. Peer threads are identified
    as (MYTHREAD+THREADS/2). This is repeated for a fixed number of iterations, for
    varying data sizes. The average latency per iteration is reported. A few
    warm-up iterations are run without timing to ignore any start-up overheads. All
    UPC threads call UPC barrier after the test for each message size.

  • osu_upc_memget – Latency Test for UPC Get Routine
  • This benchmark is similar to the UPC put benchmark described above.
    The difference is that the shared string handling function is upc_memget. The
    average get operation latency per iteration is reported.

osu_upc_all_barrier, upc_all_broadcast, osu_upc_all_exchange, osu_upc_all_gather_all, osu_upc_all_gather, osu_upc_all_reduce, and osu_upc_all_scatter

  • osu_upc_all_barrier – UPC Barrier Latency Test
  • osu_upc_all_broadcast – UPC Broadcast Latency Test
  • osu_upc_all_exchange – UPC Exchange Latency Test
  • osu_upc_all_gather_all – UPC GatherAll Latency Test
  • osu_upc_all_gather – UPC Gather Latency Test
  • osu_upc_all_reduce – UPC Reduce Latency Test
  • osu_upc_all_scatter – UPC Scatter Latency Test
  • Collective Latency Tests
  • The latest OMB version includes
    benchmarks for various UPC collective operations
    (osu_upc_all_barrier, upc_all_broadcast, osu_upc_all_exchange,
    osu_upc_all_gather_all, osu_upc_all_gather, osu_upc_all_reduce,
    and osu_upc_all_scatter). These benchmarks work in the following
    manner. Suppose users run the osu_upc_all_broadcast benchmark with
    N processes; the benchmark measures the min, max and average
    latency of the upc_all_broadcast collective operation across N
    processes, for various message lengths, over a large number of
    iterations. In the default version, these benchmarks report the
    average latency for each message length. Additionally, the
    benchmarks offer the following options:
    “-f” can be used to report additional statistics of the benchmark,
    such as min and max latencies and the number of iterations.
    “-m” option can be used to set the maximum message length to be used in a
    benchmark. In the default version, the benchmarks report the
    latencies for up to 1MB message lengths.
    “-i” can be used to set the number of iterations to run for each message
    length.

  • osu_upcxx_async_copy_put – Latency Test for UPC++ Put
  • This benchmark measures the latency of the async_copy (memput) operation
    between multiple UPC++ threads. In this benchmark, UPC++ threads with ranks
    less than (ranks()/2) copy data from their local memory to their peer
    thread's memory using the async_copy operation. By changing the source and destination
    buffers in async_copy, we can mimic the behavior of upc_memput and upc_memget.
    Peer threads are identified as (myrank()+ranks()/2). This is
    repeated for a fixed number of iterations, for varying data sizes. The
    average latency per iteration is reported. A few warm-up iterations are run
    without timing to ignore any start-up overheads. All UPC++ threads call the
    barrier() function after the test for each message size.

  • osu_upcxx_async_copy_get – Latency Test for UPC++ Get
  • Similar to osu_upcxx_async_copy_put, this benchmark mimics the behavior of
    upc_memget and measures the latency of the async_copy (memget) operation
    between multiple UPC++ threads. The only difference is that the source and destination
    buffers in async_copy are swapped. In this benchmark, UPC++ threads with
    ranks less than (ranks()/2) copy data from their peer thread's memory
    to their local memory using the async_copy operation. The rest of the details
    are the same as discussed above. The average get operation latency per
    iteration is reported.

osu_upcxx_bcast, osu_upcxx_reduce, osu_upcxx_allgather, osu_upcxx_gather,
osu_upcxx_scatter, osu_upcxx_alltoall

  • osu_upcxx_bcast – UPC++ Broadcast Latency Test
  • osu_upcxx_reduce – UPC++ Reduce Latency Test
  • osu_upcxx_allgather – UPC++ Allgather Latency Test
  • osu_upcxx_gather – UPC++ Gather Latency Test
  • osu_upcxx_scatter – UPC++ Scatter Latency Test
  • osu_upcxx_alltoall – UPC++ AlltoAll (exchange) Latency Test
  • Collective Latency Tests
  • The latest OMB version includes the
    following benchmarks for various UPC++ collective operations (upcxx_reduce,
    upcxx_bcast, upcxx_gather, upcxx_allgather, upcxx_alltoall,
    upcxx_scatter). These benchmarks work in the following
    manner. Suppose users run the osu_upcxx_bcast benchmark with
    N processes; the benchmark measures the min, max and average
    latency of the upcxx_bcast collective operation across N
    processes, for various message lengths, over a large number of
    iterations. In the default version, these benchmarks report the
    average latency for each message length. Additionally, the
    benchmarks offer the following options:
    “-f” can be used to report additional statistics of the benchmark,
    such as min and max latencies and the number of iterations.
    “-m” option can be used to set the maximum message length to be used in a
    benchmark. In the default version, the benchmarks report the
    latencies for up to 1MB message lengths.
    “-i” can be used to set the number of iterations to run for each message
    length.

  • osu_init – This benchmark measures the minimum,
    maximum, and average time each process takes to complete MPI_Init.
  • osu_hello – This benchmark measures the time it
    takes for all processes to execute MPI_Init + MPI_Finalize.
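
    One way to obtain such numbers from inside an application is sketched below
    (illustrative; MPI_Wtime is unavailable before MPI_Init, so a wall clock is
    used for the startup interval):

      /* Sketch: measuring per-process MPI_Init time, in the spirit of osu_init. */
      #include <mpi.h>
      #include <stdio.h>
      #include <sys/time.h>

      int main(int argc, char **argv)
      {
          struct timeval t0, t1;
          int rank, nprocs;
          double mine, tmin, tmax, tsum;

          gettimeofday(&t0, NULL);
          MPI_Init(&argc, &argv);
          gettimeofday(&t1, NULL);

          /* per-process MPI_Init time in milliseconds */
          mine = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_usec - t0.tv_usec) / 1e3;

          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
          MPI_Reduce(&mine, &tmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
          MPI_Reduce(&mine, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
          MPI_Reduce(&mine, &tsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

          if (rank == 0)
              printf("MPI_Init: min %.2f ms, max %.2f ms, avg %.2f ms\n",
                     tmin, tmax, tsum / nprocs);

          MPI_Finalize();
          return 0;
      }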


Please note that there are many different ways to measure these
performance parameters. For example, the bandwidth test can have
different variants with respect to the types of MPI calls (blocking
vs. non-blocking) being used, the total number of back-to-back messages
sent in one iteration, the number of iterations, and so on. Other ways to
measure bandwidth may give different numbers. Readers are welcome to
use other tests, as appropriate to their application environments.
