APAC CIOOutlook



    Delivering high performance computing through interconnected solutions

    Charlie Foo, VP & General Manager, Asia Pacific/Japan, Mellanox


    We live at a time when the world stands on the brink of another industrial revolution. High Performance Computing (HPC), Artificial Intelligence (AI) and digitalization are advancing at an unprecedented pace, driven in part by steep growth in use cases and applications. The resulting growth in data volume, movement, communication, and management has made data central to day-to-day operations. Data has become the new currency and dominates how we transact in this age.

    HPC applies advanced computation through parallel processing, enabling faster execution of highly compute-intensive tasks such as machine learning, climate research, molecular modeling, physical simulations, cryptanalysis, geophysical modeling, automotive and aerospace design, financial modeling, virtual reality and more. High-performance systems require the most efficient computing and storage platforms, and key metrics include the performance, efficiency, and scalability of the interconnect technology. Efficient HPC systems require high-bandwidth, low-latency connections, both between thousands of multi-processor nodes and to high-speed shared storage systems.
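
    Why both bandwidth and latency matter can be illustrated with a common first-order model in which the time to deliver a message is a fixed latency plus the message size divided by the bandwidth. The sketch below is only an illustration: the link figures (1 microsecond latency, 100 Gb/s bandwidth) and the function name are assumptions chosen for the example, not measurements of any product.

```python
def transfer_time(message_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """First-order estimate: fixed latency plus time to push the bytes."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical link parameters, assumed for illustration only.
LATENCY_S = 1e-6                 # 1 microsecond
BANDWIDTH_BYTES_PER_S = 12.5e9   # 100 Gb/s expressed in bytes per second

# A 64-byte message is dominated by latency.
small = transfer_time(64, LATENCY_S, BANDWIDTH_BYTES_PER_S)

# A 1 GB message is dominated by bandwidth.
large = transfer_time(1_000_000_000, LATENCY_S, BANDWIDTH_BYTES_PER_S)

print(f"64 B message: {small * 1e6:.3f} us, latency share {LATENCY_S / small:.1%}")
print(f"1 GB message: {large * 1e3:.3f} ms, latency share {LATENCY_S / large:.4%}")
```

    In this model, small synchronization messages gain almost nothing from extra bandwidth, while bulk transfers to storage gain almost nothing from lower latency, which is why an interconnect has to deliver both.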

    Machine learning is a pillar of today’s technological world, offering solutions that enable better and more accurate decision making based on the large amounts of data being collected. It covers a wide range of applications, from security, finance, and image and voice recognition to self-driving cars, healthcare, and smart cities. Many machine learning applications are based on training deep neural networks, which requires complex computations and fast, efficient data delivery.

    With computing and storage devices accelerating and operating at very high speeds, the bottleneck tends to rest on the network. This creates a need for innovation in both speed and smarts. By providing low latency, high bandwidth, a high message rate, transport offload for extremely low CPU overhead, Remote Direct Memory Access (RDMA), and advanced communication and computation offloads, Mellanox’s interconnect solutions are the most deployed high-speed solutions for large-scale simulations, delivering the highest scalability, efficiency, and performance for HPC systems today and in the future. Mellanox smart offloading capabilities such as RDMA and GPUDirect can dramatically improve neural network training performance and machine learning applications overall.

    With demanding application requirements, HPC and AI clusters are getting larger. Some applications need millions of cores to run in parallel. Communication between computing cores, as well as between computing and storage, is becoming more critical for application performance.

    A smart network based on in-network computing, adaptive routing and self-healing technology can improve application performance and avoid congestion and link failure issues in the data center.

    Low latency and high bandwidth are no longer enough for applications. New technologies have been developed to improve HPC and AI performance. One is in-network computing, and another is self-healing network communication.

    In all distributed applications, the CPU handles both application computing and communication computing. If communication consumes more CPU resources, the application is left with fewer, and vice versa. The best HPC and AI systems need to balance the two. Mellanox’s solution moves communication computing into the InfiniBand HCA (Host Channel Adapter) and switches, leaving all of the CPU resources to run the application. In-network computing technology performs data algorithms within the network devices, delivering ten times higher performance and enabling the era of “data-centric” data centers. Using in-network computing not only gives the application more CPU resources but also reduces CPU jitter for the application. Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) is the technology that delivers this. As Diagram 1 shows, communication computing operations such as data synchronization, data movement, and collective operations between all end nodes (hosts) can be performed in the aggregation nodes (switches) without involving the CPU. A good use case for SHARP is the Allreduce operation in machine learning: the switches can implement the Allreduce and avoid the many-to-one communication between the workers and the parameter server.
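
    The difference between many-to-one communication and aggregation inside the network can be sketched with a small simulation. This is a conceptual model only, not the SHARP protocol or any Mellanox API: the function names, the switch radix of 4, and the 64 workers are assumptions made for the example. It compares how many messages converge on a single node in each scheme.

```python
from typing import List, Tuple


def parameter_server_reduce(worker_values: List[float]) -> Tuple[float, int]:
    """Many-to-one: every worker sends its value to one server."""
    if not worker_values:
        raise ValueError("need at least one worker")
    # All workers converge on the single server.
    return sum(worker_values), len(worker_values)


def tree_reduce(worker_values: List[float], radix: int) -> Tuple[float, int]:
    """Hierarchical reduction: each aggregation node sums at most `radix` inputs."""
    if not worker_values:
        raise ValueError("need at least one worker")
    if radix < 2:
        raise ValueError("radix must be at least 2")
    level = list(worker_values)
    max_fan_in = 0
    while len(level) > 1:
        # Split the current level into groups, one group per aggregation node.
        groups = [level[i:i + radix] for i in range(0, len(level), radix)]
        max_fan_in = max(max_fan_in, max(len(group) for group in groups))
        # Each aggregation node forwards a single partial sum upward.
        level = [sum(group) for group in groups]
    return level[0], max_fan_in


gradients = [1.0] * 64  # hypothetical per-worker values
ps_total, ps_fan_in = parameter_server_reduce(gradients)
tree_total, tree_fan_in = tree_reduce(gradients, radix=4)

print(f"parameter server: sum={ps_total}, messages into one node={ps_fan_in}")
print(f"aggregation tree: sum={tree_total}, messages into one node={tree_fan_in}")
```

    Both schemes produce the same sum, but in the tree no node ever receives more than `radix` messages. A full Allreduce would also send the result back down the tree to every worker, which this sketch omits.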

    In HPC and AI systems, communication between computing nodes and storage must be reliable, meaning that no packets should be dropped between any nodes. However, the communication patterns of HPC and AI, such as Allreduce, tend to cause network congestion, and congestion leads to packet drops. Smart solutions from Mellanox such as AR (Adaptive Routing) and SHIELD (Self Healing Communication Technology) not only resolve congestion issues but also help the network avoid congestion and packet drops in the first place. AR uses a specific packet (the adaptive routing notification) to detect congestion in the next few hops before the switch sends the actual data packet to the next hop. If that packet indicates that congestion may occur in the following hops, the switch can automatically change the route to a new path and so avoid the congestion on the original path. Please refer to picture 2.
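
    The path selection described above can be sketched as a simple decision rule. This is a simplified model under stated assumptions, not the actual adaptive routing implementation: the `choose_path` function, the utilization figures, and the 0.8 threshold are all hypothetical. Each candidate path carries the utilization reported for its next few hops, and the switch leaves its default path only when that path looks congested.

```python
from typing import Dict, List


def choose_path(default_path: str,
                congestion_reports: Dict[str, List[float]],
                threshold: float = 0.8) -> str:
    """Pick a forwarding path from per-hop utilization reports (0.0 to 1.0).

    `congestion_reports` maps each candidate path to the utilization of its
    next few hops, standing in for what a notification packet would carry.
    """
    # A path is only as good as its most congested hop.
    worst_hop = {path: max(hops) for path, hops in congestion_reports.items()}
    if worst_hop[default_path] < threshold:
        return default_path  # no congestion ahead, keep the original route
    # Otherwise move to the path whose worst hop is least loaded.
    return min(worst_hop, key=worst_hop.get)


# Hypothetical reports for three candidate paths.
reports = {
    "A": [0.30, 0.95],  # second hop is nearly saturated
    "B": [0.40, 0.50],
    "C": [0.60, 0.70],
}

print(choose_path("A", reports))  # congested default, so the switch reroutes
print(choose_path("C", reports))  # default is below the threshold, so it stays
```

    Keeping the default path while it is healthy avoids needless route changes; the threshold controls how early the switch reacts.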

    SHIELD resolves link failures between the sender and receiver at the switch level, without the application being aware. If a link fails because of a cable, transceiver or switch port issue, the closest switch identifies the failure and sends a special packet (the fault recovery notification) back to uplink switches until the right switch is identified and the traffic is re-routed to a healthy link. Relying on the application to identify a link failure takes about 5 to 30 seconds in clusters of 1K to 10K nodes. SHIELD can detect and fix the link failure within a few milliseconds without application involvement. Please refer to picture 3.
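
    The way a fault notification travels back until some switch can re-route can be sketched as a walk along the path. This is an illustrative model only, not the SHIELD implementation: the `find_reroute_switch` function, the switch names, and the per-switch counts of healthy alternate links are assumptions made for the example.

```python
from typing import Dict, List, Optional


def find_reroute_switch(path: List[str],
                        detected_at: str,
                        healthy_alternates: Dict[str, int]) -> Optional[str]:
    """Return the switch that can re-route around a failed link.

    `path` lists switches from sender to receiver. Starting at the switch
    that detected the failure, the notification moves back toward the sender
    until it reaches a switch with at least one healthy alternate link.
    """
    start = path.index(detected_at)
    for switch in reversed(path[:start + 1]):
        if healthy_alternates.get(switch, 0) > 0:
            return switch
    return None  # no switch on the path has an alternate link


# Hypothetical three-tier path from sender to receiver.
path = ["edge1", "agg1", "core1"]

# core1 detects the failure; only edge1 has a spare healthy link.
print(find_reroute_switch(path, "core1", {"edge1": 1, "agg1": 0, "core1": 0}))

# If agg1 also has a spare link, recovery happens closer to the failure.
print(find_reroute_switch(path, "core1", {"edge1": 1, "agg1": 1, "core1": 0}))
```

    Because the search stops at the first capable switch, recovery stays as local to the failure as the topology allows and never involves the application.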

    In summary, accelerating computing, storage and the smart network (adaptive routing, fabric automation and recovery) has become imperative in this age of HPC, AI and digitalization, when there is no room for performance compromise. Mellanox's scalable HPC and AI interconnect solutions are paving the road to Exascale computing by delivering the highest scalability, efficiency, and performance for HPC and AI systems today and in the future.

    Copyright © 2025 APAC CIOOutlook. All rights reserved. Registration on or use of this site constitutes acceptance of our Terms of Use and Privacy and Anti Spam Policy 

    This content is copyright protected

    However, if you would like to share the information in this article, you may use the link below:

    https://hpc.apacciooutlook.com/cxoinsights/delivering-high-performance-computing-through-interconnected-solutions-nwid-5802.html