Making AI workloads deployable & scalable: a simpler, faster way to tackle all your AI & deep learning infrastructure.

Clusters – Full-Stack Solutions Provider

Accelerate your team’s AI progress with our plug & play rack integration solutions, delivered with optimized, lab-tested, and verified GPU clusters.

We’re your single vendor managing your entire cluster for compute, storage, and networking workloads.

Save time and effort with our fully populated racks delivered to you completely assembled, racked, stacked, cabled, and labeled. Once your components have been set up, the system can be powered on, tested, and integrated into your environment.

Equus helps you eliminate the labor-intensive tasks required to install new server clusters, storage devices, and networking equipment by drawing on our experience delivering scalable rack solutions. You can count on consistency and quality, regardless of the specific requirements of your deployment.

Accelerated GPU Compute

Accelerate your applications with hybrid compute building blocks that combine GPU-dense nodes, the NVIDIA CUDA Toolkit, and NVLink interconnects, giving you the parallel processing power to boost your computational throughput.


Fast, High-Density Storage

Deliver unrivaled storage performance for your GPU-intensive workloads. We can help you get the performance and scalability of an object-based parallel file system, the simplicity of an all-flash NAS appliance, or the economics of HDD-based archive storage, meeting the needs of any workload while keeping management simple.

High-Speed Networking

We will help you accelerate multi-node compute, data access, and automation, reducing complexity while maximizing your cluster’s reliability. Our engineers will help you design an optimal configuration, whether we’re installing Ethernet, DPUs, or cables and transceivers.

Management Tools

Our clusters are designed and tailored for your workloads. Each solution comes complete with provisioning and workload management, a variety of pre-scripted SDK frameworks for performance-optimized AI/HPC applications, and storage software to meet your requirements.

Clustering With Weka

Weka is an NVMe-native, resilient, parallel, and scalable file storage system. It manages the retrieval of data between the operating system and the storage servers, and it supports both GPU- and CPU-based clusters designed for maximum performance and scalability.

Weka Options:

| PCI-Express (PCI-E) | 8x AS-1114S-WN10RT (1U 1-Node) | 8x AS-2114S-WN24RT (2U 1-Node) | 2x SYS-220BT-HNTR (2U 4-Node) | 4x SYS-220BT-DNTR (2U 2-Node) |
|---|---|---|---|---|
| Form Factor | 8U | 16U | 4U | 8U |
| Usable Capacity | 363TB | 726TB | 218TB | 436TB |
| Storage Media NVMe* | 80x KIOXIA Gen4 (10x 7.68TB/node) | 160x KIOXIA Gen4 (20x 7.68TB/node) | 48x KIOXIA Gen4 (6x 7.68TB/node) | 96x KIOXIA Gen4 (12x 7.68TB/node) |
| CPU | 8x AMD EPYC™ 7402P Processors** | 8x AMD EPYC™ 7402P Processors** | 16x Intel® Xeon® Silver 4314 Processors | 16x Intel® Xeon® Silver 4314 Processors |
| Memory | 2048GB | 2048GB | 2048GB | 2048GB |
| Network | 16x single-port Mellanox CX-6 200G VPI | 16x single-port Mellanox CX-6 200G VPI | 16x single-port Mellanox CX-6 200G VPI | 16x single-port Mellanox CX-6 200G VPI |
| IOPS*** | 10.2M (1.275M per node) | 8.4M (1.05M per node) | 8.5M (1.06M per node) | 11.28M (1.41M per node) |
| Bandwidth*** | 235GB/s (29.4GB/s per node) | 280GB/s (35GB/s per node) | 202GB/s (25GB/s per node) | 286GB/s (35.8GB/s per node) |
| I/O Latency | < 200μs | < 200μs | < 200μs | < 200μs |
| SW Subscription | 1yr, 3yrs, 5yrs | 1yr, 3yrs, 5yrs | 1yr, 3yrs, 5yrs | 1yr, 3yrs, 5yrs |
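
As a quick sanity check on the table, every option works out to eight storage nodes: eight single-node servers, two 4-node chassis, or four 2-node chassis. Raw flash is nodes × drives per node × 7.68TB, and the cluster-level IOPS and bandwidth are the per-node figures multiplied by eight. The Python sketch below recomputes those totals from the table's own numbers. It only does arithmetic on the published figures. It does not model Weka's data protection or explain how usable capacity is derived from raw capacity. The `WekaOption` type is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WekaOption:
    """One column of the Weka options table (figures copied from the table)."""
    name: str
    nodes: int              # storage nodes in the cluster (8 in every option)
    drives_per_node: int    # NVMe drives per node
    drive_tb: float         # capacity per drive, TB
    usable_tb: float        # usable capacity as published
    iops_per_node_m: float  # millions of IOPS per node, as published
    bw_per_node_gbs: float  # GB/s per node, as published

    def raw_tb(self) -> float:
        # Raw flash = nodes x drives per node x drive size.
        return self.nodes * self.drives_per_node * self.drive_tb

    def usable_ratio(self) -> float:
        # Fraction of raw flash that the table lists as usable.
        return self.usable_tb / self.raw_tb()

    def cluster_iops_m(self) -> float:
        return self.nodes * self.iops_per_node_m

    def cluster_bw_gbs(self) -> float:
        return self.nodes * self.bw_per_node_gbs


OPTIONS = [
    WekaOption("8x AS-1114S-WN10RT", 8, 10, 7.68, 363, 1.275, 29.4),
    WekaOption("8x AS-2114S-WN24RT", 8, 20, 7.68, 726, 1.05, 35.0),
    WekaOption("2x SYS-220BT-HNTR", 8, 6, 7.68, 218, 1.06, 25.0),   # 2 chassis x 4 nodes
    WekaOption("4x SYS-220BT-DNTR", 8, 12, 7.68, 436, 1.41, 35.8),  # 4 chassis x 2 nodes
]

if __name__ == "__main__":
    for opt in OPTIONS:
        print(
            f"{opt.name}: raw {opt.raw_tb():.1f} TB, "
            f"usable/raw {opt.usable_ratio():.1%}, "
            f"{opt.cluster_iops_m():.2f}M IOPS, "
            f"{opt.cluster_bw_gbs():.1f} GB/s"
        )
```

In every option, roughly 59% of raw flash is listed as usable. The recomputed totals match the table to within rounding. For example, 8 × 25GB/s gives 200GB/s against the listed 202GB/s, presumably because the per-node figure is rounded.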

Putting all the Pieces Together

You can leverage our full turnkey solution services and save time with fully populated equipment delivered to you completely assembled, racked, stacked, cabled, labeled, and ready to roll. Each of our clusters goes through extensive design and testing, so all you have to do is put it in place and you’re ready to go.

Our rack integration services offer a simpler way to cluster and scale:

  • Faster Deployment
  • Reduced Costs
  • Near-Zero-Defect Builds
  • Increased Configuration Control
  • Improved Testing Capabilities & Automation

Rack Integration Process

Our process consists of six steps for full-rack turnkey solutions.

Design

Review Requirements
Application Analysis
Power Budget
BOM Creation
Rack Layout
Cabling Diagram
Network Design Review
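
The Power Budget step in Design comes down to comparing the rack's worst-case draw with what its PDUs can safely supply. The Python sketch below is a minimal, hypothetical version of that check. The device wattages and PDU rating are illustrative values, not Equus design rules. So is the 80% continuous-load derating. The sketch also assumes redundant A/B power feeds, so only half the PDUs count toward capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    count: int
    peak_watts: float  # assumed worst-case draw per unit (hypothetical)


def rack_power_budget(devices, pdu_count, pdu_capacity_w,
                      derate=0.8, redundant_feeds=True):
    """Return (peak_load_w, usable_capacity_w, headroom_w) for one rack.

    With redundant A/B feeds the rack must run on one feed alone,
    so only half of the PDUs count toward usable capacity.
    The derate factor is an assumed continuous-load limit.
    """
    peak_load = sum(d.count * d.peak_watts for d in devices)
    effective_pdus = pdu_count // 2 if redundant_feeds else pdu_count
    usable = effective_pdus * pdu_capacity_w * derate
    return peak_load, usable, usable - peak_load


if __name__ == "__main__":
    # Hypothetical rack contents and PDU rating, for illustration only.
    rack = [
        Device("gpu-node", 4, 3000),
        Device("storage-node", 2, 800),
        Device("switch", 2, 500),
    ]
    load, capacity, headroom = rack_power_budget(rack, pdu_count=4,
                                                 pdu_capacity_w=11000)
    print(f"peak {load:.0f} W, usable {capacity:.0f} W, headroom {headroom:.0f} W")
```

With these made-up numbers the rack peaks at 14,600 W against 17,600 W of usable capacity, leaving 3,000 W of headroom. A negative headroom would send the design back to the BOM and rack layout steps.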

Assemble

Material Planning
Node Assembly
Rack & Stack
Network & Power
Cable & Label
Subcomponent Serialization Capture

Configure

BIOS Settings
Firmware Management
Switch & IP Address Provisioning
OS, Customer Image & Software Load

Test

Component Pretesting
Vendor Interoperability
Full-Rack Burn-In & QA
Acceptance Testing
Full-Rack Test Report & Audit Report
End-To-End Performance Report

Logistics

Asset Labeling & Documentation
Prepack Assessment
Crating or Kitting
Specialty Packaging (Air-Ride, Anti-Tip)
White Glove Services

Install

Optional Site-Survey & Assessment
Onsite Installation Manual Creation
Installation (Global Smart Hands)
Onsite Troubleshooting and Diagnosis

Full-Rack Testing Automation

Pretesting

All components used in the assembled rack undergo functional testing for 100% verification against expected performance, including servers, CPUs, memory, hard drives, switches, PDUs, NICs, SFPs, optical cables, network cables, disk controllers, and KVM trays.


Network Communication

Each network point is verified to communicate between the correct designated source and destination ports, at the expected data rate and timing. At-speed network traffic tests check for packet loss, collisions, framing errors, switch fabric and routing problems, and other performance issues that a simple link test cannot detect.
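
The port-mapping and speed part of this check compares the cabling diagram from the Design step with the links actually observed on the switches. The Python sketch below assumes the observed links have already been collected into simple records, for example from switch LLDP neighbor tables. The `Link` type, the port names, and `verify_links` are all hypothetical. The at-speed traffic tests for packet loss and framing errors would need a traffic generator, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    src: str         # host-side port, e.g. "node01:p0" (hypothetical naming)
    dst: str         # switch-side port, e.g. "leaf1:Eth1/1"
    speed_gbps: int  # negotiated (observed) or expected (planned) speed


def verify_links(planned, observed):
    """Compare observed links to the cabling plan; return a list of failures."""
    failures = []
    observed_by_src = {link.src: link for link in observed}
    planned_srcs = {link.src for link in planned}

    for want in planned:
        got = observed_by_src.get(want.src)
        if got is None:
            failures.append(f"{want.src}: no link detected")
        elif got.dst != want.dst:
            failures.append(f"{want.src}: cabled to {got.dst}, expected {want.dst}")
        elif got.speed_gbps != want.speed_gbps:
            failures.append(
                f"{want.src}: link at {got.speed_gbps}G, expected {want.speed_gbps}G"
            )

    # Links that exist on the switches but are not in the plan.
    for src in sorted(observed_by_src.keys() - planned_srcs):
        failures.append(f"{src}: unexpected link to {observed_by_src[src].dst}")
    return failures


if __name__ == "__main__":
    plan = [
        Link("node01:p0", "leaf1:Eth1/1", 200),
        Link("node02:p0", "leaf1:Eth1/2", 200),
        Link("node03:p0", "leaf1:Eth1/3", 200),
    ]
    seen = [
        Link("node01:p0", "leaf1:Eth1/1", 200),
        Link("node02:p0", "leaf1:Eth1/4", 200),  # plugged into the wrong port
        Link("node03:p0", "leaf1:Eth1/3", 100),  # right port, wrong speed
    ]
    for problem in verify_links(plan, seen):
        print("FAIL", problem)
```

An empty result means every planned link is present, lands on the designated port, and runs at the expected speed.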


Automated Test Scripts

Automated test-sequencing software and scripts make the pass/fail determination. They ensure that every test is executed consistently and reliably, and they reduce human intervention and error on every component and cable. The scripts also verify the actual hardware configuration against the specified BOM.
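
Verifying the hardware configuration against the BOM is essentially a quantity-by-part-number comparison. The Python sketch below assumes the scripts have already produced a flat list of part numbers discovered in one node, for example from an inventory scan. The part numbers and the `verify_bom` helper are hypothetical. The example BOM is modeled loosely on one node of the first Weka option above. Its DIMM layout is an assumption, since the table gives only total memory.

```python
from collections import Counter


def verify_bom(expected, discovered):
    """Compare a BOM {part_number: qty} with a list of discovered part numbers.

    Returns {part_number: (expected_qty, found_qty)} for every mismatch;
    an empty dict means the node passes.
    """
    found = Counter(discovered)
    mismatches = {}
    for part in sorted(set(expected) | set(found)):
        want, got = expected.get(part, 0), found.get(part, 0)
        if want != got:
            mismatches[part] = (want, got)
    return mismatches


if __name__ == "__main__":
    # Hypothetical per-node BOM; part numbers and DIMM layout are illustrative.
    node_bom = {
        "CPU-EPYC-7402P": 1,
        "DIMM-32GB": 8,
        "NVME-7.68TB-GEN4": 10,
        "NIC-CX6-200G": 2,
    }
    scan = (["CPU-EPYC-7402P"] + ["DIMM-32GB"] * 7
            + ["NVME-7.68TB-GEN4"] * 10 + ["NIC-CX6-200G"] * 2)
    result = verify_bom(node_bom, scan)
    print("PASS" if not result else f"FAIL {result}")
```

Parts that show up but are not on the BOM are reported with an expected quantity of zero. Substituted components are therefore caught as well as missing ones.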


Full-Rack Burn-in

Stress testing of all major subsystems exposes underlying rack-level issues that might otherwise go undetected. These include airflow, thermal performance, load generation (loadGen) across CPU, GPU, storage, and network, power consumption balancing, maximum power draw, and failover and redundancy behavior.
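
Once a burn-in run finishes, its telemetry has to be turned into a pass/fail verdict for each node. The Python sketch below shows one hypothetical way to do that for two of the items above: thermal limits and power consumption balancing. It flags nodes whose peak GPU temperature or peak power exceeds a limit. It also flags nodes whose average power strays too far from the rack median. The sample format, the thresholds, and the 10% balance tolerance are assumptions for illustration, not values from an actual Equus burn-in.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class Sample:
    node: str
    gpu_temp_c: float  # hottest GPU in the node at this sample
    power_w: float     # node power draw at this sample


def evaluate_burn_in(samples, max_gpu_temp_c, max_node_power_w,
                     balance_tolerance=0.10):
    """Return {node: [reasons]} for every node that fails a check."""
    by_node = {}
    for s in samples:
        by_node.setdefault(s.node, []).append(s)

    # Per-node average power, compared against the rack median so that a
    # single outlier does not shift the reference point.
    avg_power = {n: mean(s.power_w for s in ss) for n, ss in by_node.items()}
    rack_ref = median(avg_power.values())

    failures = {}
    for node, ss in by_node.items():
        reasons = []
        peak_temp = max(s.gpu_temp_c for s in ss)
        peak_power = max(s.power_w for s in ss)
        if peak_temp > max_gpu_temp_c:
            reasons.append(f"peak GPU temp {peak_temp} C > {max_gpu_temp_c} C")
        if peak_power > max_node_power_w:
            reasons.append(f"peak power {peak_power} W > {max_node_power_w} W")
        if abs(avg_power[node] - rack_ref) / rack_ref > balance_tolerance:
            reasons.append(
                f"average power {avg_power[node]:.0f} W vs rack median {rack_ref:.0f} W"
            )
        if reasons:
            failures[node] = reasons
    return failures


if __name__ == "__main__":
    # Hypothetical telemetry from a three-node burn-in.
    samples = [
        Sample("node01", 70, 2800), Sample("node01", 72, 2900),
        Sample("node02", 71, 2850), Sample("node02", 73, 2950),
        Sample("node03", 88, 2900), Sample("node03", 90, 3000),
    ]
    print(evaluate_burn_in(samples, max_gpu_temp_c=85, max_node_power_w=3200))
```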


Collect Function

This step creates a log file from the functional test that serves as a critical baseline reference. The log records a serialized inventory and captures the performance of each component. This complete traceability record is a key diagnostic tool during future failure events, supporting lot containment in the field.
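
The collect step can be sketched as merging the serialized inventory with the functional-test results into one record that is archived with the rack. The Python example below is a hypothetical illustration. The field names, the `lot` attribute, and both helpers are assumptions. JSON is used only because it is easy to store and search later. `find_lot` shows how such a record supports lot containment: given a suspect manufacturing lot, it lists the slots and serial numbers that hold parts from it.

```python
import json
from datetime import datetime, timezone


def build_baseline(rack_id, components, results, collected_at=None):
    """Merge serialized inventory with functional-test results into one record.

    components: list of dicts with "slot", "part_number", "serial", "lot"
    results:    {serial: {metric_name: value}} from the functional test
    """
    return {
        "rack_id": rack_id,
        "collected_at": collected_at or datetime.now(timezone.utc).isoformat(),
        "components": [
            {**c, "baseline": results.get(c["serial"], {})}
            for c in sorted(components, key=lambda c: c["slot"])
        ],
    }


def find_lot(record, lot):
    """Return (slot, serial) pairs for every component from the given lot."""
    return [(c["slot"], c["serial"]) for c in record["components"]
            if c["lot"] == lot]


if __name__ == "__main__":
    # Hypothetical inventory and test metrics, for illustration only.
    parts = [
        {"slot": "node02/nvme0", "part_number": "NVME-7.68TB", "serial": "S2", "lot": "L7"},
        {"slot": "node01/nvme0", "part_number": "NVME-7.68TB", "serial": "S1", "lot": "L7"},
        {"slot": "node01/nic0", "part_number": "NIC-CX6", "serial": "N1", "lot": "L3"},
    ]
    tests = {"S1": {"read_gbs": 6.8}, "S2": {"read_gbs": 6.7}}
    record = build_baseline("rack-01", parts, tests)
    print(json.dumps(record, indent=2))
    print(find_lot(record, "L7"))
```

Components are stored sorted by slot, so two baselines from the same rack line up entry by entry when compared after a field failure.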

HPC

The team at Deeplearning specializes in building highly customized GPU- or CPU-based HPC clusters, and we will guide you through the entire process from start to finish as your one-stop shop for all your datacenter needs.