Machine Learning

Machine learning solutions

Background of the solution

Artificial intelligence has become an important driving force for a new round of technological revolution and industrial transformation

With advances in deep learning algorithms and the rapid development of technologies such as cloud computing, big data, and GPUs, AI and machine learning have improved across all three dimensions of algorithms, computing power, and data, sparking a new wave of artificial intelligence development. In China, artificial intelligence is being deeply integrated with the real economy and has become an important driving force for a new round of scientific and technological revolution and industrial transformation.
SandStone is responding to the demand for industrial digital transformation and upgrading with machine learning storage solutions that address the challenges of collecting, storing, accessing, and applying massive amounts of data once data becomes a key production factor.

Customer Challenge

How to accelerate the machine learning training process with massive training tasks?

How to aggregate and manage data from various sources and formats?

External datasets or databases need to be gathered. Data is distributed across multiple locations and must be collected from different data sources (across data centers, clouds, and edges) and converted into a unified format. Massive volumes of unstructured or semi-structured data (images, videos, audio, annotation files, etc.) require high storage throughput and low latency.

Traditional storage cannot meet the requirements of different stages of machine learning for storage capacity, performance, and interface protocols

The data collection and archiving stage is typically I/O intensive, requiring high bandwidth and large capacity. The model training stage involves a large number of random reads of small files, requiring high bandwidth and low latency. The inference stage requires low latency and high performance. The data archiving and preparation stage requires strong management capabilities for storing and retrieving massive amounts of data, for which object storage (S3 protocol) is better suited. Training and inference, however, require low-latency storage response and high concurrent access, and for historical reasons training platforms usually work better with distributed file systems (NFS/CIFS, POSIX interface protocols).

The utilization rate of expensive GPU resources is not high, and resources cannot be shared

In current machine learning solutions, it is increasingly common to use GPUs to accelerate learning and training. When expensive GPU resources can be shared, multi-machine, multi-card clusters can run more training tasks at the same time, which not only accelerates the learning process but also improves resource utilization and reduces waste.

Our Solutions

Distributed cache acceleration, multi access protocol support, accelerates AI training

The SandStone machine learning storage solution provides massive, elastic, and cost-effective storage services through the MOS intelligent storage engine. It is compatible with POSIX-semantic file interfaces, HDFS interfaces, S3 interfaces, and CSI interfaces, making it easy to integrate with multiple training platforms. Distributed caching technology accelerates machine learning training. In addition, data management services provide a rich set of management policies that simplify data management and value mining.

Customer Value

Provide strong support for the development of digital businesses

Support the aggregation and massive storage of unstructured, semi-structured, and structured data

DataIngestor supports data aggregation from multiple data sources

Supports writes through multiple interface protocols, including NFS, CIFS, FTP, POSIX, S3, and HDFS

A single namespace supports billions of small files and EB-level storage

Tag retrieval, intelligent data management

Supports custom tags on objects, with retrieval across billions of files completing in seconds
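
As a rough illustration of why tag lookups can stay fast as the number of objects grows, the sketch below builds a toy in-memory inverted index from tags to object keys. It is not SandStone's implementation; the class name TagIndex, the sample object keys, and the tag names are all hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: (tag key, tag value) -> set of object keys."""

    def __init__(self):
        self._index = defaultdict(set)

    def put(self, object_key, tags):
        # Register the object under every tag it carries.
        for name, value in tags.items():
            self._index[(name, value)].add(object_key)

    def find(self, **tags):
        """Return the object keys that carry every requested tag."""
        matches = [self._index.get(item, set()) for item in tags.items()]
        if not matches:
            return set()
        return set.intersection(*matches)


# Hypothetical training-set objects and tags.
index = TagIndex()
index.put("images/cat_001.jpg", {"label": "cat", "split": "train"})
index.put("images/cat_002.jpg", {"label": "cat", "split": "val"})
index.put("images/dog_001.jpg", {"label": "dog", "split": "train"})

train_cats = index.find(label="cat", split="train")
```

A query only touches the sets for the requested tags rather than scanning every object, which is the general idea behind index-based retrieval.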

Tiered data storage, meeting high-performance and large capacity requirements while ensuring the lowest overall cost of ownership

Supports multiple replicas and erasure coding, balancing training performance against the need to archive raw data and reducing storage costs by 40% compared with NAS storage
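
The capacity trade-off between replicas and erasure coding can be shown with a small worked calculation. The three-replica and 4+2 erasure coding layouts and the 1000 TB figure below are assumptions chosen for illustration, not SandStone configurations, and the result is unrelated to the 40% figure quoted above, which compares against NAS storage.

```python
def replication_overhead(replicas: int) -> float:
    """Raw capacity needed per unit of usable data with N full copies."""
    return float(replicas)


def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw capacity needed per unit of usable data with k data + m parity shards."""
    return (data_shards + parity_shards) / data_shards


usable_tb = 1000  # assumed amount of usable data to store

raw_replica_tb = usable_tb * replication_overhead(3)  # three full copies
raw_ec_tb = usable_tb * erasure_overhead(4, 2)        # 4 data + 2 parity shards

# Fraction of raw capacity saved by erasure coding relative to replication.
capacity_saving = 1 - raw_ec_tb / raw_replica_tb
```

Under these assumptions, replication needs 3000 TB of raw capacity and erasure coding needs 1500 TB. Replication is typically the choice for hot training data where read performance matters most, while erasure coding suits archived raw data.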

Multi access protocol support

Compatible with mainstream access protocols such as POSIX, HDFS, S3, and CSI, so that a single storage system supports the data access methods needed at the different stages of artificial intelligence workflows

Client distributed cache acceleration, unleashing GPU potential

Machine learning training has a write-once, read-many I/O pattern. Distributed caching technology targets this pattern and can greatly improve the overall I/O performance of multi-machine, multi-card training clusters, with an average GPU utilization rate of over 97%
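
To make the write-once, read-many pattern concrete, here is a minimal single-process sketch of a read-through cache with least-recently-used eviction. It is only an illustration of the general caching idea, not SandStone's distributed cache; ReadThroughCache, slow_backend, and the sample names are hypothetical.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve repeated reads from memory; fetch from the backend only on a miss."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # callable that reads from backend storage
        self._capacity = capacity    # maximum number of cached entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._entries:
            # Mark the entry as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch(key)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return data


backend_reads = []


def slow_backend(key):
    # Stand-in for a read from remote storage.
    backend_reads.append(key)
    return f"bytes-of-{key}".encode()


cache = ReadThroughCache(slow_backend, capacity=4)
samples = ["a", "b", "c"]
for epoch in range(3):      # every epoch re-reads the same training samples
    for sample in samples:
        cache.read(sample)
```

Because each sample is fetched from the backend only during the first epoch, later epochs are served from the cache, which is the effect that keeps GPUs from waiting on storage.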

Assist in the digital upgrade of intelligent manufacturing production lines

The storage and management of production line inspection data
