Product Overview
AI Data Service

For massive training datasets, the service provides an open, easy-to-use, and efficient AI DataHub (Git for AI Data) that meets AI developers' data management needs and enterprises' asset management needs.
1. Management of tens of billions of data items: a platform for managing tens of billions of unstructured data items to support training and inference of large AI models.
2. Second-level retrieval and mining: retrieval over large-scale unstructured data that returns results within seconds, so new samples can be mined quickly.
3. Data security: a complete set of data security solutions for AI scenarios to keep AI data private and compliant.
4. High-quality public datasets: high-quality public datasets from across the industry, loaded quickly via the Python SDK.

Product Advantages
Built on SenseTime's years of experience in large-scale AI data management, the AI DataHub fits the working habits of AI developers and meets enterprise requirements for asset management and compliance.
  • 01 Git for AI Data
  • 02 Security compliance
  • 03 Ready to use
  • 04 AI for AI
01 Git for AI Data

Git for AI Data provides full-lifecycle management of unstructured data in the enterprise, with version management, multi-user collaboration, and data sharing.
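
As a rough illustration of what version management over unstructured data can look like, the sketch below records each dataset version as a manifest that maps file paths to content hashes and points at its parent version. The names `Commit`, `commit`, and `diff` are hypothetical and are not part of the AI DataHub SDK or CLI; this is a minimal sketch of the general idea, not the product's implementation.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical dataset version: a manifest plus a link to its parent."""
    manifest: Dict[str, str]   # file path -> SHA-256 of the file content
    parent: Optional[str]      # id of the previous commit, if any
    message: str

    @property
    def id(self) -> str:
        # The id is derived from the commit content, so it changes with any edit.
        payload = json.dumps(
            {"manifest": self.manifest, "parent": self.parent, "message": self.message},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


def commit(files: Dict[str, bytes], parent: Optional[Commit], message: str) -> Commit:
    """Snapshot the given files as a new version on top of `parent`."""
    manifest = {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}
    return Commit(manifest, parent.id if parent else None, message)


def diff(old: Commit, new: Commit) -> Dict[str, List[str]]:
    """List the paths added, removed, or changed between two versions."""
    added = sorted(p for p in new.manifest if p not in old.manifest)
    removed = sorted(p for p in old.manifest if p not in new.manifest)
    changed = sorted(
        p for p in new.manifest
        if p in old.manifest and new.manifest[p] != old.manifest[p]
    )
    return {"added": added, "removed": removed, "changed": changed}


v1 = commit({"img/cat.jpg": b"cat-v1", "img/dog.jpg": b"dog-v1"}, None, "initial import")
v2 = commit({"img/cat.jpg": b"cat-v2", "img/bird.jpg": b"bird-v1"}, v1, "update cat, swap dog for bird")
print(diff(v1, v2))
```

Because only hashes are stored in the manifest, two versions that share most of their files also share most of their storage, which is one reason this pattern suits large datasets.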

02 Security compliance

Every stage of data management is protected by security measures such as permission control, digital watermarking, data desensitization, and compliance authorization.

03 Ready to use

Datasets are ready to use: there is no need to download them locally, and they can be loaded directly with a single line of script. An AI caching service accelerates data access for high-speed AI training.
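
The page does not describe how the caching service works internally. As one plausible reading, the sketch below shows a read-through cache with least-recently-used eviction placed in front of a slow remote read. `ReadThroughCache` and `fake_remote_fetch` are hypothetical names invented for this example and do not correspond to any documented SDK call.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Hypothetical LRU cache placed in front of a slow remote fetch."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 2) -> None:
        self._fetch = fetch
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)               # only reached on a cache miss
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # evict the least recently used entry
        return data


def fake_remote_fetch(key: str) -> bytes:
    # Stand-in for a network read; a real loader would stream from remote storage.
    return f"bytes-of-{key}".encode()


cache = ReadThroughCache(fake_remote_fetch, capacity=2)
for name in ["a.jpg", "b.jpg", "a.jpg", "c.jpg", "b.jpg"]:
    cache.get(name)
print(cache.hits, cache.misses)
```

Training loops tend to revisit the same samples across epochs, so even a simple cache like this can turn many remote reads into local ones.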

04 AI for AI

Retrieve unstructured data with natural-language queries, powered by large AI models, and get results within seconds to mine valuable business data.
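
Natural-language retrieval of this kind is commonly built on embeddings: a model maps the query and every sample to vectors, and results are ranked by similarity. The sketch below assumes the embeddings already exist and uses toy three-dimensional vectors rather than the output of any real model. The `search` function is hypothetical and only illustrates the ranking step, not how the product implements it.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank indexed samples by similarity to the query and keep the best top_k."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Toy embeddings; in practice these would come from a multimodal model.
index = {
    "street_night.jpg": [0.9, 0.1, 0.0],
    "street_day.jpg": [0.6, 0.6, 0.1],
    "indoor_office.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # toy embedding standing in for the text "street at night"
results = search(query, index)
print([name for name, _ in results])
```

At the scale the page describes, the linear scan here would normally be replaced by an approximate nearest-neighbor index; the ranking principle stays the same.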

Product Features
Git for AI Data provides efficient AI data tools for quickly retrieving and mining data and for visualizing it in the web interface.
  • Git for AI Data
    Git for AI Data provides version management for data iterations, branch-based collaboration, and dataset sharing across the full data lifecycle, from import and processing through to use.

  • AI Data Tools
    A single-line script loads a dataset through the SDK for high-speed model training. The CLI tool handles version and branch management to control data iteration.

  • Retrieval and mining
    Retrieve images with natural-language queries backed by large models, and flexibly retrieve samples by metadata, annotations, predictions, and customized tags.

  • Data Visualization
    In the web interface, multimodal data and annotations can be visualized easily, a dataset overview can be viewed at a glance, and file operations can be performed directly.

Application Scenarios
Manage large-scale unstructured AI data, accelerate model and data iteration, and support the rapid rollout of AI applications.
  • 01 Enterprise-level data management
  • 02 Dataset acquisition and use
  • 03 Foundation model datasets
  • 04 Sample retrieval and mining
  • 05 Data security and compliance
Enterprise-level data management
Manage large-scale unstructured data across the enterprise with multi-user collaboration and sharing, and iterate on data rapidly through version management to continuously improve data quality.

  • Build Git for AI Data to manage the data lifecycle
  • Provide CLI tools for version and branch management
  • Extract file features to avoid redundant storage
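
The page does not say which file features are extracted. One common way to avoid redundant storage is content hashing, where files with identical bytes are stored once and referenced from many paths. The sketch below assumes a SHA-256 digest as the "feature"; `DedupStore` is a hypothetical class written for this example, and the product may use a different technique.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Hypothetical content-addressed store: identical bytes are kept once."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}   # digest -> content
        self.paths: Dict[str, str] = {}     # logical path -> digest

    def put(self, path: str, data: bytes) -> bool:
        """Register data under path; return True only if new bytes were stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.paths[path] = digest
        return is_new

    def get(self, path: str) -> bytes:
        """Resolve a logical path back to its content."""
        return self.blobs[self.paths[path]]


store = DedupStore()
store.put("train/0001.jpg", b"same-bytes")
store.put("val/0001_copy.jpg", b"same-bytes")   # duplicate content, new path
store.put("train/0002.jpg", b"other-bytes")
print(len(store.paths), len(store.blobs))
```

Exact hashing only catches byte-identical files. Detecting near-duplicates, such as resized or re-encoded images, would need perceptual features instead.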

Dataset acquisition and use
Search for and obtain in-enterprise and public datasets covering a variety of scenarios, understand them through data overviews, documentation, and visualization, and load them ready to use with the SDK tools.

  • Tags on datasets for fast retrieval
  • SDK tools ready to use to load datasets

Foundation model datasets
For large-model training, the service provides public datasets such as LAION-5B, LAION-400M, and CCNews, supports fine-tuning datasets tailored to the business, and manages data across its full lifecycle.

  • Provide large models with corresponding public datasets
  • Fine-tuning datasets to perform full lifecycle management

Sample retrieval and mining
Combine sample attributes, annotations, predictions, and customized tags to retrieve and analyze data, and use the natural-language retrieval capability of large models to filter for higher-quality, more focused training data.

  • Multi-class tagging of samples for fast retrieval mining
  • Natural language retrieval based on large models
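
To make the combination of tags, annotations, and predictions concrete, the sketch below mines "hard" samples: items that carry a given tag and where the model's prediction either disagrees with the annotation or has low confidence. The `Sample` fields, the `mine_hard_samples` function, and the 0.6 threshold are all assumptions made for illustration and do not reflect the product's data schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record combining tags, annotation, and prediction."""
    id: str
    tags: FrozenSet[str]
    annotation: str      # human label
    prediction: str      # model output
    confidence: float    # model confidence in its prediction


def mine_hard_samples(
    samples: Iterable[Sample],
    required_tag: str,
    max_confidence: float = 0.6,
) -> List[str]:
    """Return ids of tagged samples the model got wrong or was unsure about."""
    hits = []
    for s in samples:
        if required_tag not in s.tags:
            continue
        wrong = s.prediction != s.annotation
        unsure = s.confidence < max_confidence
        if wrong or unsure:
            hits.append(s.id)
    return hits


samples = [
    Sample("a", frozenset({"night"}), "car", "car", 0.95),
    Sample("b", frozenset({"night"}), "car", "truck", 0.80),
    Sample("c", frozenset({"night", "rain"}), "person", "person", 0.40),
    Sample("d", frozenset({"day"}), "car", "bus", 0.30),
]
print(mine_hard_samples(samples, required_tag="night"))
```

Samples selected this way are typical candidates for re-annotation or for inclusion in the next training round.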

Data security and compliance
Protect the privacy and security of enterprise data with strict access control, data authorization, digital watermarking, data desensitization, and other measures that prevent data leakage and keep data compliant.

  • Permission control to manage data access
  • Digital watermarking to prevent data leakage
  • Data desensitization to guarantee data compliance
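
Data desensitization generally means masking or removing personal information before data is shared. The page does not specify which data types or methods the product covers, so the sketch below shows only the simplest text case: replacing email addresses and phone-like numbers in metadata with placeholders. The patterns and the `desensitize` function are illustrative assumptions; real pipelines for images or audio would need very different techniques.

```python
import re

# Deliberately simple patterns for illustration; production rules are stricter.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3,4}[-.\s]?\d{4}\b")


def desensitize(text: str) -> str:
    """Mask email addresses and phone-like numbers in a text field."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


masked = desensitize("Contact alice@example.com or 555-123-4567 for the raw files.")
print(masked)
```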
