Tesla S2050 GPU Computing System


Based on the new NVIDIA CUDA™ GPU architecture codenamed “Fermi”, the Tesla S2050 1U Computing Systems are designed from the ground up for high performance computing. They deliver “must have” features for the technical and enterprise computing space, including ECC memory for uncompromised accuracy and scalability, and 7X the double precision performance of Tesla 10-series GPU computing products. Compared to typical quad-core CPUs, Tesla 20-series computing systems deliver equivalent performance at 1/10th the cost and 1/20th the power consumption.

Designed with four (4) Fermi-based Tesla computing processors in a standard 1U chassis, the Tesla S2050 computing system scales to solve the world’s most important computing challenges — more quickly and accurately.

Features

GPUs powered by the Fermi generation of the CUDA architecture: Delivers cluster performance at 1/10th the cost and 1/20th the power of CPU-only systems built on typical quad-core CPUs.
448 Computing Cores: Delivers up to 515 Gigaflops of double-precision peak performance in each GPU, enabling 2 Teraflops of double-precision performance in 1U of space. Single-precision peak performance is over a Teraflop per GPU.
ECC Memory: Meets a critical requirement for mission-critical applications with uncompromised computing accuracy and reliability. Protects data in memory to enhance data integrity and reliability for applications. Register files, L1/L2 caches, shared memory, and DRAM are all ECC protected.
System Monitoring Features: Simplifies management and remote monitoring post-installation via NVSMI. Status lights on the front and rear of the unit ensure IT staff can see the status from either side of the rack.
Up to 6 GB of GDDR5 memory per GPU: Maximizes performance and reduces data transfers by keeping larger data sets in local memory that is attached directly to the GPU. The Tesla S2050 includes 3 GB per GPU.
NVIDIA Parallel DataCache™: Accelerates algorithms such as physics solvers, ray tracing, and sparse matrix multiplication where data addresses are not known beforehand. This includes a configurable L1 cache per Streaming Multiprocessor block and a unified L2 cache for all of the processor cores.
NVIDIA GigaThread™ Engine: Maximizes throughput with context switching that is 10X faster than the previous architecture, concurrent kernel execution, and improved thread block scheduling.
Asynchronous Transfer: Turbocharges system performance by transferring data over the PCIe bus while the computing cores are crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize computing efficiency by transferring data to local memory before it is needed.
CUDA programming environment with broad support of programming languages and APIs: Choose C, C++, OpenCL, DirectCompute, or Fortran to express application parallelism and take advantage of the “Fermi” GPU’s innovative architecture.
High-Speed PCIe Gen 2.0 Data Transfer: Maximizes bandwidth between the host system and the Tesla processors. Enables Tesla systems to work with virtually any PCIe-compliant host system with an open PCIe slot (x8 or x16).
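The benefit of Asynchronous Transfer described above can be illustrated with a simple timing model. The sketch below is not NVIDIA code and does not touch a GPU; it only assumes a workload split into equal chunks, each needing a fixed PCIe copy time and a fixed compute time. The function names and the sample timings are hypothetical and chosen purely for illustration.

```python
def serial_time_ms(n_chunks, transfer_ms, compute_ms):
    """Total time when every chunk is copied first and then processed,
    with no overlap between PCIe transfer and GPU compute."""
    return n_chunks * (transfer_ms + compute_ms)


def overlapped_time_ms(n_chunks, transfer_ms, compute_ms):
    """Total time when the copy of chunk i+1 runs while chunk i is computed.

    The first transfer and the last compute cannot be hidden; for the
    steps in between, the slower of the two stages sets the pace.
    """
    if n_chunks == 0:
        return 0
    steady_state = (n_chunks - 1) * max(transfer_ms, compute_ms)
    return transfer_ms + steady_state + compute_ms


if __name__ == "__main__":
    # Hypothetical workload: 8 chunks, 10 ms to copy and 15 ms to compute each.
    n, t, c = 8, 10, 15
    print("serial:    ", serial_time_ms(n, t, c), "ms")
    print("overlapped:", overlapped_time_ms(n, t, c), "ms")
```

In this model the transfer cost almost disappears whenever compute time per chunk is at least as long as copy time, which is the situation the feature description is aiming for.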
RMA REQUIREMENT
  1. Please fill out the RMA Question List when you submit an RMA request. You can also get the RMA Question List from Leadtek RMA representatives.
  2. Please submit a "Log File" when you submit an RMA request. The Log File needs to be generated on a Linux system; if you do not use a Linux system, please leave it blank and note this on the RMA request.
    Download Log File Instruction
    Download Log File example
Tesla S2050 GPU Computing System
Form Factor: 1U
Number of Tesla GPUs: 4
GPU Memory Speed: 1.55 GHz
GPU Memory Interface: 384-bit
GPU Memory Bandwidth: 148 GB/sec
Double Precision floating point performance (peak): 2 Tflops
Single Precision floating point performance (peak): 4.13 Tflops
Total Dedicated Memory*: 12 GB GDDR5
Power Consumption (Typical): 900 W TDP
System Interface: PCIe x16 Gen2
Software Development Tools: CUDA C/C++/Fortran, OpenCL, and DirectCompute Toolkits; NVIDIA Parallel Nsight™
* Note: With ECC on, a portion of the dedicated memory is used for ECC bits, so the available user memory is reduced by 12.5%. (e.g. 3 GB total memory yields 2.625 GB of user available memory.)
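Several of the figures above can be cross-checked with simple arithmetic. The following is a minimal sketch using only numbers quoted on this page. The function names are hypothetical, and the bandwidth formula assumes two data transfers per quoted memory clock cycle, an assumption made here because it reproduces the listed bandwidth figure.

```python
NUM_GPUS = 4                 # Tesla GPUs per S2050 system
DP_GFLOPS_PER_GPU = 515      # peak double precision per GPU
MEM_CLOCK_GHZ = 1.55         # quoted GPU memory speed
BUS_WIDTH_BITS = 384         # GPU memory interface width
MEM_PER_GPU_GB = 3           # dedicated memory per GPU
ECC_OVERHEAD = 0.125         # 12.5% of memory holds ECC bits when ECC is on


def total_dp_tflops(num_gpus=NUM_GPUS, gflops_per_gpu=DP_GFLOPS_PER_GPU):
    """Aggregate peak double-precision throughput of the system in Tflops."""
    return num_gpus * gflops_per_gpu / 1000


def memory_bandwidth_gb_s(clock_ghz=MEM_CLOCK_GHZ, bus_bits=BUS_WIDTH_BITS,
                          transfers_per_clock=2):
    """Per-GPU memory bandwidth in GB/s.

    transfers_per_clock=2 is an assumption, not a figure from the spec table.
    """
    return clock_ghz * transfers_per_clock * bus_bits / 8


def usable_memory_gb(total_gb, ecc_on=True):
    """Memory available to applications, after the ECC reservation if enabled."""
    return total_gb * (1 - ECC_OVERHEAD) if ecc_on else total_gb


if __name__ == "__main__":
    print(f"Peak double precision: {total_dp_tflops():.2f} Tflops")
    print(f"Memory bandwidth per GPU: {memory_bandwidth_gb_s():.1f} GB/s")
    print(f"Usable memory per GPU, ECC on: {usable_memory_gb(MEM_PER_GPU_GB)} GB")
    print(f"Usable memory per system, ECC on: "
          f"{usable_memory_gb(NUM_GPUS * MEM_PER_GPU_GB)} GB")
```

With ECC enabled, the per-GPU result matches the 2.625 GB example in the note, and the same 12.5% reduction applies to the 12 GB system total.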