HDF Scalable Data Service (HSDS) is a REST-based web service that lets clients read from and write to HDF data sources. HSDS can be deployed in the cloud or on-premises, which suits situations where multiple clients need to access data through an HTTP endpoint. Because the interface is plain HTTP, HDF data becomes available to web applications and to programs written in any language that can make HTTP requests. HSDS supports multiple storage platforms, including POSIX, AWS S3, and Azure Blob Storage, and it can be deployed using Docker, Kubernetes, or as an operating system service (e.g., a Unix daemon). The development of HSDS has been supported by both public and commercial organizations seeking specialized features or improved performance for specific use cases. One <a href='https://github.com/HDFGroup/hsds/blob/master/docs/design/query/chunk_summary.md'>proposal</a> has generated significant interest, although no funding has been offered for it so far. The idea is to dramatically speed up SQL-like queries over large datasets without the need to build indexes on those datasets. Indexing is a common technique used by many databases, but creating indexes for large-scale (multi-terabyte) standard HDF5 datasets is time-consuming and requires substantial additional storage. The chunk summary design aims to address these challenges and has the potential to be a valuable enhancement to the HSDS software package.
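
To make the chunk summary idea concrete, here is a minimal sketch in plain Python. It is not HSDS code and it is not taken from the linked proposal: it assumes that a summary stores only the minimum and maximum value of each chunk, and the names `ChunkSummary`, `build_summaries`, and `query_greater_than` are hypothetical. The point is only that a small per-chunk record can let a query skip chunks that cannot contain a match, without building an index over individual elements.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ChunkSummary:
    """Hypothetical per-chunk record (an assumption, not the HSDS format)."""
    start: int    # index of the first element of the chunk
    stop: int     # one past the index of the last element
    vmin: float   # smallest value stored in the chunk
    vmax: float   # largest value stored in the chunk


def build_summaries(data: Sequence[float], chunk_size: int) -> List[ChunkSummary]:
    """Make one pass over the data and record min/max for each chunk."""
    summaries = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        summaries.append(
            ChunkSummary(start, start + len(chunk), min(chunk), max(chunk))
        )
    return summaries


def query_greater_than(
    data: Sequence[float],
    summaries: Sequence[ChunkSummary],
    threshold: float,
) -> Tuple[List[int], int]:
    """Return (indices where data[i] > threshold, number of chunks scanned)."""
    hits: List[int] = []
    scanned = 0
    for s in summaries:
        # If even the largest value fails the test, no element can match,
        # so the chunk is never read.
        if s.vmax <= threshold:
            continue
        scanned += 1
        for i in range(s.start, s.stop):
            if data[i] > threshold:
                hits.append(i)
    return hits, scanned


if __name__ == "__main__":
    data = [1, 2, 3, 4, 50, 60, 7, 8, 9, 10, 11, 12]
    summaries = build_summaries(data, chunk_size=4)
    hits, scanned = query_greater_than(data, summaries, threshold=40)
    print(hits, scanned)
```

In this toy dataset only the middle chunk holds values above 40, so the query reads one of the three chunks. The summaries cost a few numbers per chunk, which is far less storage than a per-element index. How much is saved in practice depends on how the values are distributed across chunks.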