FoxStream™ is a high-performance platform that migrates large distributed data pools into pipelines that scale to gigabyte-per-second stream velocities and petabyte-scale storage.
FoxStream™ is a collection of distributed stream management, real-time computation, and data logistics systems built on open source technologies. FoxStream™ reliably ingests, transports, processes, stores and analyzes unbounded streams of data in real time. Some of the core features of FoxStream™ include the following:
- Massively parallel and distributed real-time processing
- Fault-tolerant persistent buffers
- Increased modularity through publish/subscribe communication
- Guarded UDP packet streams
- Elastic Transfer
- Multiple sources and sinks
- Wizard-based provisioning and configuration
- Web application dashboard for management and monitoring
Current operational metrics for FoxStream™:
- 10 billion – bits per second of sustained throughput, more than a gigabyte every second (1.25 GB/s)
- 10 million – discrete transactions per second of rich source data for network analysis
- 460 – individual host machines
- 280 – large single cluster, managed with one console, by one engineer
- 40% – average reduction in network traffic through compression, driving costs down
- 8 – geographically diverse locations across the continental US feeding a single centralized collection location
- 1 to 10 – seconds of average end-to-end latency
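The figures above can be cross-checked with some quick arithmetic. The sketch below only converts units and derives two rough numbers; it assumes (this is not stated in the list) that the 10 million transactions per second account for all of the sustained throughput, and that the 10 billion bits per second is the rate before the 40% traffic reduction is applied.

```python
# Back-of-the-envelope conversions for the metrics quoted above.
BITS_PER_BYTE = 8

throughput_bits_per_s = 10_000_000_000   # 10 billion bits per second (from the list)
transactions_per_s = 10_000_000          # 10 million transactions per second (from the list)
traffic_reduction_pct = 40               # 40% reduction in network traffic (from the list)

# 10 Gb/s expressed in bytes and in decimal gigabytes per second.
throughput_bytes_per_s = throughput_bits_per_s / BITS_PER_BYTE
gigabytes_per_s = throughput_bytes_per_s / 1e9

# Assumption: every byte of throughput belongs to one of the transactions.
avg_bytes_per_transaction = throughput_bytes_per_s / transactions_per_s

# Assumption: 10 Gb/s is the rate before the traffic reduction is applied.
reduced_bits_per_s = throughput_bits_per_s * (100 - traffic_reduction_pct) // 100

print(f"{gigabytes_per_s} GB/s sustained")
print(f"{avg_bytes_per_transaction} bytes per transaction on average")
print(f"{reduced_bits_per_s} bits per second on the wire after reduction")
```

Under those assumptions, each transaction averages 125 bytes and the network carries about 6 billion bits per second after the reduction.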
FoxStream™ incorporates the following best-of-breed open source frameworks into its technology stack:
- Storm – Real-time distributed processing engine
- Kafka – High-speed distributed messaging queue
- Ambari – Provisioning, Management, and Monitoring dashboard
- Redis – Distributed memory cache
Data from different sources can be collected through either a remote agent system or a field-programmable gate array (FPGA). High-volume data streams are ingested by central endpoint aggregators.
The data is then published into a highly distributed message pipeline and made available to internal and external consumers. A stream processor analyzes the data in real time to provide instant feedback and analysis before storing it in a high-volume, high-velocity NoSQL data store. The input infrastructure can be scaled to gigabit (Gb) streams and petabyte storage on standard commodity hardware or cloud infrastructure.
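The data flow described above (collection, aggregation, publication, stream processing, storage) can be illustrated with a small in-memory sketch. This is not FoxStream™ code and does not use Storm, Kafka, or any real data store; the names `MessagePipeline`, `DataStore`, `aggregate`, and `make_stream_processor` are hypothetical, and the record fields are invented for illustration. It only shows how publish/subscribe decouples the aggregators from the consumers.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, object]
Handler = Callable[[Record], None]


class MessagePipeline:
    """Toy in-memory publish/subscribe bus (stand-in for a distributed queue)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: Record) -> None:
        # Every subscriber of the topic receives every record.
        for handler in self._subscribers[topic]:
            handler(record)


class DataStore:
    """Toy stand-in for the NoSQL store: keeps records in a list."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def save(self, record: Record) -> None:
        self.records.append(record)


def aggregate(pipeline: MessagePipeline, topic: str,
              batches: List[List[Record]]) -> None:
    """Endpoint aggregator: merge batches from several sources and publish them."""
    for batch in batches:
        for record in batch:
            pipeline.publish(topic, record)


def make_stream_processor(store: DataStore) -> Handler:
    """Stream processor: analyze each record as it arrives, then store it."""
    def process(record: Record) -> None:
        # Hypothetical analysis step: flag records larger than 1000 bytes.
        enriched = dict(record, large=record["bytes"] > 1000)
        store.save(enriched)
    return process


pipeline = MessagePipeline()
store = DataStore()
seen_by_consumer: List[Record] = []

# Two independent consumers of the same topic: the processor and an external reader.
pipeline.subscribe("events", make_stream_processor(store))
pipeline.subscribe("events", seen_by_consumer.append)

agent_batch = [{"source": "agent-1", "bytes": 512}]   # from a remote agent
fpga_batch = [{"source": "fpga-1", "bytes": 4096}]    # from an FPGA collector
aggregate(pipeline, "events", [agent_batch, fpga_batch])

print(store.records)
```

Because the aggregator only knows the topic name, a new consumer can be added with one more `subscribe` call without touching the collection side, which is the modularity benefit that publish/subscribe communication is meant to provide.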