What is it about?

Unix and Unix-like systems run most of the internet, and developers, analysts, and scientists rely on Unix tools to do their work every day. These tools are very good at processing line-oriented data: tables, logs, comma-separated values, simple spreadsheets, and the like. In the past, most data was line-oriented. These days, though, a lot of data comes in so-called "semi-structured" formats like JSON. JSON offers rich structure, such as nested fields, but it isn't line-oriented, and the Unix tools aren't very good at working with it. We built a tool called 'ffs' that maps a deeply nested JSON file to a tree of directories and files. The Unix tools, and the Unix shell in particular, excel at processing such directory structures. By phrasing structured data in terms of the filesystem's existing concepts, we can use our favored, trusty Unix tools to work with modern data.
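
To make the idea of mapping JSON onto directories and files concrete, here is a minimal Python sketch. It is an illustration only and not how ffs itself is implemented: it simply writes ordinary files to disk. The function name `json_to_tree` and the conventions it uses (objects and arrays become directories, array elements are named by their index, scalars become one-line files) are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def json_to_tree(value, path: Path) -> None:
    """Write a parsed JSON value under `path` as directories and files.

    Hypothetical helper for illustration; keys that are empty or contain
    '/' are not handled here.
    """
    if isinstance(value, dict):
        # A JSON object becomes a directory with one entry per key.
        path.mkdir(parents=True, exist_ok=True)
        for key, child in value.items():
            json_to_tree(child, path / key)
    elif isinstance(value, list):
        # A JSON array becomes a directory whose entries are named 0, 1, 2, ...
        path.mkdir(parents=True, exist_ok=True)
        for index, child in enumerate(value):
            json_to_tree(child, path / str(index))
    else:
        # Scalars become one-line files: strings are written raw, while
        # numbers, booleans, and null keep their JSON spelling.
        text = value if isinstance(value, str) else json.dumps(value)
        path.write_text(text + "\n")


if __name__ == "__main__":
    doc = json.loads(
        '{"name": "ffs", "tags": ["shell", "json"], "meta": {"year": 2021}}'
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "doc"
        json_to_tree(doc, root)
        # List every directory and file that was created, relative to the root.
        for entry in sorted(root.rglob("*")):
            print(entry.relative_to(root))
```

Once the data is laid out this way, line-oriented tools apply again: each leaf is a small text file that can be read, searched, or listed with the usual shell commands.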

Why is it important?

The Unix shell is a great way to process data before analysis, and people continue to use it when processing line-oriented formats like comma-separated values (CSV). But there's a discontinuity: once some of your data comes in a modern format like JSON, you have to stop using the shell and start using some industrial-strength programming language. Our ffs tool smooths out this discontinuity, making it easy to explore and play with data in the early stages of analysis.

Perspectives

I like working in the shell, but many people are frustrated by how 'ancient' the shell feels. The problem isn't so much the shell itself as the shell ecosystem. I'm excited by the prospect of rehabilitating and rejuvenating the shell by making it easy for people to work in 'modern' ways with this powerful, venerable tool.

Michael Greenberg
Stevens Institute of Technology

Read the Original

This page is a summary of: Files-as-Filesystems for POSIX Shell Data Processing, October 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3477113.3487265.
