Commonly, we process JSON data by writing a program to load, deserialize and manipulate this data.
Depending on the programming language, this program may require an additional compilation step before it can be executed within a terminal. For simple operations, such as filtering and mapping, we don't need to write a separate program to process our JSON data. Rather, we can manipulate the data directly within a terminal via the `jq` command-line utility, which edits streamed JSON data without an interactive text editor interface ("`sed` for JSON"). If you're looking for a tool to take JSON data retrieved from an API endpoint, process this data and save the result to a CSV, TSV or JSON file, then `jq` can accomplish this task in a single-line command.
Below, I'm going to show you how to process JSON data with `jq`.
Installation and Setup
Install the `jq` command-line utility by visiting the homepage of the `jq` website, downloading a prebuilt binary (compatible with your operating system) and executing this binary once the download is complete.
For macOS users, if you have Homebrew installed on your machine, then run the following command to automatically download and install `jq`:
```shell
$ brew install jq
```
For Windows users, if you have Chocolatey installed on your machine, then run the following command to automatically download and install `jq`:
```shell
$ choco install jq
```
For Linux users on a Debian-based distribution (such as Ubuntu), run the following command to automatically download and install `jq`:
```shell
$ sudo apt-get install jq
```
To verify that the installation was successful, restart the terminal and enter the command `jq --help`. This should print an overview of the `jq` command.
For extensive documentation, enter the command `man jq`, which opens the manual pages for the `jq` command.
Manipulating JSON Data
To get started, let's pretty-print a JSON dataset (with formatting and syntax highlighting).
The `jq` command must be passed a filter as its first argument. A filter is a program that tells `jq` what output should be returned given the input JSON data.
The most basic filter is the pre-defined identity filter `.`, which tells `jq` to do nothing to the input JSON data and return it as is.
To run `jq` on a JSON dataset, pipe the stringified JSON to `jq` (e.g., the file content of a `.json` file via the `cat` command or the JSON response from an API endpoint via the cURL command). If we pipe the JSON response of a cURL command to `jq .`, then `jq` pretty-prints this response in the terminal.
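As a minimal sketch of the identity filter, the snippet below pipes a small inline JSON array to `jq .`. The two records and their field names (`year`, `population`) are made-up sample data that merely stand in for an API response, and `data.json` and `$API_URL` in the comments are placeholders rather than real locations.

```shell
# Hypothetical sample data: a compact, single-line JSON array of two records.
json='[{"year":"1979","population":"7102100"},{"year":"1980","population":"7071639"}]'

# The identity filter `.` leaves the data unchanged; jq re-indents it on output.
pretty=$(printf '%s' "$json" | jq .)
printf '%s\n' "$pretty"

# The same pattern applies to a file or an API response (placeholders only):
#   cat data.json | jq .
#   curl -s "$API_URL" | jq .
```

The data itself is untouched; only the whitespace changes, which is why the identity filter is a safe first step when exploring an unfamiliar response.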
Suppose we only wanted a single element from the JSON data. To access a single element from a JSON array, pass the array index filter to `jq`, which follows the syntax `.[x]`, with `x` representing a zero-based index value (an integer; negative values count back from the end of the array).
To access the first element, use `.[0]`.
To access the last element, use `.[-1]`.
To access the penultimate (second to last) element, use `.[-2]`.
To access the element at index `x`, use `.[x]`.
If the index value is outside of the JSON array's bounds, then the array index filter returns `null` instead of an element.
Our example dataset only contains 41 rows. Therefore, any index beyond 40 causes the filter to return `null`.
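The indexing rules above can be sketched with a tiny inline array. The five-number array is a hypothetical stand-in for the dataset, chosen so each result is easy to check by eye.

```shell
# Hypothetical five-element array standing in for the dataset.
json='[10,20,30,40,50]'

first=$(printf '%s' "$json" | jq '.[0]')         # first element
last=$(printf '%s' "$json" | jq '.[-1]')         # last element
penultimate=$(printf '%s' "$json" | jq '.[-2]')  # second to last element
missing=$(printf '%s' "$json" | jq '.[5]')       # index past the end of the array

echo "$first $last $penultimate $missing"
# prints: 10 50 40 null
```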
If an index value is omitted (`.[]`), then all of the elements are returned by the array index filter.
The `.[]` filter can also be used on JSON objects to return all top-level values within the object.
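Here is a short sketch of the empty-bracket form `.[]` applied to an array and to an object. Both inputs are made-up sample values, not records from the real dataset.

```shell
# Hypothetical sample inputs: one array and one object.
arr='[10,20,30]'
obj='{"year":"1979","population":"7102100"}'

# On an array, `.[]` emits every element, one per line.
arr_values=$(printf '%s' "$arr" | jq '.[]')

# On an object, `.[]` emits every top-level value (the keys are dropped).
obj_values=$(printf '%s' "$obj" | jq '.[]')

printf '%s\n' "$arr_values" "$obj_values"
```

Note that the results are separate JSON values rather than a single array, which is what makes `.[]` convenient for piping each element into another filter.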
If you are unsure whether the input data is an array or an object, then append a `?` to the empty square brackets (`.[]?`) to suppress errors.
For example, suppose the input data is the stringified integer value `1`. Without the `?`, the error `jq: error (at <stdin>:1): Cannot iterate over number (1)` will be thrown. With the `?`, this error is suppressed as if no error occurred.
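The difference between the two forms can be sketched by feeding `jq` the number `1`. The variable names `strict` and `lenient` are only illustrative labels for this sketch.

```shell
# Without `?`, iterating over a number fails: jq reports the error on stderr
# (silenced here) and exits with a non-zero status.
if printf '1' | jq '.[]' 2>/dev/null; then
  strict="ok"
else
  strict="error"
fi

# With `?`, the error is suppressed: jq prints nothing and exits successfully.
lenient=$(printf '1' | jq '.[]?')

echo "strict=$strict lenient='$lenient'"
# prints: strict=error lenient=''
```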
Slicing a JSON Array
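Suppose we only wanted a subset of the JSON data. To extract a sub-array from a JSON array, pass the array/string slice filter to `jq`, which follows the syntax `.[x:y]`, where `x` is the start index (inclusive) and `y` is the end index (exclusive). Either index may be omitted or negative.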
To extract the first element only, use `.[:1]`.
To extract the last element only, use `.[-1:]`.
To extract all elements but the first element (omit the first element), use `.[1:]`.
To extract all elements but the last element (omit the last element), use `.[:-1]`.
To extract the elements at indices `x` through `y - 1`, use `.[x:y]`.
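The slices above can be tried on a small inline array. This is a sketch with hypothetical sample data; the `-c` flag asks `jq` for compact, single-line output so each result fits in a comment, and the `1:3` range is just an example range.

```shell
# Hypothetical five-element array standing in for the dataset.
json='[10,20,30,40,50]'

first_only=$(printf '%s' "$json" | jq -c '.[:1]')      # [10]
last_only=$(printf '%s' "$json" | jq -c '.[-1:]')      # [50]
all_but_first=$(printf '%s' "$json" | jq -c '.[1:]')   # [20,30,40,50]
all_but_last=$(printf '%s' "$json" | jq -c '.[:-1]')   # [10,20,30,40]
middle=$(printf '%s' "$json" | jq -c '.[1:3]')         # [20,30] (indices 1 and 2)

printf '%s\n' "$first_only" "$last_only" "$all_but_first" "$all_but_last" "$middle"
```

Unlike the array index filter, a slice always returns an array, even when it contains a single element.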
Length of a JSON Array
To retrieve the length of a JSON array, use the built-in `length` function. This returns the total number of elements within the JSON array. For our example dataset, the total number of records returned by the NYC Open Data API is 41.
For a JSON object, the `length` function returns the total number of top-level keys within this object.
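As a quick sketch, `length` can be applied to an array and to an object. Both inputs below are made-up sample values rather than the real API response.

```shell
# Hypothetical sample inputs: one array and one object.
arr='[10,20,30]'
obj='{"year":"1979","population":"7102100"}'

arr_len=$(printf '%s' "$arr" | jq 'length')   # number of elements
obj_len=$(printf '%s' "$obj" | jq 'length')   # number of top-level keys

echo "$arr_len $obj_len"
# prints: 3 2
```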
To retrieve the length of each item of a JSON array, pipe the output of the `.[]` filter to the `length` function (`.[] | length`). This returns the length of each element, one per line. For our example dataset, each record contains four pieces of information: the year, the population of NYC for that year, the total number of gallons (in millions) of water consumed by NYC residents per day and the average number of gallons of water consumed by a NYC resident per day. The length of every record is therefore 4.
If an element is a string, then `length` returns the string's length. If an element is a `null` value, then `length` returns zero.
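The per-element behavior can be sketched with a deliberately mixed array. The array below is hypothetical and combines an object, a string and `null` purely to show all three cases at once.

```shell
# Hypothetical mixed array: an object with two keys, a five-character string and null.
items='[{"year":"1979","population":"7102100"},"water",null]'

# `.[]` emits each element, and `length` is then applied to each one in turn.
lengths=$(printf '%s' "$items" | jq '.[] | length')

printf '%s\n' "$lengths"
# prints 2, 5 and 0 on separate lines
```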
Keys of a JSON Array
To retrieve the top-level keys from a JSON object, use the built-in `keys` function. These keys are returned as an array of strings. The `keys` function can be passed to `jq` on its own, so no filter piping is required.
By default, these keys are sorted alphabetically. The `keys_unsorted` function does not sort keys alphabetically and returns the keys in their original order.
For JSON arrays, the `keys` function returns a list of the array's indices.
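A short sketch of `keys` and `keys_unsorted` follows. The object and array are made-up sample values, with the object's keys deliberately written out of alphabetical order, and `-c` again requests compact output.

```shell
# Hypothetical sample inputs; the object's keys are not in alphabetical order.
obj='{"year":"1979","population":"7102100"}'
arr='[10,20,30]'

sorted=$(printf '%s' "$obj" | jq -c 'keys')             # ["population","year"]
unsorted=$(printf '%s' "$obj" | jq -c 'keys_unsorted')  # ["year","population"]
indices=$(printf '%s' "$arr" | jq -c 'keys')            # [0,1,2]

printf '%s\n' "$sorted" "$unsorted" "$indices"
```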
Experiment with these techniques on other JSON data sources/files.