Traversing Directories Recursively and Sorting Objects by Attribute Value in Go

Let's say you would like to sort all the files in a directory and its subdirectories by an attribute such as file size.

Approach:

First, you need to recursively traverse, or walk, the specified directory, which is easy in Golang with the filepath.Walk() function from the path/filepath package. filepath.Walk() takes a root path and a filepath.WalkFunc, a callback that it calls for every file and directory in the tree, including the root. The walk function can be defined as a closure nested inside your own function, as demonstrated in the sample program below. It is also where you handle errors that occur along the way, such as directories that cannot be read or files you do not have permission to access.

Recommended tutorial: Flavio Copes – LIST THE FILES IN A FOLDER WITH GO

As you walk the directories, you will need to get the size of each file, which can be obtained with the stat() system call on Linux. In Golang, you can use the os.Stat() function to get a value of type os.FileInfo, which has a Size() method that returns the file size in bytes. (The os.FileInfo that filepath.Walk() passes to your walk function is already the result of an lstat() call, so inside the walk you can read info.Size() directly.) As you get the size of each file, store its relative path and its size in bytes as a struct in a slice of file structs.

Once you have the names and sizes of all the files, sort them by the size attribute, either from largest to smallest or from smallest to largest, which is easy to do with sort.Slice() from the sort package. To sort from smallest to largest, use a less-than (<) comparison: sort.Slice(files, func(i, j int) bool { return files[i].size < files[j].size }). To sort from largest to smallest, use a greater-than (>) comparison instead: sort.Slice(files, func(i, j int) bool { return files[i].size > files[j].size }). Finally, print the desired number of files from the front of the sorted slice.
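
Both comparisons in one place: a small sketch, assuming the same file struct with path and size fields, a hypothetical sortBySize wrapper, and a hypothetical top helper that guards against asking for more files than exist. The sample data is made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

type file struct {
	path string
	size int64
}

// sortBySize sorts files in place: largest first when largestFirst is
// true, smallest first otherwise.
func sortBySize(files []file, largestFirst bool) {
	if largestFirst {
		sort.Slice(files, func(i, j int) bool { return files[i].size > files[j].size })
	} else {
		sort.Slice(files, func(i, j int) bool { return files[i].size < files[j].size })
	}
}

// top returns the first n files, clamped to the slice length.
func top(files []file, n int) []file {
	if n < 0 {
		n = 0
	}
	if n > len(files) {
		n = len(files)
	}
	return files[:n]
}

func main() {
	// Sample data for illustration only.
	files := []file{{"a.log", 300}, {"b.txt", 10}, {"c.bin", 4096}, {"d.md", 42}}

	sortBySize(files, true)
	for _, f := range top(files, 2) {
		fmt.Printf("%6d  %s\n", f.size, f.path)
	}
}
```

sort.Slice() is not stable, so files with equal sizes may come out in any order; sort.SliceStable() takes the same arguments and keeps equal-sized files in their original walk order if that matters to you.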

It is worth knowing that in many Linux filesystems, such as ext3 and ext4, a directory is stored as a table of entries that map each file name (the key) to an inode number (the value). Reading the directory table gives you the names of all the files, and looking up a file's inode gives you its metadata, including its size in bytes.

The relationship between the directory entry, an inode, and blocks of an allocated file

Here is a command-line utility that takes a directory (a string), a number of files (an int), and a sort order (a string). It then lists the largest or smallest files in the directory tree, ordered by size.

Recommended tutorial: Rapid7 – Building a Simple CLI Tool with Golang
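
A minimal sketch of how those three inputs could be parsed with the standard flag package, using the dir, cnt and sort flag names from the instructions. The default values, the validation rules, the options struct and the parseArgs helper are assumptions, not the author's code. The sortPtr variable is named after the one dereferenced in sortFiles(*sortPtr, files); presumably it is a *string returned by flag.String(), which would explain the *.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the three command-line inputs.
type options struct {
	dir   string
	count int
	order string
}

// parseArgs parses args (without the program name). The flag package
// accepts both -dir and --dir.
func parseArgs(args []string) (options, error) {
	flags := flag.NewFlagSet("largestFiles", flag.ContinueOnError)
	dirPtr := flags.String("dir", ".", "directory to scan recursively")
	cntPtr := flags.Int("cnt", 10, "number of files to list")
	sortPtr := flags.String("sort", "largest", "sort order: largest or smallest")
	if err := flags.Parse(args); err != nil {
		return options{}, err
	}
	if *sortPtr != "largest" && *sortPtr != "smallest" {
		return options{}, fmt.Errorf("--sort must be largest or smallest, got %q", *sortPtr)
	}
	if *cntPtr < 0 {
		return options{}, fmt.Errorf("--cnt must not be negative, got %d", *cntPtr)
	}
	return options{dir: *dirPtr, count: *cntPtr, order: *sortPtr}, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("dir=%q cnt=%d sort=%q\n", opts.dir, opts.count, opts.order)
}
```

Using a dedicated FlagSet rather than the global flag.Parse() makes the parsing testable with arbitrary argument lists.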

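Note that a slice is a small header (a pointer to an underlying array, a length, and a capacity), so passing a slice to a function copies the header but not the array. A function that sorts the slice in place therefore changes the caller's data, and you do not have to pass a pointer. You can see this in the call to sortFiles(*sortPtr, files) on line 125, which returns nothing.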
See: The Minimum You Need To Know About Arrays and Slices in Golang
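
A small sketch of why no pointer is needed, assuming a hypothetical sortFiles with the same shape as the call above (an order string and a []file). The slice header is copied on each call, but the copy still points at the caller's backing array, so sorting in place is visible to the caller. The hypothetical addFile shows the flip side: append only updates the local copy of the header.

```go
package main

import (
	"fmt"
	"sort"
)

type file struct {
	path string
	size int64
}

// sortFiles (hypothetical) receives a copy of the slice header, but that
// copy points at the caller's backing array, so sort.Slice reorders the
// caller's elements and nothing needs to be returned.
func sortFiles(order string, files []file) {
	sort.Slice(files, func(i, j int) bool {
		if order == "largest" {
			return files[i].size > files[j].size
		}
		return files[i].size < files[j].size
	})
}

// addFile (hypothetical) shows the limit of that sharing: append updates
// only the local header (and may allocate a new array), so the caller's
// slice keeps its old length.
func addFile(files []file, f file) {
	files = append(files, f)
	fmt.Println("inside addFile, len =", len(files))
}

func main() {
	files := []file{{"a", 3}, {"b", 1}, {"c", 2}}

	sortFiles("smallest", files)
	fmt.Println(files) // [{b 1} {c 2} {a 3}]

	addFile(files, file{"d", 4})
	fmt.Println("in main, len =", len(files)) // still 3
}
```

If a function needs to grow the slice, the usual idiom is to return the new slice (as append itself does) rather than pass a pointer.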

Instructions:
Pass the directory with --dir, the number of files with --cnt, and the sort order, either largest or smallest, with --sort.

Example:
go run .\largestFiles.go --dir . --cnt 10 --sort smallest

Golang Sorted Files Windows

On a side note, Golang's os.Stat() function also works well on Windows, even though Windows has no native stat() system call; the standard library implements it on top of the Windows API.

Sources:

  1. https://unix.stackexchange.com/questions/18605/how-are-directories-implemented-in-unix-filesystems
  2. https://premaseem.wordpress.com/2016/02/14/what-is-inode-in-linux-unit/
  3. http://www.grymoire.com/Unix/Inodes.html
