List all files in a directory

Listing all the files in a directory

Let's start with the basics. The most straightforward way to list all the files in a directory is to combine the listdir function from os with isfile from os.path. A list comprehension collects the results in a list.

from os import listdir
from os.path import isfile, join

mypath = "./test_directory/"
# Keep only the entries that are regular files (sub-directories are skipped)
[f for f in listdir(mypath) if isfile(join(mypath, f))]
['logfile.log', 'myfile.txt', 'super_music.mp3', 'textfile.txt']
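
The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: list_files is a hypothetical helper name, and the temporary directory and file names exist just so the example runs anywhere. Because listdir returns entries in arbitrary order, the helper sorts them.

```python
import os
import tempfile
from os import listdir
from os.path import isfile, join


def list_files(directory):
    """Hypothetical helper: sorted names of the regular files directly inside `directory`."""
    # listdir returns entries in arbitrary order, so sort for a stable result.
    return sorted(f for f in listdir(directory) if isfile(join(directory, f)))


# Build a throwaway directory (an assumption for the demo, not the article's test_directory).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("myfile.txt", "logfile.log"):
        open(join(tmp, name), "w").close()
    os.mkdir(join(tmp, "subdir1"))  # a sub-directory, which isfile filters out
    demo_files = list_files(tmp)

print(demo_files)
```

Sub-directories such as subdir1 are left out because isfile returns False for them.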

Listing all the files of a certain type in a directory

Similarly, if you want to keep only a certain kind of file based on its extension, you can use the endswith string method. In the following example, we keep all the "txt" files contained in the directory.

[f for f in listdir(mypath) if f.endswith(".txt")]
['myfile.txt', 'textfile.txt']
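
Note that endswith only looks at the name, so a directory called "archive.txt" would pass the filter, and "NOTES.TXT" would not. A possible variant, sketched under the assumption that you want regular files only and a case-insensitive match, could look like this (list_files_with_extension and the demo file names are hypothetical):

```python
import os
import tempfile
from os import listdir
from os.path import isfile, join, splitext


def list_files_with_extension(directory, extension):
    """Hypothetical helper: regular files whose extension matches, ignoring case."""
    wanted = "." + extension.lower().lstrip(".")
    return sorted(
        f for f in listdir(directory)
        # splitext("NOTES.TXT") gives ("NOTES", ".TXT"); compare in lower case
        if isfile(join(directory, f)) and splitext(f)[1].lower() == wanted
    )


# Throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("myfile.txt", "NOTES.TXT", "logfile.log"):
        open(join(tmp, name), "w").close()
    os.mkdir(join(tmp, "archive.txt"))  # a directory whose name ends in .txt
    txt_files = list_files_with_extension(tmp, "txt")

print(txt_files)
```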

Listing all the files matching a pattern in a directory

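The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. You can use the *, ?, and [] wildcards, the last one matching a set or range of characters. A relative pattern such as "*.txt" is resolved against the current working directory.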

import glob

glob.glob("*.txt")
['myfile.txt']
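
To see what each wildcard does without touching the disk, the standard fnmatch module can be applied to a plain list of names. This is only a sketch: the names list reuses the file names from the first example, and match is a hypothetical helper. glob applies the same wildcard rules to each path component, with the extra rule that names starting with a dot are only matched by patterns that also start with a dot.

```python
from fnmatch import fnmatchcase

# File names borrowed from the first example; nothing is read from disk here.
names = ["logfile.log", "myfile.txt", "super_music.mp3", "textfile.txt"]


def match(pattern):
    """Hypothetical helper: names matching a shell-style pattern, case-sensitively."""
    return [n for n in names if fnmatchcase(n, pattern)]


star = match("*.txt")         # * matches any run of characters
question = match("*.mp?")     # ? matches exactly one character
char_range = match("[l-m]*")  # [] matches one character from a set or range

print(star)
print(question)
print(char_range)
```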

Listing files recursively

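If you want to list all files recursively, you can use the "**" wildcard together with recursive=True. In that mode "**" matches zero or more directories, so files at the top level and in every sub-directory are returned.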

import glob
glob.glob(mypath + '/**/*.txt', recursive=True)
['./test_directory\\myfile.txt',
 './test_directory\\textfile.txt',
 './test_directory\\subdir1\\file_hidden_in_a_sub_direcotry.txt']
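
The output above uses Windows path separators. A more portable way to build the pattern is os.path.join, as in the following sketch. The temporary directory, the file names, and the conversion to forward slashes are assumptions made only so the example gives the same result on any platform.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Demo tree: two files at the top level, one in a sub-directory.
    os.mkdir(os.path.join(tmp, "subdir1"))
    for rel in ("myfile.txt", "logfile.log", os.path.join("subdir1", "nested.txt")):
        open(os.path.join(tmp, rel), "w").close()

    # "**" matches zero or more directories when recursive=True is passed.
    pattern = os.path.join(tmp, "**", "*.txt")
    matches = glob.glob(pattern, recursive=True)

    # Report paths relative to the root, with "/" as separator on every platform.
    found = sorted(os.path.relpath(m, tmp).replace(os.sep, "/") for m in matches)

print(found)
```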

Using regex-like patterns with pathlib

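If you'd rather write something closer to a regular expression, the pathlib library provides the Path.rglob method. It searches recursively and takes a glob pattern, not a true regular expression, but character classes written with [] cover some of the same ground. The pattern below matches the "txt" extension regardless of case.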

from pathlib import Path
list(Path("./test_directory/").rglob("*.[tT][xX][tT]"))
[WindowsPath('test_directory/myfile.txt'),
 WindowsPath('test_directory/textfile.txt'),
 WindowsPath('test_directory/subdir1/file_hidden_in_a_sub_direcotry.txt')]

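With character classes you can, for example, select multiple types of files. In the following example, we list all the files that end with either "txt" or "log". Keep in mind that each bracket matches independently, so this pattern would also accept extensions such as "tog" or "lxt".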

list(Path("./test_directory/").rglob("*.[tl][xo][tg]"))
[WindowsPath('test_directory/logfile.log'),
 WindowsPath('test_directory/myfile.txt'),
 WindowsPath('test_directory/textfile.txt'),
 WindowsPath('test_directory/subdir1/file_hidden_in_a_sub_direcotry.txt')]
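
If you need a true regular expression, one option is to let rglob("*") walk the tree and filter the names with the re module. The following is a minimal sketch, assuming a throwaway directory and made-up file names; the regular expression itself is just one way to express "txt or log, ignoring case".

```python
import re
import tempfile
from pathlib import Path

# Matches names ending in .txt or .log, in any letter case, and nothing else.
pattern = re.compile(r".*\.(txt|log)", re.IGNORECASE)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "subdir1").mkdir()
    # Demo files; "odd.tog" would slip through the glob pattern *.[tl][xo][tg].
    for rel in ("myfile.txt", "logfile.log", "super_music.mp3",
                "subdir1/notes.TXT", "odd.tog"):
        (root / rel).touch()

    regex_matches = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")  # every entry below root, recursively
        if p.is_file() and pattern.fullmatch(p.name)
    )

print(regex_matches)
```

Unlike the bracket pattern, the regular expression rejects "odd.tog" because the alternation only allows the two exact extensions.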