List all files in a directory
Listing all the files in a directory
Let's start with the basics. The most straightforward way to list all the files in a directory is to combine the listdir function from os with isfile from os.path. A list comprehension collects the results in a list.
mypath = "./test_directory/"
from os import listdir
from os.path import isfile, join
[f for f in listdir(mypath) if isfile(join(mypath, f))]
['logfile.log', 'myfile.txt', 'super_music.mp3', 'textfile.txt']
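The same listing can also be sketched with os.scandir, which yields entries that already know whether they are files. This is a minimal sketch, not a replacement for the approach above: the helper name list_files and the temporary directory contents are made up for illustration.

```python
import os
import tempfile


def list_files(directory):
    """Return the sorted names of the regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())


# Build a throwaway directory (hypothetical contents) so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("logfile.log", "myfile.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "subdir1"))  # sub-directories are skipped
    files = list_files(tmp)

print(files)  # ['logfile.log', 'myfile.txt']
```

Sorting is added here because neither listdir nor scandir guarantees any particular order.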
Listing all the files of a certain type in a directory
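Similarly, if you want to keep only a certain kind of file based on its extension, you can use the str.endswith method. The following example keeps all the "txt" files contained in the directory. Note that, unlike the previous example, it does not call isfile, so a directory whose name ends in ".txt" would be listed too.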
[f for f in listdir(mypath) if f.endswith('.' + "txt")]
['myfile.txt', 'textfile.txt']
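If the extension may be written in upper case, or if several extensions are wanted, one possible variation is to compare the lower-cased result of os.path.splitext against a set. The helper filter_by_extension and the name "NOTES.TXT" below are assumptions made for the sake of the example.

```python
import os


def filter_by_extension(names, extensions):
    """Keep the names whose extension (compared case-insensitively) is in `extensions`."""
    # Normalise "txt" and ".TXT" alike to ".txt".
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return [name for name in names if os.path.splitext(name)[1].lower() in wanted]


# Hypothetical file names, similar to the ones used in this article.
names = ["logfile.log", "myfile.txt", "super_music.mp3", "NOTES.TXT"]

txt_only = filter_by_extension(names, ["txt"])
txt_and_log = filter_by_extension(names, ["txt", ".log"])

print(txt_only)     # ['myfile.txt', 'NOTES.TXT']
print(txt_and_log)  # ['logfile.log', 'myfile.txt', 'NOTES.TXT']
```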
Listing all the files matching a pattern in a directory
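The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. You can use the *, ?, and character ranges expressed with [] wildcards. A relative pattern such as "*.txt" is resolved against the current working directory, not against mypath.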
import glob
glob.glob("*.txt")
['myfile.txt']
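To search a specific directory regardless of the current working directory, the directory can be joined to the pattern. The sketch below assumes a throwaway directory with made-up file names, and uses glob.escape so that any special character in the directory path is taken literally.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ("myfile.txt", "textfile.txt", "logfile.log"):
        open(os.path.join(tmp, name), "w").close()

    # Escape the directory part, then append the pattern.
    base = glob.escape(tmp)

    # "*" matches any run of characters within one path component.
    txt_names = sorted(
        os.path.basename(path) for path in glob.glob(os.path.join(base, "*.txt"))
    )

    # "[lm]" matches exactly one character taken from the set {l, m}.
    lm_names = sorted(
        os.path.basename(path) for path in glob.glob(os.path.join(base, "[lm]*"))
    )

print(txt_names)  # ['myfile.txt', 'textfile.txt']
print(lm_names)   # ['logfile.log', 'myfile.txt']
```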
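Listing files recursively
If you want to list all files recursively, you can select all the sub-directories using the "**" wildcard. It only has this meaning when recursive=True is passed, and it also matches the top-level directory itself.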
import glob
glob.glob(mypath + '/**/*.txt', recursive=True)
['./test_directory\\myfile.txt',
'./test_directory\\textfile.txt',
'./test_directory\\subdir1\\file_hidden_in_a_sub_direcotry.txt']
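For comparison, a recursive listing can also be written with os.walk, which visits every sub-directory in turn. This is only a sketch under assumptions: the helper find_files and the file names in the temporary tree are hypothetical.

```python
import os
import tempfile


def find_files(root, suffix):
    """Walk `root` recursively and return the paths, relative to `root`, ending with `suffix`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
    return sorted(found)


# Hypothetical directory tree with one nested text file.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir1"))
    for rel in ("myfile.txt", "logfile.log", os.path.join("subdir1", "nested.txt")):
        open(os.path.join(tmp, rel), "w").close()
    result = find_files(tmp, ".txt")

print(result)
```

The paths are returned relative to the root, with the separator of the platform the code runs on.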
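Using character classes with pathlib
If you'd rather work with path objects, the pathlib library provides the Path.rglob method, which searches recursively. Its patterns are glob patterns rather than true regular expressions, but character classes such as [tT] give a regex-like way to match, for example, an extension regardless of its case.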
from pathlib import Path
list(Path("./test_directory/").rglob("*.[tT][xX][tT]"))
[WindowsPath('test_directory/myfile.txt'),
WindowsPath('test_directory/textfile.txt'),
WindowsPath('test_directory/subdir1/file_hidden_in_a_sub_direcotry.txt')]
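Character classes also let you select multiple types of files. In the following example, we list all the files that end with either "txt" or "log". Note that the pattern is looser than it looks: it also matches any other combination of those characters, such as "tog" or "lxt".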
list(Path("./test_directory/").rglob("*.[tl][xo][tg]"))
[WindowsPath('test_directory/logfile.log'),
WindowsPath('test_directory/myfile.txt'),
WindowsPath('test_directory/textfile.txt'),
WindowsPath('test_directory/subdir1/file_hidden_in_a_sub_direcotry.txt')]
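When a true regular expression is needed, one option is to let rglob("*") enumerate everything and filter the names with the re module. The sketch below is one possible way to do it; the helper find_by_regex and the file names (including "notes.tog") are assumptions chosen to show the difference with the glob pattern above.

```python
import re
import tempfile
from pathlib import Path


def find_by_regex(root, pattern):
    """Return the files under `root` whose name matches the regular expression `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [p for p in Path(root).rglob("*") if p.is_file() and regex.search(p.name)]


# Hypothetical directory tree; "notes.tog" must NOT be selected.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "subdir1").mkdir()
    for rel in ("logfile.log", "myfile.txt", "notes.tog", "subdir1/deep.TXT"):
        (root / rel).touch()

    # Names ending in exactly ".txt" or ".log", in any letter case.
    matches = find_by_regex(root, r"\.(txt|log)$")
    names = sorted(p.relative_to(root).as_posix() for p in matches)

print(names)  # ['logfile.log', 'myfile.txt', 'subdir1/deep.TXT']
```

Unlike "*.[tl][xo][tg]", the alternation (txt|log) accepts only the two intended extensions.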