
Find All Urls In File

Okay, my problem is that my code only finds and prints the last URL in the file, not all of the URLs as I want. def convert(lst): return ' '.join(lst) with open('test.txt', '

Solution 1:

Your lines variable holds each line of the file. You want to do something like the following:

import re

def convert(lst):
    return ' '.join(lst)

with open("test.txt", 'r') as f:
    lines = f.read()      # read the whole file as one string
    test = convert(lines) # not actually needed for the regex below
    urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', lines)

print(urls)
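For reference, here is a minimal, self-contained sketch of what the printed result might look like. The file name and URLs below are made up purely for illustration and are not from the original question:

import re

# Hypothetical sample data, written to a throwaway file for the demo.
sample = "See https://example.com/page for details.\nAlso try http://example.org/docs today.\n"

with open("sample.txt", "w") as f:
    f.write(sample)

with open("sample.txt", "r") as f:
    urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', f.read())

print(urls)  # expected: ['https://example.com/page', 'http://example.org/docs']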

Solution 2:

There is no need to first read all the lines and then join them. You can read the entire file in one step with f.read().

Try this:

import re

with open("test.txt", 'r') as f:
    urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', f.read())

Now executing print(urls) will produce the desired output.
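If the original code looped over the lines and reassigned urls on every iteration, only the matches from the last line would survive, which would explain why only the last URL was printed. Below is a small sketch of a line-by-line variant that accumulates matches with extend instead of reassigning; it assumes the same test.txt and the same pattern as above:

import re

pattern = 'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

urls = []
with open("test.txt", "r") as f:
    for line in f:
        # extend (not assign) so matches from earlier lines are kept
        urls.extend(re.findall(pattern, line))

print(urls)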
