Writing a Python Script: Matching Audio and Images from the Same Time Period

2024-04-16

Requirements

There are two directories. Part of the contents of the image directory, image/source/:

-rw-r--r-- 1 root root  632719 May 16  2023 20230516091500_8ea8f4755ab211ed864a0050569539e0.jpg
-rw-r--r-- 1 root root  165811 May 16  2023 20230516091500_9535f376ca0611ed9d690050569539e0.jpg
-rw-r--r-- 1 root root  282415 May 16  2023 20230516091500_a08f20f9cdf111ed9d690050569539e0.jpg
...

Part of the contents of the audio directory, audio:

drwxr-xr-x 2 root root     4096 Apr  1 15:33 ffe4db567548a8852daa715ef23336b8
drwxr-xr-x 2 root root    12288 Mar 12 18:18 fff75084e08309e56bbb6034f00ca75d
drwxr-xr-x 2 root root     4096 Oct 31 15:33 fffd822ac90a7546cf5ab457c8a1ca64
...

Contents of one of its subdirectories, audio/fffd822ac90a7546cf5ab457c8a1ca64:

-rw-r--r-- 1 root root 176090 Oct 31 15:30 20231031152954_e25e2be5ba9011ed9d690050569539e0.wav
-rw-r--r-- 1 root root 176090 Oct 31 15:32 20231031153154_e25e2be5ba9011ed9d690050569539e0.wav
-rw-r--r-- 1 root root 176090 Oct 31 15:34 20231031153354_e25e2be5ba9011ed9d690050569539e0.wav
...

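The uuid in the file names under image/source/ is the camera id. It has the same meaning as the uuid in the names of the wav files inside the audio subdirectories, but it is unrelated to the names of the audio subdirectories themselves.

The task is to pair up, one to one, the audio files and images from the two directories that were recorded at nearly the same time.

For example, 20231031091500_a08f20f9cdf111ed9d690050569539e0.jpg corresponds to 20231031091454_a08f20f9cdf111ed9d690050569539e0.wav.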

Extracting the file lists

First, use the find command to list every file under all subdirectories of audio:

find audio/ -type f > audio.txt

Contents of audio.txt:

audio/140cb49659d4105009c8c8ca4d539cd9/20231020195154_547a1f69ca0611ed9d690050569539e0.wav
audio/140cb49659d4105009c8c8ca4d539cd9/20231020193554_547a1f69ca0611ed9d690050569539e0.wav
audio/140cb49659d4105009c8c8ca4d539cd9/20231020202754_547a1f69ca0611ed9d690050569539e0.wav

The image/source/ directory is flat, with no subdirectory levels, so the ls command is enough:

ls -1 image/source/ > img.txt

Contents of img.txt:

20230303153200_b999967758c411ed864a0050569539e0.jpg
20230305084000_3103656d5ab511ed864a0050569539e0.jpg
20230305084000_3a4bb49e5ab511ed864a0050569539e0.jpg
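
The same two lists could also be built from Python instead of the shell. The following is only a sketch of that alternative, not what the original workflow used; the helper names list_audio and list_images are made up for illustration.

```python
from pathlib import Path


def list_audio(root):
    """Return every file under root, recursively, as POSIX-style path strings."""
    root = Path(root)
    return sorted(p.as_posix() for p in root.rglob("*") if p.is_file())


def list_images(root):
    """Return the names of the files directly inside root (no recursion)."""
    root = Path(root)
    return sorted(p.name for p in root.iterdir() if p.is_file())
```

Calling list_audio("audio") yields paths in the same audio/<subdir>/<file> shape that find prints, and list_images("image/source") yields bare file names like ls -1. Unlike find, the results here are sorted.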

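Parsing the file names

Inspection shows that the file names have a fixed format: a 14-digit timestamp, an underscore, a 32-character uuid, and a file extension. Write a parsing function.

Slicing at fixed positions:

def split(s):
    # The uuid occupies indexes 15..46; index 47 is the dot before the extension
    return (s[0:14], s[15:47], s[-3:])

Splitting on the separator characters: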

def split(s):
    return (
        s[0 : s.find("_")], 
        s[s.find("_") + 1 : s.find(".")], 
        s[s.find(".") + 1 :]
    )

Run it:

print(split("20230303153200_b999967758c411ed864a0050569539e0.jpg"))

Output:

('20230303153200', 'b999967758c411ed864a0050569539e0', 'jpg')
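
Both versions of split assume that every line really follows the naming format. If the lists might contain stray files, a stricter parser can reject them instead of returning garbage. The sketch below is one way to do that with a regular expression; parse_name and NAME_RE are hypothetical names, and the pattern assumes lowercase hexadecimal uuids, as in the samples above.

```python
import re

# Assumed pattern: 14 digits, "_", 32 lowercase hex characters, ".", extension.
NAME_RE = re.compile(r"(\d{14})_([0-9a-f]{32})\.(\w+)")


def parse_name(name):
    """Return (timestamp, uuid, suffix), or None if the name does not match."""
    m = NAME_RE.fullmatch(name)
    return m.groups() if m else None


print(parse_name("20230303153200_b999967758c411ed864a0050569539e0.jpg"))
# ('20230303153200', 'b999967758c411ed864a0050569539e0', 'jpg')
print(parse_name("notes.txt"))
# None
```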

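Extracting the feature

The goal is to detect pairs such as 20231031091500_a08f20f9cdf111ed9d690050569539e0.jpg and 20231031091454_a08f20f9cdf111ed9d690050569539e0.wav. For such a pair, the audio timestamp plus one minute, truncated to the minute, equals the image timestamp truncated to the minute. So for each file, concatenate the minute-precision time and the uuid into one string. When the two strings are equal, the image and the audio file match.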
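
The rule can be checked on the example pair before processing the full lists. This is a minimal sketch; image_key, audio_key and FMT are illustrative names, not part of the script that follows.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M%S"


def image_key(timestamp, uuid):
    """Key for an image: timestamp truncated to the minute, plus the camera id."""
    return timestamp[:12] + uuid


def audio_key(timestamp, uuid):
    """Key for an audio clip: add one minute, truncate to the minute, append the id."""
    shifted = datetime.strptime(timestamp, FMT) + timedelta(minutes=1)
    return shifted.strftime("%Y%m%d%H%M") + uuid


cam = "a08f20f9cdf111ed9d690050569539e0"
print(audio_key("20231031091454", cam) == image_key("20231031091500", cam))  # True
# The one-minute shift also carries over hour and day boundaries:
print(audio_key("20231031235954", cam)[:12])  # 202311010000
```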

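Processing the image file list

The list file is small, under 60 MB, so there is no need to optimize memory use. Load it entirely into memory and process it there.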

def img():
    with open("匹配图片/img.txt", "r") as f:
        # Read the whole file and strip the newline characters
        data = f.read().splitlines()
    res = dict()
    for i in data:
        timestamp, uuid, _ = split(i)
        trait = timestamp[:-2] + uuid
        res[trait] = i
    return res

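Parsing the audio file list

This is similar to the image list. Two things need care: the directory part of each path has to be stripped off, and the timestamp has to be converted to a datetime object before adding one minute, so that the base-60 carry into hours and days is handled correctly.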

def audio():
    with open("匹配图片/audio.txt", "r") as f:
        # Read the whole file and strip the newline characters
        data = f.read().splitlines()
    res = dict()
    for i in data:
        timestamp, uuid, _ = split(i[i.rfind("/") + 1 :])

        # Add one minute to the timestamp
        date_time = datetime.strptime(timestamp, r"%Y%m%d%H%M%S") + timedelta(minutes=1)
        timestamp = date_time.strftime(r"%Y%m%d%H%M%S")

        trait = timestamp[:-2] + uuid
        res[trait] = i

    return res
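
One detail of audio() worth knowing: the assignment res[trait] = i keeps only the last path for a given key, so if two clips from the same camera fall into the same minute, the earlier one is silently dropped. Whether that happens in this data set is not known. The following sketch, using the hypothetical helpers group_audio_by_key and duplicated_keys, shows one way to check for it.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_audio_by_key(paths):
    """Map each minute+uuid key to every audio path that produces it."""
    groups = defaultdict(list)
    for path in paths:
        name = path[path.rfind("/") + 1 :]
        timestamp, rest = name.split("_", 1)
        uuid = rest.split(".", 1)[0]
        # Same shift as in audio(): add one minute, keep minute precision
        shifted = datetime.strptime(timestamp, "%Y%m%d%H%M%S") + timedelta(minutes=1)
        groups[shifted.strftime("%Y%m%d%H%M") + uuid].append(path)
    return groups


def duplicated_keys(groups):
    """Return only the keys shared by more than one file."""
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Passing the lines of audio.txt to group_audio_by_key and then calling duplicated_keys on the result lists any clips that would overwrite each other.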

Comparing the two file lists

    imgd = img()
    audiod = audio()

    res = list()
    for key, value in audiod.items():
        if key in imgd:
            res.append(("image/source/" + imgd[key], value))
    print(res)
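
The loop above collects only the matches. To see which files found no partner, the key sets of the two dictionaries can be subtracted from each other. This is a small sketch with made-up sample dictionaries; the function name unmatched is an illustration, not part of the original script.

```python
def unmatched(imgd, audiod):
    """Return (images without audio, audio without image) as sorted lists."""
    only_img = sorted(imgd[k] for k in imgd.keys() - audiod.keys())
    only_audio = sorted(audiod[k] for k in audiod.keys() - imgd.keys())
    return only_img, only_audio


# Tiny made-up dictionaries in the same key -> file shape as above.
imgd = {"k1": "a.jpg", "k2": "b.jpg"}
audiod = {"k2": "audio/x/b.wav", "k3": "audio/x/c.wav"}
print(unmatched(imgd, audiod))  # (['a.jpg'], ['audio/x/c.wav'])
```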

Copying the files

    for i in res:

        _, uuid, _ = split(i[0][i[0].rfind("/") + 1 :])
        target = Path(f"data/{uuid}")
        target.mkdir(parents=True, exist_ok=True)

        shutil.copy(i[0], target)
        shutil.copy(i[1], target)

Final code

import shutil
from datetime import datetime, timedelta
from pathlib import Path


def split(s):
    return (s[0 : s.find("_")], s[s.find("_") + 1 : s.find(".")], s[s.find(".") + 1 :])


def img():
    with open("img.txt", "r") as f:
        # Read the whole file and strip the newline characters
        data = f.read().splitlines()
    res = dict()
    for i in data:
        timestamp, uuid, _ = split(i)
        trait = timestamp[:-2] + uuid
        res[trait] = i
    return res


def audio():
    with open("audio.txt", "r") as f:
        # Read the whole file and strip the newline characters
        data = f.read().splitlines()
    res = dict()
    for i in data:
        timestamp, uuid, _ = split(i[i.rfind("/") + 1 :])

        # Add one minute to the timestamp
        date_time = datetime.strptime(timestamp, r"%Y%m%d%H%M%S") + timedelta(minutes=1)
        timestamp = date_time.strftime(r"%Y%m%d%H%M%S")

        trait = timestamp[:-2] + uuid
        res[trait] = i

    return res


if __name__ == "__main__":
    imgd = img()
    audiod = audio()

    res = list()
    for key, value in audiod.items():
        if key in imgd:
            res.append(("image/source/" + imgd[key], value))
    print(res)
    ...

    # Copy each matched pair into data/<uuid>/
    for i in res:

        _, uuid, _ = split(i[0][i[0].rfind("/") + 1 :])
        target = Path(f"data/{uuid}")
        target.mkdir(parents=True, exist_ok=True)

        shutil.copy(i[0], target)
        shutil.copy(i[1], target)