# yt-dlp/youtube_dl/postprocessor/metadatafromtitle.py

from __future__ import unicode_literals

import re

from .common import PostProcessor


class MetadataFromTitlePP(PostProcessor):
    def __init__(self, downloader, titleformat):
        super(MetadataFromTitlePP, self).__init__(downloader)
        self._titleformat = titleformat
        self._titleregex = self.format_to_regex(titleformat)

    def format_to_regex(self, fmt):
        r"""
        Converts a string like
           '%(title)s - %(artist)s'
        to a regex like
           '(?P<title>.+)\ \-\ (?P<artist>.+)'
        """
        lastpos = 0
        regex = ''
        # replace %(..)s with regex group and escape other string parts
        for match in re.finditer(r'%\((\w+)\)s', fmt):
            regex += re.escape(fmt[lastpos:match.start()])
            regex += r'(?P<' + match.group(1) + '>.+)'
            lastpos = match.end()
        if lastpos < len(fmt):
            regex += re.escape(fmt[lastpos:])
        return regex

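    def run(self, info):
        title = info['title']
        # re.match anchors at the start of the title only; the whole
        # title does not have to be consumed by the regex.
        match = re.match(self._titleregex, title)
        if match is None:
            # Non-fatal: report the mismatch and pass info through unchanged.
            self._downloader.to_screen('[fromtitle] Could not interpret title of video as "%s"' % self._titleformat)
            return [], info
        # Store each named group (e.g. 'artist') as a field of the info
        # dict, overwriting any existing value with the same name.
        for attribute, value in match.groupdict().items():
            info[attribute] = value
            self._downloader.to_screen('[fromtitle] parsed ' + attribute + ': ' + value)

        # The first element lists files that may be deleted; none here.
        return [], info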
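

# Illustrative sketch, not part of youtube-dl's API: a standalone,
# hypothetical helper (_sketch_parse_title is an invented name) that
# mirrors what format_to_regex() and run() do. It lets the title parsing
# be tried without a downloader object. It returns the parsed fields as a
# dict, or None where run() would print "Could not interpret title".
#
# Every %(field)s group becomes a greedy '.+'. When a separator occurs
# more than once, the earlier groups therefore absorb the extra
# occurrences:
#
#   _sketch_parse_title('%(artist)s - %(title)s', 'A - B - C')
#   -> {'artist': 'A - B', 'title': 'C'}
def _sketch_parse_title(fmt, title):
    # Local import keeps the sketch self-contained.
    import re

    # Same construction as format_to_regex(): a named '.+' group for each
    # %(name)s field and re.escape() for the literal text in between.
    regex = ''
    lastpos = 0
    for m in re.finditer(r'%\((\w+)\)s', fmt):
        regex += re.escape(fmt[lastpos:m.start()])
        regex += r'(?P<' + m.group(1) + '>.+)'
        lastpos = m.end()
    regex += re.escape(fmt[lastpos:])  # re.escape('') is just ''

    # Like run(), match only at the start of the title.
    match = re.match(regex, title)
    return match.groupdict() if match else None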