Merge 84b90ec378 into 46f4c80bc3

[ie/SampleFocus] Fix extractor (#10947)
Closes #10945
Authored by: seproDev
12 changed files with 138 additions and 28 deletions
--- a/.github/ISSUE_TEMPLATE/1_broken_site.yml
+++ b/.github/ISSUE_TEMPLATE/1_broken_site.yml
@@ -80,5 +80,8 @@ body:
  - type: markdown
    attributes:
      value: |
-        ### NOTE: Due to a recent increase in malicious spam activity, this issue will be automatically locked until it is triaged by a maintainer.
-        ### If you receive any replies asking you download a file, do NOT follow the download links!
+        > [!CAUTION]
+        > ### GitHub is experiencing a high volume of malicious spam comments.
+        > ### If you receive any replies asking you download a file, do NOT follow the download links!
+        >
+        > Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
--- a/.github/ISSUE_TEMPLATE/2_site_support_request.yml
+++ b/.github/ISSUE_TEMPLATE/2_site_support_request.yml
@@ -92,5 +92,8 @@ body:
  - type: markdown
    attributes:
      value: |
-        ### NOTE: Due to a recent increase in malicious spam activity, this issue will be automatically locked until it is triaged by a maintainer.
-        ### If you receive any replies asking you download a file, do NOT follow the download links!
+        > [!CAUTION]
+        > ### GitHub is experiencing a high volume of malicious spam comments.
+        > ### If you receive any replies asking you download a file, do NOT follow the download links!
+        >
+        > Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
--- a/.github/ISSUE_TEMPLATE/3_site_feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/3_site_feature_request.yml
@@ -88,5 +88,8 @@ body:
  - type: markdown
    attributes:
      value: |
-        ### NOTE: Due to a recent increase in malicious spam activity, this issue will be automatically locked until it is triaged by a maintainer.
-        ### If you receive any replies asking you download a file, do NOT follow the download links!
+        > [!CAUTION]
+        > ### GitHub is experiencing a high volume of malicious spam comments.
+        > ### If you receive any replies asking you download a file, do NOT follow the download links!
+        >
+        > Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
--- a/.github/ISSUE_TEMPLATE/4_bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/4_bug_report.yml
@@ -73,5 +73,8 @@ body:
  - type: markdown
    attributes:
      value: |
-        ### NOTE: Due to a recent increase in malicious spam activity, this issue will be automatically locked until it is triaged by a maintainer.
-        ### If you receive any replies asking you download a file, do NOT follow the download links!
+        > [!CAUTION]
+        > ### GitHub is experiencing a high volume of malicious spam comments.
+        > ### If you receive any replies asking you download a file, do NOT follow the download links!
+        >
+        > Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
--- a/.github/ISSUE_TEMPLATE/5_feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/5_feature_request.yml
@@ -67,5 +67,8 @@ body:
  - type: markdown
    attributes:
      value: |
-        ### NOTE: Due to a recent increase in malicious spam activity, this issue will be automatically locked until it is triaged by a maintainer.
-        ### If you receive any replies asking you download a file, do NOT follow the download links!
+        > [!CAUTION]
+        > ### GitHub is experiencing a high volume of malicious spam comments.
+        > ### If you receive any replies asking you download a file, do NOT follow the download links!
+        >
+        > Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
--- a/.github/ISSUE_TEMPLATE/6_question.yml
+++ b/.github/ISSUE_TEMPLATE/6_question.yml
@@ -73,5 +73,8 @@ body:
  - type: markdown
    attributes:
      value: |
-        ### NOTE: Due to a recent increase in malicious spam activity, this issue will be automatically locked until it is triaged by a maintainer.
-        ### If you receive any replies asking you download a file, do NOT follow the download links!
+        > [!CAUTION]
+        > ### GitHub is experiencing a high volume of malicious spam comments.
+        > ### If you receive any replies asking you download a file, do NOT follow the download links!
+        >
+        > Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
--- a/.github/workflows/issue-lockdown.yml
+++ b/.github/workflows/issue-lockdown.yml
@@ -1,4 +1,4 @@
-name: Anti-Spam
+name: Issue Lockdown
 on:
  issues:
    types: [opened]
@@ -9,6 +9,7 @@ permissions:
 jobs:
  lockdown:
    name: Issue Lockdown
+    if: vars.ISSUE_LOCKDOWN
    runs-on: ubuntu-latest
    steps:
      - name: "Lock new issue"
@@ -17,4 +18,4 @@ jobs:
          ISSUE_NUMBER: ${{ github.event.issue.number }}
          REPOSITORY: ${{ github.repository }}
        run: |
-          gh issue lock "${ISSUE_NUMBER}" -r too_heated -R "${REPOSITORY}"
+          gh issue lock "${ISSUE_NUMBER}" -R "${REPOSITORY}"
--- a/.github/workflows/sanitize-comment.yml
+++ b/.github/workflows/sanitize-comment.yml
@@ -0,0 +1,17 @@
+name: Sanitize comment
+
+on:
+  issue_comment:
+    types: [created, edited]
+
+permissions:
+  issues: write
+
+jobs:
+  sanitize-comment:
+    name: Sanitize comment
+    if: vars.SANITIZE_COMMENT && !github.event.issue.pull_request
+    runs-on: ubuntu-latest
+    steps:
+      - name: Sanitize comment
+        uses: yt-dlp/sanitize-comment@v1
--- a/devscripts/make_issue_template.py
+++ b/devscripts/make_issue_template.py
@@ -49,8 +49,11 @@ VERBOSE_TMPL = '''
  - type: markdown
    attributes:
      value: |
-        ### NOTE: Due to a recent increase in malicious spam activity, this issue will be automatically locked until it is triaged by a maintainer.
-        ### If you receive any replies asking you download a file, do NOT follow the download links!
+        > [!CAUTION]
+        > ### GitHub is experiencing a high volume of malicious spam comments.
+        > ### If you receive any replies asking you download a file, do NOT follow the download links!
+        >
+        > Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
 '''.strip()

 NO_SKIP = '''
--- a/yt_dlp/extractor/dailymotion.py
+++ b/yt_dlp/extractor/dailymotion.py
@@ -10,11 +10,14 @@ from ..utils import (
    OnDemandPagedList,
    age_restricted,
    clean_html,
+    extract_attributes,
    int_or_none,
    traverse_obj,
    try_get,
    unescapeHTML,
    unsmuggle_url,
+    update_url,
+    url_or_none,
    urlencode_postdata,
 )

@@ -96,14 +99,26 @@ class DailymotionBaseInfoExtractor(InfoExtractor):


 class DailymotionIE(DailymotionBaseInfoExtractor):
-    _VALID_URL = r'''(?ix)
+    _VALID_URL_PREFIX = r'''(?ix)
                    https?://
                        (?:
-                            (?:(?:www|touch|geo)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:(?:embed|swf|\#)/)|player(?:/\w+)?\.html\?)?video|swf)|
-                            (?:www\.)?lequipe\.fr/video
-                        )
-                        [/=](?P<id>[^/?_&]+)(?:.+?\bplaylist=(?P<playlist_id>x[0-9a-z]+))?
+                            (?:(?:www|touch|geo)\.)?dailymotion\.[a-z]{2,3}|
+                            (?:www\.)?lequipe\.fr
+                        )/
                    '''
+    _VALID_URL = [
+        rf'''{_VALID_URL_PREFIX}
+                        (?:
+                            video/|
+                            swf(?:/(?!video)|/video/)
+                        )(?P<id>[^/?_&#]+)(?:.+?\bplaylist=(?P<playlist_id>x[0-9a-z]+))?
+                    ''',
+        rf'''{_VALID_URL_PREFIX}
+                        (?:
+                            player(?:/\w+)?\.html\?
+                        )(?:video[=/](?P<id>[^/?_&#]+))?(?:.*?\bplaylist=(?P<playlist_id>x[0-9a-z]+))?
+                    ''',
+    ]
    IE_NAME = 'dailymotion'
    _EMBED_REGEX = [r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1']
    _TESTS = [{
@@ -217,6 +232,35 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
    }, {
        'url': 'https://geo.dailymotion.com/player/xakln.html?video=x8mjju4&customConfig%5BcustomParams%5D=%2Ffr-fr%2Ftennis%2Fwimbledon-mens-singles%2Farticles-video',
        'only_matching': True,
+    }, {
+        'url': 'https://geo.dailymotion.com/player/xf7zn.html?playlist=x7wdsj',
+        'only_matching': True,
+    }]
+    _WEBPAGE_TESTS = [{
+        # https://geo.dailymotion.com/player/xmyye.html?video=x93blhi
+        'url': 'https://www.financialounge.com/video/2024/08/01/borse-europee-in-rosso-dopo-la-fed-a-milano-volano-mediobanca-e-tim-edizione-del-1-agosto/',
+        'info_dict': {
+            'id': 'x93blhi',
+            'ext': 'mp4',
+            'title': 'OnAir - 01/08/24',
+            'description': '',
+            'duration': 217,
+            'timestamp': 1722505658,
+            'upload_date': '20240801',
+            'uploader': 'Financialounge',
+            'uploader_id': 'x2vtgmm',
+            'age_limit': 0,
+            'tags': [],
+            'view_count': int,
+            'like_count': int,
+        },
+    }, {
+        # https://geo.dailymotion.com/player/xf7zn.html?playlist=x7wdsj
+        'url': 'https://www.cycleworld.com/blogs/ask-kevin/ducati-continues-to-evolve-with-v4/',
+        'info_dict': {
+            'id': 'x7wdsj',
+        },
+        'playlist_mincount': 50,
    }]
    _GEO_BYPASS = False
    _COMMON_MEDIA_FIELDS = '''description
@@ -232,6 +276,22 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
        for mobj in re.finditer(
                r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);', webpage):
            yield 'https://www.dailymotion.com/embed/video/' + mobj.group('id')
+        for mobj in re.finditer(
+                r'(?s)<script [^>]*\bsrc=(["\'])(?:https?:)?//[\w-]+\.dailymotion\.com/player/(?:(?!\1).)+\1[^>]*>', webpage):
+            attrs = extract_attributes(mobj.group(0))
+            player_url = url_or_none(attrs.get('src'))
+            if not player_url:
+                continue
+            player_url = player_url.replace('.js', '.html')
+            if player_url.startswith('//'):
+                player_url = f'https:{player_url}'
+            if video_id := attrs.get('data-video'):
+                query_string = f'video={video_id}'
+            elif playlist_id := attrs.get('data-playlist'):
+                query_string = f'playlist={playlist_id}'
+            else:
+                continue
+            yield update_url(player_url, query=query_string)

    def _real_extract(self, url):
        url, smuggled_data = unsmuggle_url(url)
@@ -282,6 +342,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
        title = metadata['title']
        is_live = media.get('isOnAir')
        formats = []
+        subtitles = {}
+
        for quality, media_list in metadata['qualities'].items():
            for m in media_list:
                media_url = m.get('url')
@@ -289,8 +351,10 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                if not media_url or media_type == 'application/vnd.lumberjack.manifest':
                    continue
                if media_type == 'application/x-mpegURL':
-                    formats.extend(self._extract_m3u8_formats(
-                        media_url, video_id, 'mp4', live=is_live, m3u8_id='hls', fatal=False))
+                    fmt, subs = self._extract_m3u8_formats_and_subtitles(
+                        media_url, video_id, 'mp4', live=is_live, m3u8_id='hls', fatal=False)
+                    formats.extend(fmt)
+                    subtitles.update(subs)
                else:
                    f = {
                        'url': media_url,
@@ -310,7 +374,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
            if not f.get('fps') and f['format_id'].endswith('@60'):
                f['fps'] = 60

-        subtitles = {}
        subtitles_data = try_get(metadata, lambda x: x['subtitles']['data'], dict) or {}
        for subtitle_lang, subtitle in subtitles_data.items():
            subtitles[subtitle_lang] = [{
--- a/yt_dlp/extractor/khanacademy.py
+++ b/yt_dlp/extractor/khanacademy.py
@@ -15,7 +15,7 @@ from ..utils import (
 class KhanAcademyBaseIE(InfoExtractor):
    _VALID_URL_TEMPL = r'https?://(?:www\.)?khanacademy\.org/(?P<id>(?:[^/]+/){%s}%s[^?#/&]+)'

-    _PUBLISHED_CONTENT_VERSION = '171419ab20465d931b356f22d20527f13969bb70'
+    _PUBLISHED_CONTENT_VERSION = 'dc34750f0572c80f5effe7134082fe351143c1e4'

    def _parse_video(self, video):
        return {
@@ -39,7 +39,7 @@ class KhanAcademyBaseIE(InfoExtractor):
            query={
                'fastly_cacheable': 'persist_until_publish',
                'pcv': self._PUBLISHED_CONTENT_VERSION,
-                'hash': '1242644265',
+                'hash': '3712657851',
                'variables': json.dumps({
                    'path': display_id,
                    'countryCode': 'US',
--- a/yt_dlp/extractor/samplefocus.py
+++ b/yt_dlp/extractor/samplefocus.py
@@ -36,7 +36,7 @@ class SampleFocusIE(InfoExtractor):

    def _real_extract(self, url):
        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
+        webpage = self._download_webpage(url, display_id, impersonate=True)

        sample_id = self._search_regex(
            r'<input[^>]+id=(["\'])sample_id\1[^>]+value=(?:["\'])(?P<id>\d+)',
@@ -82,7 +82,15 @@ class SampleFocusIE(InfoExtractor):
        return {
            'id': sample_id,
            'title': title,
+            'formats': [{
                'url': mp3_url,
+                'ext': 'mp3',
+                'vcodec': 'none',
+                'acodec': 'mp3',
+                'http_headers': {
+                    'Referer': url,
+                },
+            }],
            'display_id': display_id,
            'thumbnail': thumbnail,
            'uploader': uploader,
Author	SHA1	Message	Date
Mozi	322d0c9d80	Merge `84b90ec378` into `46f4c80bc3`	2024-09-07 17:53:50 +02:00
sepro	46f4c80bc3	[ie/SampleFocus] Fix extractor (#10947) Closes #10945 Authored by: seproDev	2024-09-07 17:06:12 +02:00
sepro	0fba08485b	[ie/khanacademy] Fix extractor (#10913) Closes #10912 Authored by: seproDev	2024-09-05 20:47:14 +02:00
Simon Sawicki	b6200bdcf3	[ci] Add comment sanitization workflow (#10915) Co-authored-by: bashonly <bashonly@protonmail.com> Authored by: bashonly, Grub4K	2024-09-05 20:06:15 +02:00
Mozi	84b90ec378	use multiple regexes to match urls I can't live without regex101	2024-08-31 05:43:15 +00:00
Mozi	c61db5e643	matching "/video/x3z49k?playlist=xv4bw" as a video instead of a playlist	2024-08-30 18:05:29 +00:00
Mozi	856239baf3	add comments about those extracted urls	2024-08-30 17:31:55 +00:00
Mozi	6328c5aa1f	we have vtt subtitles?! did I miss something?	2024-08-30 17:24:08 +00:00
Mozi	cddd001021	support embedded playlist; write sane regex; rework _VALID_URL Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com> Co-authored-by: sepro <sepro@sepr0.com>	2024-08-30 17:24:08 +00:00
Mozi	c4172f6de6	[ie/dailymotion/embed] Support embedded videos with "data-video" https://developers.dailymotion.com/player/#player-embed-script-video-embed	2024-08-25 08:11:36 +00:00
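
Reviewer note: the embed-script handling added to `_extract_embed_urls` in `yt_dlp/extractor/dailymotion.py` can be tried outside yt-dlp with a rough standalone sketch like the one below. It is not the code from this change. It assumes only the Python standard library, swaps `html.parser` and `urllib.parse` in for yt-dlp's `extract_attributes`, `url_or_none` and `update_url`, and the names `_ScriptAttrs` and `embed_urls` are hypothetical. The sample URLs are the ones quoted in the `_WEBPAGE_TESTS` comments.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class _ScriptAttrs(HTMLParser):
    """Collect the attributes of every <script> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self.scripts.append(dict(attrs))


def embed_urls(webpage):
    """Yield player URLs for Dailymotion embed scripts found in an HTML page."""
    parser = _ScriptAttrs()
    parser.feed(webpage)
    for attrs in parser.scripts:
        src = attrs.get('src') or ''
        # Protocol-relative URLs are assumed to be reachable over HTTPS.
        if src.startswith('//'):
            src = f'https:{src}'
        parts = urlsplit(src)
        # Only accept player scripts served from a dailymotion.com subdomain.
        if not re.fullmatch(r'[\w-]+\.dailymotion\.com', parts.netloc):
            continue
        if not parts.path.startswith('/player/'):
            continue
        # A script embeds either a single video or a playlist.
        if video_id := attrs.get('data-video'):
            query = f'video={video_id}'
        elif playlist_id := attrs.get('data-playlist'):
            query = f'playlist={playlist_id}'
        else:
            continue
        # Point at the player's .html page instead of the .js loader.
        path = parts.path.replace('.js', '.html')
        yield urlunsplit((parts.scheme, parts.netloc, path, query, ''))


if __name__ == '__main__':
    page = (
        '<script src="https://geo.dailymotion.com/player/xmyye.js" data-video="x93blhi"></script>'
        '<script src="//geo.dailymotion.com/player/xf7zn.js" data-playlist="x7wdsj"></script>'
    )
    for player_url in embed_urls(page):
        print(player_url)
```

Scripts that carry neither `data-video` nor `data-playlist` are skipped, mirroring the `continue` branch in the diff. The resulting `player/….html?video=` and `player/….html?playlist=` URLs are the shape the second `_VALID_URL` pattern is written to match.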