🩹 Change regex pattern to use [\\s\\S]

📝 Code commentary updated to reflect change to `[\\s\\S]`.
 Tested on both default and Beehiiv feeds.
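
The behaviour this change relies on can be sketched in isolation. The snippet below is an illustration only, not the code from this commit: `stripTag`, its `body` parameter, and the sample `html` string are hypothetical names made up for the comparison. It assumes Foundation's `replacingOccurrences(of:with:options:)` with `.regularExpression`, where `.` does not match line separators by default.

```swift
import Foundation

// Hypothetical sample input: a <style> block spanning several lines,
// similar in shape to what a feed might embed.
let html = "<p>Hello</p><style>\nbody { color: red; }\n</style><p>World</p>"

/// Illustrative helper (not the commit's private method).
/// `body` is the regex fragment used to match the tag's contents.
func stripTag(_ tag: String, from input: String, body: String) -> String {
    input.replacingOccurrences(
        of: "<\(tag)>\(body)*?</\(tag)>",
        with: "",
        options: [.regularExpression, .caseInsensitive]
    )
}

// `.` stops at the first new line, so the multi-line block never matches.
let withDot = stripTag("style", from: html, body: ".")

// `[\s\S]` (whitespace or non-whitespace) matches every character,
// including new lines, so the whole block is removed.
let withClass = stripTag("style", from: html, body: "[\\s\\S]")

print(withDot == html)  // true: nothing was removed
print(withClass)        // <p>Hello</p><p>World</p>
```

Because the character class covers new lines on its own, the pattern needs no `.dotMatchesLineSeparators` option, which is presumably why a single `replacingOccurrences` call is enough here.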
Stuart Breckenridge 2025-01-01 11:48:27 +08:00
parent d2364ff660
commit 9b508e068b


@ -189,37 +189,28 @@ public extension String {
/// Removes an HTML tag and everything between its start and end tags. /// Removes an HTML tag and everything between its start and end tags.
/// ///
/// The regex pattern `<tag>.*?</tag>` explanation: /// The regex pattern `<tag>[\\s\\S]*?</tag>` explanation:
/// - `<` matches the literal `<` character. /// - `<` matches the literal `<` character.
/// - `tag` matches the literal parameter provided to the function, e.g., `style`. /// - `tag` matches the literal parameter provided to the function, e.g., `style`.
/// - `>` matches the literal `>` character. /// - `>` matches the literal `>` character.
/// - `.*?` /// - `[\\s\\S]*?`
/// - `.` matches _any_ character **except** a new line /// - `[\\s\\S]` matches _any_ character, including new lines.
/// - `*` will match zero or more of the preceding character, in this case _any_ /// - `*` will match zero or more of the preceding character, in this case _any_
/// character /// character.
/// - `?` switches the matching mode to [lazy](https://javascript.info/regexp-greedy-and-lazy) /// - `?` switches the matching mode to [lazy](https://javascript.info/regexp-greedy-and-lazy)
/// so it will match as few characters as possible before satisfying the rest of the pattern. /// so it will match as few characters as possible before satisfying the rest of the pattern.
/// - `<` matches the literal `<` character. /// - `<` matches the literal `<` character.
/// - `/` matches the literal `/` character. /// - `/` matches the literal `/` character.
/// - `tag` matches the literal parameter provided to the function, e.g., `style` /// - `tag` matches the literal parameter provided to the function, e.g., `style`.
/// - `>` matches the literal `>` character. /// - `>` matches the literal `>` character.
/// ///
///
/// - Parameter tag: The tag to remove. /// - Parameter tag: The tag to remove.
/// ///
/// - Returns: A new copy of `self` with the tag removed. /// - Returns: A new copy of `self` with the tag removed.
/// ///
/// - Note: Doesn't work correctly with nested tags of the same name. /// - Note: Doesn't work correctly with nested tags of the same name.
private func removingTagAndContents(_ tag: String) -> String { private func removingTagAndContents(_ tag: String) -> String {
let pattern = "<\(tag)>.*?<\\/\(tag)>" return self.replacingOccurrences(of: "<\(tag)>[\\s\\S]*?</\(tag)>", with: "", options: [.regularExpression, .caseInsensitive])
if let regex = try? NSRegularExpression(pattern: pattern, options: [.dotMatchesLineSeparators, .caseInsensitive]) {
let range = NSRange(location: 0, length: self.utf16.count)
let modifiedString = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: "")
return modifiedString
} else {
// If the above regex fails, fall back to the original method.
return self.replacingOccurrences(of: "<\(tag).+?</\(tag)>", with: "", options: [.regularExpression, .caseInsensitive])
}
} }
/// Strips HTML from a string. /// Strips HTML from a string.