Completely reworked process argument tokenization: both single and double quotes are now allowed and some other limitations are removed
parent
60bacf395a
commit
edf696bae2
@ -19,8 +19,24 @@ However, if you choose `Script` option, then you cannot provide URL of your feed
|
|||||||
|
|
||||||
Any errors in your script must be written to [**error output** (stderr)](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_(stderr)).
|
Any errors in your script must be written to [**error output** (stderr)](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_(stderr)).
|
||||||
|
|
||||||
```{warning}
|
:::{warning}
|
||||||
As of RSS Guard 4.2.0, you cannot separate your arguments with `#`. If your argument contains spaces, then enclose it with DOUBLE quotes, for example `"my argument"`. DO NOT use SINGLE quotes to do that.
|
If the path to your executable contains backslashes as directory separators, make sure to escape each of them with another backslash. Quote each individual argument with double quotes `"arg"` or single quotes `'arg'` and separate all arguments with spaces. Inside a double-quoted argument you have to escape some characters, for example the double quote itself: `"arg with \"quoted\" part"`.
|
||||||
|
|
||||||
|
Examples (one per line):
|
||||||
|
|
||||||
|
```
|
||||||
|
C:\\MyFolder\\My.exe "arg1" "arg2" "my \"quoted\" arg3" 'my "quoted" arg4'
|
||||||
|
|
||||||
|
bash "%data%/scripts/download-feed.sh"
|
||||||
|
|
||||||
|
%data%\jq.exe '{ version: "1.1", title: "Stars", items: map( . | .title=.full_name | .content_text=.description | .date_published=.pushed_at)}'
|
||||||
|
```
|
||||||
|
:::
|
||||||
|
|
||||||
|
RSS Guard offers the [placeholder](userdata.md#data-placeholder) `%data%`, which is automatically replaced with the full path to the RSS Guard user data folder. You can use this placeholder anywhere in your script call line.
|
||||||
|
|
||||||
|
```{attention}
|
||||||
|
Working directory of process executing the script is set to point to RSS Guard [user data](userdata) folder.
|
||||||
```
|
```
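To make the `%data%` placeholder described above more concrete, here is a minimal sketch of such a substitution in plain C++. The function name `expand_data_placeholder` and the example folder path are assumptions made for illustration only; this is not RSS Guard's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every "%data%" occurrence in an execution
// line with the given user data folder path.
std::string expand_data_placeholder(std::string line, const std::string& user_data_dir) {
  const std::string placeholder = "%data%";
  std::size_t pos = 0;

  while ((pos = line.find(placeholder, pos)) != std::string::npos) {
    line.replace(pos, placeholder.size(), user_data_dir);

    // Continue searching after the inserted path so that the
    // replacement text itself is never scanned again.
    pos += user_data_dir.size();
  }

  return line;
}
```

With an assumed user data folder of `/home/user/rssguard`, the line `bash "%data%/scripts/download-feed.sh"` would become `bash "/home/user/rssguard/scripts/download-feed.sh"`.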
|
||||||
|
|
||||||
Format of post-process script execution line can be seen on picture below.
|
Format of post-process script execution line can be seen on picture below.
|
||||||
@ -31,25 +47,7 @@ If everything goes well, script must return `0` as the process exit code, or a n
|
|||||||
|
|
||||||
The executable file must always be specified, while arguments are optional. Be very careful when quoting arguments. Tested examples of valid execution lines are:
|
The executable file must always be specified, while arguments are optional. Be very careful when quoting arguments. Tested examples of valid execution lines are:
|
||||||
|
|
||||||
| Command | Explanation |
|
## Dataflow
|
||||||
| :--- | --- |
|
|
||||||
| `bash -c "curl 'https://github.com/martinrotter.atom'"` | Download ATOM feed file using Bash and Curl. |
|
|
||||||
| `Powershell Invoke-WebRequest "https://github.com/martinrotter.atom" \| Select-Object -ExpandProperty Content` | Download ATOM feed file with Powershell. |
|
|
||||||
| `php tweeper.php -v 0 "https://twitter.com/NSACareers"` | Scrape Twitter RSS feed file with [Tweeper](https://git.ao2.it/tweeper.git). Tweeper is the utility that produces RSS feed from Twitter and other similar social platforms. |
|
|
||||||
|
|
||||||
```{note}
|
|
||||||
The above examples are cross-platform. You can use exactly the same command on Windows, Linux or macOS, if your operating system is properly configured.
|
|
||||||
```
|
|
||||||
|
|
||||||
RSS Guard offers [placeholder](userdata.md#data-placeholder) `%data%` which is automatically replaced with full path to RSS Guard user data folder, allowing you to make your configuration fully portable. You can, therefore, use something like this as a source script line: `bash %data%/scripts/download-feed.sh`.
|
|
||||||
|
|
||||||
```{attention}
|
|
||||||
Working directory of process executing the script is set to point to RSS Guard [user data](userdata) folder.
|
|
||||||
```
|
|
||||||
|
|
||||||
There are [examples of website scrapers](https://github.com/martinrotter/rssguard/tree/master/resources/scripts/scrapers). Most of them are written in Python 3, so their execution line is similar to `python script.py`. Make sure to examine each script for more information on how to use it.
|
|
||||||
|
|
||||||
----
|
|
||||||
After your source feed data is downloaded either via URL or custom script, you can optionally post-process it with one more custom script, which will take **raw source data as input**. It must produce valid feed data to standard output while printing all error messages to error output.
|
After your source feed data is downloaded either via URL or custom script, you can optionally post-process it with one more custom script, which will take **raw source data as input**. It must produce valid feed data to standard output while printing all error messages to error output.
|
||||||
|
|
||||||
Here is a little flowchart explaining where and when scripts are used:
|
Here is a little flowchart explaining where and when scripts are used:
|
||||||
@ -76,6 +74,10 @@ Typical post-processing filter might do things like CSS formatting, localization
|
|||||||
|
|
||||||
It's completely up to you if you decide to only use script as `Source` of the script or separate your custom functionality between `Source` script and `Post-process` script. Sometimes you might need different `Source` scripts for different online sources and the same `Post-process` script and vice versa.
|
It's completely up to you if you decide to only use script as `Source` of the script or separate your custom functionality between `Source` script and `Post-process` script. Sometimes you might need different `Source` scripts for different online sources and the same `Post-process` script and vice versa.
|
||||||
|
|
||||||
|
## Example Scrapers
|
||||||
|
There are [examples of website scrapers](https://github.com/martinrotter/rssguard/tree/master/resources/scripts/scrapers). Most of them are written in Python 3, so their execution line is similar to `python "script.py"`. Make sure to examine each script for more information on how to use it.
|
||||||
|
|
||||||
|
## 3rd-party Tools
|
||||||
Third-party tools for scraping made to work with RSS Guard:
|
Third-party tools for scraping made to work with RSS Guard:
|
||||||
* [CSS2RSS](https://github.com/Owyn/CSS2RSS) - can be used to scrape websites with CSS selectors.
|
* [CSS2RSS](https://github.com/Owyn/CSS2RSS) - can be used to scrape websites with CSS selectors.
|
||||||
* [RSSGuardHelper](https://github.com/pipiscrew/RSSGuardHelper) - another CSS selectors helper.
|
* [RSSGuardHelper](https://github.com/pipiscrew/RSSGuardHelper) - another CSS selectors helper.
|
||||||
|
@ -175,45 +175,166 @@ QString TextFactory::capitalizeFirstLetter(const QString& sts) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
QStringList TextFactory::tokenizeProcessArguments(QStringView command) {
|
enum class TokenState {
|
||||||
|
// We are not inside argument, we are between arguments.
|
||||||
|
Normal,
|
||||||
|
|
||||||
|
// We have detected escape "\" character coming from double-quoted argument.
|
||||||
|
EscapedFromDoubleQuotes,
|
||||||
|
|
||||||
|
// We have detected escape "\" character coming from spaced argument.
|
||||||
|
EscapedFromSpaced,
|
||||||
|
|
||||||
|
// We are inside argument which was separated by spaces.
|
||||||
|
InsideArgSpaced,
|
||||||
|
|
||||||
|
// We are inside argument.
|
||||||
|
InsideArgDoubleQuotes,
|
||||||
|
|
||||||
|
// We are inside argument, do not evaluate anything, just take it all
|
||||||
|
// as raw text.
|
||||||
|
InsideArgSingleQuotes
|
||||||
|
};
|
||||||
|
|
||||||
|
QStringList TextFactory::tokenizeProcessArguments(const QString& command) {
|
||||||
|
// Each argument containing spaces must be enclosed with single '' or double "" quotes.
|
||||||
|
// Some characters must be escaped with \ to keep their textual values as
|
||||||
|
// long as double-quoted argument is used.
|
||||||
|
|
||||||
|
if (command.isEmpty()) {
|
||||||
|
return {};
|
||||||
|
}
|
||||||
|
|
||||||
|
// We append space to end of command to make sure that
|
||||||
|
// ending space-separated argument is processed.
|
||||||
|
QString my_command = command + u' ';
|
||||||
|
|
||||||
|
TokenState state = TokenState::Normal;
|
||||||
QStringList args;
|
QStringList args;
|
||||||
QString tmp;
|
QString arg;
|
||||||
int quote_count = 0;
|
|
||||||
bool in_quote = false;
|
|
||||||
|
|
||||||
for (int i = 0; i < command.size(); ++i) {
|
for (QChar chr : my_command) {
|
||||||
if (command.at(i) == QL1C('"')) {
|
switch (state) {
|
||||||
++quote_count;
|
case TokenState::Normal: {
|
||||||
|
switch (chr.unicode()) {
|
||||||
|
case u'"':
|
||||||
|
// We start double-quoted argument.
|
||||||
|
state = TokenState::InsideArgDoubleQuotes;
|
||||||
|
continue;
|
||||||
|
|
||||||
if (quote_count == 3) {
|
case u'\'':
|
||||||
quote_count = 0;
|
// We start single-quoted argument.
|
||||||
tmp += command.at(i);
|
state = TokenState::InsideArgSingleQuotes;
|
||||||
|
continue;
|
||||||
|
|
||||||
|
case u' ':
|
||||||
|
// Whitespace, just go on.
|
||||||
|
continue;
|
||||||
|
|
||||||
|
default:
|
||||||
|
// We found some actual text which marks
|
||||||
|
// beginning of argument, we assume spaced argument.
|
||||||
|
arg.append(chr);
|
||||||
|
state = TokenState::InsideArgSpaced;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
continue;
|
case TokenState::EscapedFromDoubleQuotes: {
|
||||||
}
|
// Previous character was "\".
|
||||||
|
arg.append(chr);
|
||||||
if (quote_count) {
|
state = TokenState::InsideArgDoubleQuotes;
|
||||||
if (quote_count == 1) {
|
break;
|
||||||
in_quote = !in_quote;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
quote_count = 0;
|
case TokenState::EscapedFromSpaced: {
|
||||||
}
|
// Previous character was "\".
|
||||||
|
arg.append(chr);
|
||||||
if (!in_quote && command.at(i).isSpace()) {
|
state = TokenState::InsideArgSpaced;
|
||||||
if (!tmp.isEmpty()) {
|
break;
|
||||||
args += tmp;
|
}
|
||||||
tmp.clear();
|
|
||||||
|
case TokenState::InsideArgSpaced: {
|
||||||
|
switch (chr.unicode()) {
|
||||||
|
case u'\\':
|
||||||
|
// We found escaped!
|
||||||
|
state = TokenState::EscapedFromSpaced;
|
||||||
|
continue;
|
||||||
|
|
||||||
|
case u' ':
|
||||||
|
// We need to end this argument.
|
||||||
|
args.append(arg);
|
||||||
|
arg.clear();
|
||||||
|
state = TokenState::Normal;
|
||||||
|
continue;
|
||||||
|
|
||||||
|
default:
|
||||||
|
arg.append(chr);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
case TokenState::InsideArgDoubleQuotes: {
|
||||||
|
switch (chr.unicode()) {
|
||||||
|
case u'\\':
|
||||||
|
// We found escaped!
|
||||||
|
state = TokenState::EscapedFromDoubleQuotes;
|
||||||
|
continue;
|
||||||
|
|
||||||
|
case u'"':
|
||||||
|
// We need to end this argument.
|
||||||
|
args.append(arg);
|
||||||
|
arg.clear();
|
||||||
|
state = TokenState::Normal;
|
||||||
|
continue;
|
||||||
|
|
||||||
|
default:
|
||||||
|
arg.append(chr);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
case TokenState::InsideArgSingleQuotes: {
|
||||||
|
switch (chr.unicode()) {
|
||||||
|
case u'\'':
|
||||||
|
// We need to end this argument.
|
||||||
|
args.append(arg);
|
||||||
|
arg.clear();
|
||||||
|
state = TokenState::Normal;
|
||||||
|
continue;
|
||||||
|
|
||||||
|
default:
|
||||||
|
arg.append(chr);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
break;
|
||||||
}
|
}
|
||||||
}
|
|
||||||
else {
|
|
||||||
tmp += command.at(i);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if (!tmp.isEmpty()) {
|
switch (state) {
|
||||||
args += tmp;
|
case TokenState::EscapedFromSpaced:
|
||||||
|
case TokenState::EscapedFromDoubleQuotes:
|
||||||
|
throw ApplicationException(QObject::tr("escape sequence not completed"));
|
||||||
|
break;
|
||||||
|
|
||||||
|
case TokenState::InsideArgDoubleQuotes:
|
||||||
|
throw ApplicationException(QObject::tr("closing \" is missing"));
|
||||||
|
break;
|
||||||
|
|
||||||
|
case TokenState::InsideArgSingleQuotes:
|
||||||
|
throw ApplicationException(QObject::tr("closing ' is missing"));
|
||||||
|
break;
|
||||||
|
|
||||||
|
default:
|
||||||
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
return args;
|
return args;
|
||||||
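For readers who want to experiment with the new tokenization rules outside of Qt, the state machine from this hunk can be approximated with the standard library alone. This is a sketch under assumptions: `tokenize` is a hypothetical name, `std::string` and `std::runtime_error` stand in for `QString` and `ApplicationException`, and the port is meant to mirror the logic above rather than replace it.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenState {
  Normal,                   // Between arguments.
  EscapedFromDoubleQuotes,  // Saw "\" inside a double-quoted argument.
  EscapedFromSpaced,        // Saw "\" inside a space-separated argument.
  InsideArgSpaced,
  InsideArgDoubleQuotes,
  InsideArgSingleQuotes     // Everything is taken as raw text.
};

// Hypothetical standalone port of the tokenizer shown in the diff.
std::vector<std::string> tokenize(const std::string& command) {
  std::vector<std::string> args;
  if (command.empty()) return args;

  // A trailing space makes sure the last space-separated argument is flushed.
  const std::string input = command + ' ';
  TokenState state = TokenState::Normal;
  std::string arg;

  // Helper that stores the finished argument and returns to Normal state.
  auto finish = [&]() { args.push_back(arg); arg.clear(); state = TokenState::Normal; };

  for (char chr : input) {
    switch (state) {
      case TokenState::Normal:
        if (chr == '"') state = TokenState::InsideArgDoubleQuotes;
        else if (chr == '\'') state = TokenState::InsideArgSingleQuotes;
        else if (chr != ' ') { arg += chr; state = TokenState::InsideArgSpaced; }
        break;
      case TokenState::EscapedFromDoubleQuotes:
        arg += chr; state = TokenState::InsideArgDoubleQuotes;
        break;
      case TokenState::EscapedFromSpaced:
        arg += chr; state = TokenState::InsideArgSpaced;
        break;
      case TokenState::InsideArgSpaced:
        if (chr == '\\') state = TokenState::EscapedFromSpaced;
        else if (chr == ' ') finish();
        else arg += chr;
        break;
      case TokenState::InsideArgDoubleQuotes:
        if (chr == '\\') state = TokenState::EscapedFromDoubleQuotes;
        else if (chr == '"') finish();
        else arg += chr;
        break;
      case TokenState::InsideArgSingleQuotes:
        if (chr == '\'') finish();
        else arg += chr;
        break;
    }
  }

  // Any state other than Normal or InsideArgSpaced means malformed input.
  switch (state) {
    case TokenState::EscapedFromSpaced:
    case TokenState::EscapedFromDoubleQuotes:
      throw std::runtime_error("escape sequence not completed");
    case TokenState::InsideArgDoubleQuotes:
      throw std::runtime_error("closing \" is missing");
    case TokenState::InsideArgSingleQuotes:
      throw std::runtime_error("closing ' is missing");
    default:
      break;
  }

  return args;
}
```

Fed with the first documented example line, this sketch yields five arguments: the executable path with single backslashes, `arg1`, `arg2`, `my "quoted" arg3` and `my "quoted" arg4`. An unterminated quote results in an exception, which is what the dialog code later in this commit turns into an error status.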
|
@ -36,7 +36,7 @@ class TextFactory {
|
|||||||
static QString decrypt(const QString& text, quint64 key = 0);
|
static QString decrypt(const QString& text, quint64 key = 0);
|
||||||
static QString newline();
|
static QString newline();
|
||||||
static QString capitalizeFirstLetter(const QString& sts);
|
static QString capitalizeFirstLetter(const QString& sts);
|
||||||
static QStringList tokenizeProcessArguments(QStringView args);
|
static QStringList tokenizeProcessArguments(const QString& command);
|
||||||
|
|
||||||
// Shortens input string according to given length limit.
|
// Shortens input string according to given length limit.
|
||||||
static QString shorten(const QString& input, int text_length_limit = TEXT_TITLE_LIMIT);
|
static QString shorten(const QString& input, int text_length_limit = TEXT_TITLE_LIMIT);
|
||||||
|
@ -7,6 +7,7 @@
|
|||||||
#include "exceptions/networkexception.h"
|
#include "exceptions/networkexception.h"
|
||||||
#include "exceptions/scriptexception.h"
|
#include "exceptions/scriptexception.h"
|
||||||
#include "miscellaneous/iconfactory.h"
|
#include "miscellaneous/iconfactory.h"
|
||||||
|
#include "miscellaneous/textfactory.h"
|
||||||
#include "network-web/networkfactory.h"
|
#include "network-web/networkfactory.h"
|
||||||
#include "services/abstract/category.h"
|
#include "services/abstract/category.h"
|
||||||
#include "services/standard/definitions.h"
|
#include "services/standard/definitions.h"
|
||||||
@ -260,11 +261,12 @@ void StandardFeedDetails::onUrlChanged(const QString& new_url) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
else if (sourceType() == StandardFeed::SourceType::Script) {
|
else if (sourceType() == StandardFeed::SourceType::Script) {
|
||||||
if (new_url.simplified().isEmpty()) {
|
try {
|
||||||
m_ui.m_txtSource->setStatus(LineEditWithStatus::StatusType::Error, tr("The source is empty."));
|
TextFactory::tokenizeProcessArguments(new_url);
|
||||||
|
m_ui.m_txtSource->setStatus(LineEditWithStatus::StatusType::Ok, tr("Source is ok."));
|
||||||
}
|
}
|
||||||
else {
|
catch (const ApplicationException& ex) {
|
||||||
m_ui.m_txtSource->setStatus(LineEditWithStatus::StatusType::Ok, tr("The source is ok."));
|
m_ui.m_txtSource->setStatus(LineEditWithStatus::StatusType::Error, tr("Error: %1").arg(ex.message()));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
else {
|
else {
|
||||||
@ -273,11 +275,12 @@ void StandardFeedDetails::onUrlChanged(const QString& new_url) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
void StandardFeedDetails::onPostProcessScriptChanged(const QString& new_pp) {
|
void StandardFeedDetails::onPostProcessScriptChanged(const QString& new_pp) {
|
||||||
if (QRegularExpression(QSL(SCRIPT_SOURCE_TYPE_REGEXP)).match(new_pp).hasMatch() || !new_pp.simplified().isEmpty()) {
|
try {
|
||||||
|
TextFactory::tokenizeProcessArguments(new_pp);
|
||||||
m_ui.m_txtPostProcessScript->setStatus(LineEditWithStatus::StatusType::Ok, tr("Command is ok."));
|
m_ui.m_txtPostProcessScript->setStatus(LineEditWithStatus::StatusType::Ok, tr("Command is ok."));
|
||||||
}
|
}
|
||||||
else {
|
catch (const ApplicationException& ex) {
|
||||||
m_ui.m_txtPostProcessScript->setStatus(LineEditWithStatus::StatusType::Ok, tr("Command is empty."));
|
m_ui.m_txtPostProcessScript->setStatus(LineEditWithStatus::StatusType::Error, tr("Error: %1").arg(ex.message()));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|