Commit Graph

4520 Commits

Author SHA1 Message Date
dependabot[bot] a29bc166a6
Bump httpx-socks[asyncio] from 0.7.2 to 0.7.4 (#3237)
Bumps [httpx-socks[asyncio]](https://github.com/romis2012/httpx-socks) from 0.7.2 to 0.7.4.
- [Release notes](https://github.com/romis2012/httpx-socks/releases)
- [Commits](https://github.com/romis2012/httpx-socks/compare/v0.7.2...v0.7.4)

---
updated-dependencies:
- dependency-name: httpx-socks[asyncio]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-06 00:22:44 +02:00
Noémi Ványi 2719fd2526
Pick pass cookies from searxng (#3252)
* [enh] Allow passing headers/cookies from settings.yml

Example:

   - engine: xpath
   - search_url: example.org
   - headers: {'example_header': 'example_header'}
   - cookies: {'safesearch': 'off'}

* [fix[ Update only cookies/headers

* [enh] XPath engine - add time range support

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [enh] XPath engine - add time safe-search support

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

Co-authored-by: Allen <64094914+allendema@users.noreply.github.com>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2022-06-06 00:18:33 +02:00
Noémi Ványi f00d9e0ec4
Pick minor fixes from searxng (#3251)
* [fix] Rename ccengine engine to openverse

The CC engine was merged with WordPress and renamed to Openverse

Source: https://wordpress.org/news/2021/05/welcome-to-openverse/

* [fix] ccengine engine - avoid unwanted redirects

api.openverse.engineering is a little picky and wants to have a trailing slash
in the path:

    /v1/images? -->/ v1/images/?

otherwise it redirects, here is the debug log:

    DEBUG   searx.network.openverse       : HTTP Request: GET https://api.openverse.engineering/v1/images?&page=1&page_size=20&format=json&q=foo "HTTP/2 301 Moved Permanently" (text/html; charset=utf-8)
    DEBUG   searx.network.openverse       : HTTP Request: GET https://api.openverse.engineering/v1/images/?&page=1&page_size=20&format=json&q=foo "HTTP/2 200 OK" (application/json)
    WARNING searx.engines.openverse       : ErrorContext('searx/search/processors/online.py', 105, 'count_error(', None, '1 redirects, maximum: 0', ('200', 'OK', 'api.openverse.engineering')) True

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [fix] FutureWarning from lxml

Just in case if content is None, the original code will skip extract_text(), and
just append the None value to 'content'. So just add allow_none=True, and this
will return None without raising a ValueError in extract_text().

* [enh] Add pagination to Brave

Also added ```&spellcheck=1``` because now it is disabled by default, not returning any ```suggestion_xpath```.

Co-authored-by: Léon Tiekötter <leon@tiekoetter.com>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: capric98 <42015599+capric98@users.noreply.github.com>
Co-authored-by: Allen <64094914+allendema@users.noreply.github.com>
2022-06-06 00:01:27 +02:00
dependabot[bot] 8ee980979a
Bump lxml from 4.7.1 to 4.9.0 (#3249)
Bumps [lxml](https://github.com/lxml/lxml) from 4.7.1 to 4.9.0.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-4.7.1...lxml-4.9.0)

---
updated-dependencies:
- dependency-name: lxml
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-05 23:47:08 +02:00
liimee a3e41c3cd6
Add TVmaze engine (#3246) 2022-06-05 23:36:04 +02:00
Noémi Ványi f0b1c9bbcc
Updated version of "Ddg safe search" PR (#3247)
* fix safe search with ddg engine

* fix unused imports

* extract title from htmlextractor

Co-authored-by: Nivesh Krishna <nivesh@e.email>
2022-06-02 21:36:04 +02:00
searx-bot 6ffa70d879
Update searx.data - update_wikidata_units.py (#3222)
Co-authored-by: dalf <dalf@users.noreply.github.com>
2022-05-24 21:08:52 +02:00
searx-bot 81b8bf3fe0
Update searx.data - update_firefox_version.py (#3223)
Co-authored-by: dalf <dalf@users.noreply.github.com>
2022-05-24 21:08:36 +02:00
nathannaveen 260949ed48
chore: Set permissions for GitHub actions (#3225)
Restrict the GitHub token permissions only to the required ones; this way, even if the attackers will succeed in compromising your workflow, they won’t be able to do much.

- Included permissions for the action. https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions

https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs

[Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/)

Signed-off-by: nathannaveen <42319948+nathannaveen@users.noreply.github.com>
2022-05-24 21:07:23 +02:00
searx-bot f522f92250
Update searx.data - update_currencies.py (#3203)
Co-authored-by: dalf <dalf@users.noreply.github.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
2022-04-13 21:11:59 +02:00
searx-bot 3a2a153cb8
Update searx.data - update_firefox_version.py (#3202)
Co-authored-by: dalf <dalf@users.noreply.github.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
2022-04-13 21:10:18 +02:00
searx-bot a87555755d
Update searx.data - update_ahmia_blacklist.py (#3201)
Co-authored-by: dalf <dalf@users.noreply.github.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
2022-04-13 21:09:55 +02:00
searx-bot ddb9870acf
Update searx.data - update_wikidata_units.py (#3200)
Co-authored-by: dalf <dalf@users.noreply.github.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
2022-04-13 21:09:32 +02:00
Eric Zhang b7d91c9c95
yahoo engine - don't lump all search suggestions together (#3208) 2022-04-13 21:00:54 +02:00
Noémi Ványi ba95fd570b
Merge pull request #3209 from kvch/update-flask
Update flask and jinja2 to fix build
2022-04-13 20:55:30 +02:00
Markus Heiser 3abf620418 [fix] issue when upgrading from werkzeug v2.0.3 to v2.1.0
In v2.1.0 werkzeug [1] fixed an issue [2] to keep relative redirect locations by
default [3].  Since relative locations are returned, we need to fix out test
cases to avoid AssertionErrors like this one::

    ======================================================================
    FAIL: test_index_html_get (tests.unit.test_webapp.ViewsTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
    File "/home/runner/work/searxng/searxng/tests/unit/test_webapp.py", line 105, in test_index_html_get
      self.assertEqual(result.location, 'http://localhost/search?q=test')
    AssertionError: '/search?q=test' != 'http://localhost/search?q=test'
    - /search?q=test
    + http://localhost/search?q=test

[1] https://werkzeug.palletsprojects.com/
[2] https://github.com/pallets/werkzeug/issues/2352 fixed in
[3] https://github.com/pallets/werkzeug/pull/2354

Related-to: https://github.com/searxng/searxng/pull/1039#issuecomment-1085538288
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-04-13 20:49:42 +02:00
Noémi Ványi bca820589e Update flask and jinja2 2022-04-13 20:45:24 +02:00
Noémi Ványi 03eb9c2461 Provide better error message if settings.yml cannot be loaded
Closes #3184
2022-03-17 20:34:50 +01:00
Markus Heiser f231d79a5d [fix] engine: Semantic Scholar (Science) // rework & fix
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-05 20:59:11 +01:00
Noémi Ványi c56f2f1d6b Skip result in Semantic Scholar engine if URL is missing 2022-03-03 22:06:04 +01:00
searx-bot e2ab703f3e
Update searx.data - update_firefox_version.py (#3171)
Co-authored-by: dalf <dalf@users.noreply.github.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
2022-03-02 22:29:00 +01:00
searx-bot c9777de0d5
Update searx.data - update_wikidata_units.py (#3170)
Co-authored-by: dalf <dalf@users.noreply.github.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
2022-03-02 22:23:06 +01:00
Marc Abonce Seguin c9e6d9f5f6
Fix Tor proxy when using httpx 0.21.x (#3165)
## What does this PR do?

This should fix #3164.
The problem is that `httpx` keeps making breaking changes to their library, so we just have to adjust the code a little bit to make it work with the new version of the library.


## Related issues
Closes  #3164
2022-03-01 20:21:25 +01:00
Noémi Ványi 0669bfd7a5
Fix issues in network after updating httpx to 0.21.x (#3169)
* [mod] upgrade httpx 0.21.2

httpx 0.21.2 and httpcore 0.14.4 fix multiple issues:
* https://github.com/encode/httpx/releases/tag/0.21.2
* https://github.com/encode/httpcore/releases/tag/0.14.4

so most of the workarounds in searx.network have been removed.

* pick even more changes from searxng

Co-authored-by: Alexandre Flament <alex@al-f.net>
2022-02-28 22:05:20 +01:00
searx-bot 0248777f95
Update searx.data - update_ahmia_blacklist.py (#3158)
Co-authored-by: dalf <dalf@users.noreply.github.com>
2022-02-11 21:24:50 +01:00
searx-bot 22ecae7d48
Update searx.data - update_currencies.py (#3157)
Co-authored-by: dalf <dalf@users.noreply.github.com>
2022-02-11 21:24:43 +01:00
searx-bot fa2ad3cb03
Update searx.data - update_wikidata_units.py (#3156)
Co-authored-by: dalf <dalf@users.noreply.github.com>
2022-02-11 21:24:26 +01:00
searx-bot bf021c538d
Update searx.data - update_firefox_version.py (#3155)
Co-authored-by: dalf <dalf@users.noreply.github.com>
2022-02-11 21:24:12 +01:00
israelyago 3fd18ab51b
Fix digg engine (#3150) 2022-01-30 16:41:53 +01:00
Noémi Ványi a164585118 Add extra features to Gigablast engine:
* fast can be enabled to results are returned quicker
* collection can be configured
* search_type can be changed to images or news

Closes #3078
2022-01-22 19:14:45 +01:00
iko 01e28757d3
Fixed Hoogle engine (#3146) 2022-01-22 18:22:24 +01:00
Noémi Ványi accba7afb2 Install searx as root in Docker
Closes #2901
2022-01-22 18:09:38 +01:00
Noémi Ványi ea38fea711
Pick image_proxy changes from searxng (#2965)
* [mod] /image_proxy: don't decompress images

* [fix] image_proxy: always close the httpx respone

previously, when the content type was not an image and some other error,
the httpx response was not closed

* [mod] /image_proxy: use HTTP/1 instead of HTTP/2

httpx: HTTP/2 is slow when a lot data is downloaded.
https://github.com/dalf/pyhttp-benchmark

also, the usage of HTTP/1 decreases the load average

* [mod] searx.utils.dict_subset: rewrite with comprehension

Co-authored-by: Alexandre Flament <alex@al-f.net>
2022-01-22 13:49:00 +01:00
Alexandre Flament ad7e00ad03 [fix] startpage autocompletion 2022-01-22 12:18:57 +01:00
Allen 0c351ea364
[enh] Add Tineye reverse image search (#3040)
* [enh] Add Tineye reverse image search 

Other optional parametesr:

"&sort=crawl_date" can be appended to search_string to sort results by date.
"&domain=example.org" can be implemented to search_string to get results from just one domain.

Public instances could get relatively fast timed-out for 3600s.

* [enh] Add TIneye to settings.yml 

Check if that's the right shortcut.

* [mod] Fix checks

* [mod] Try to fix checks

* [mod] Use Four spaces for indentation

And set paging back to True

Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
2022-01-22 12:15:19 +01:00
Noémi Ványi fd9d6b58d5 Add scheme to img_src and thumbnail_url if missing from URL
Closes #3092
2022-01-22 11:59:21 +01:00
Noémi Ványi 148090df12 Minor fixes to satisfy the linter 2022-01-21 17:59:10 +01:00
Alexandre Flament d592159cc5 [fix] startpage: workaround to use the startpage network
workaround for the issue #762
2022-01-21 17:59:10 +01:00
Markus Heiser 036d80ed20 [mod] starpage engine: add comment about Startpage's FFox add-on
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser a4bc089091 [fix] startpage engine: fetch CAPTCHA & issues related to PR-695
In case of CAPTCHA raise a SearxEngineCaptchaException and suspend for 7 days.
When get_sc_code() fails raise a SearxEngineResponseException and suspend for 7
days.

[1] https://github.com/searxng/searxng/pull/695

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser 1076d7e52e [fix] Get an actual `sc` argument from startpage's home page.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser a6184ac32c [pylint] Startpage engine
Fix remarks from pylint

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser 4750586fb0 [fix] startpage engine - avoid captcha
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:

1. some arguments has been removed and a new `sc` has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed

Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser 99128537a8 [fix] googel engine - "some results are invalids: invalid content"
Fix google issues listet in the `/stats?engine=google` and message::

    some results are invalids: invalid content

The log is::

    DEBUG   searx                         : result: invalid content: {'url': 'https://de.wikipedia.org/wiki/Foo', 'title': 'Foo - Wikipedia', 'content': None, 'engine': 'google'}
    WARNING searx.engines.google          : ErrorContext('searx/search/processors/abstract.py', 111, 'result_container.extend(self.engine_name, search_results)', None, 'some results are invalids: invalid content', ()) True

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
dependabot[bot] 5e45b5abd7
Bump sphinx from 4.3.2 to 4.4.0 (#3143)
Bumps [sphinx](https://github.com/sphinx-doc/sphinx) from 4.3.2 to 4.4.0.
- [Release notes](https://github.com/sphinx-doc/sphinx/releases)
- [Changelog](https://github.com/sphinx-doc/sphinx/blob/4.x/CHANGES)
- [Commits](https://github.com/sphinx-doc/sphinx/compare/v4.3.2...v4.4.0)

---
updated-dependencies:
- dependency-name: sphinx
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-21 17:31:05 +01:00
dependabot[bot] 1440cefb8d
Bump flask from 2.0.1 to 2.0.2 (#3142)
Bumps [flask](https://github.com/pallets/flask) from 2.0.1 to 2.0.2.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/2.0.1...2.0.2)

---
updated-dependencies:
- dependency-name: flask
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-21 17:17:38 +01:00
Markus Heiser 26c92d5f50 [fix] google engine: remove adds and fix mobile_ui selector
1. Fix issue reported in comment [1]
2. Fix XPath selector for the response of google's mobile UI, reported in
   comment [2]

[1] https://github.com/searxng/searxng/pull/777#issuecomment-1015121322
[2] https://github.com/searxng/searxng/pull/777#issuecomment-1015236238

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-20 08:33:53 +01:00
Émilien Devos a2ec27696c Update XPath for Google engine 2022-01-19 23:03:36 +01:00
dependabot[bot] 6ebe9b15a2
Bump flask from 1.1.2 to 2.0.2 (#2881)
* Bump flask from 1.1.2 to 2.0.2

Bumps [flask](https://github.com/pallets/flask) from 1.1.2 to 2.0.1.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/1.1.2...2.0.1)

---
updated-dependencies:
- dependency-name: flask
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* rm non existent version

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
Co-authored-by: Noémi Ványi <sitbackandwait@gmail.com>
2022-01-17 23:02:32 +01:00
Maciej "RooTer" Urbański acefa65ac5
Run tests under python 3.10 (#3035)
* fix SC2086 on mkdir $SEARX_SETTINGS_PATH
* run tests under python 3.10
* Update requirements.txt for now to downgrade transifex

Co-authored-by: Noémi Ványi <sitbackandwait@gmail.com>
2022-01-17 22:45:01 +01:00