Commit Graph

1208 Commits

Author SHA1 Message Date
Brett Kosinski 2fc1cd3a15
Fix regression is retrieving sc code (#3453)
I broke the code when I fixed this.  The old codepath did some trimming
of the sc value and I didn't kill that line, so the value was being
clipped.

This fixes #3430 and is confirmed working on a live instance.
2023-04-04 21:29:17 +02:00
br4nnigan 38606234a8
fix bing results sometimes still using bing redirect urls (#3482) 2023-04-04 21:26:44 +02:00
Dr. Rolf Jansen 9ba072bb74
fix duckduckgo engine (#3486)
Co-authored-by: rolf <rolf@>
2023-04-04 21:18:04 +02:00
Grant Lanham Jr 67c233d0c3
add "Accept" header to bing.py (#3473) 2023-04-04 21:16:04 +02:00
ganeshlab 2ec47dce5e
Fix google engine (#3489)
This issue popped up again and part of the fix was in 6f9e678346.
2023-04-04 21:12:46 +02:00
Dr. Rolf Jansen a79e8194d7
DuckDuckGo fixes (#3444)
Co-authored-by: rolf <rolf@>
2023-01-14 17:42:57 +01:00
Brett Kosinski 3c84af95ba
Fix scraping of 'sc' value from homepage (#3397)
Looking at the current HTML for the Startpage front page, the previous
footer logo element is no longer present.  This change scrapes the "sc"
parameter from one of the hidden HTML form elements, which should
(hopefully) be a bit more stable long term, since that form is used by
Startpage to submit requests to the engine.
2022-10-31 22:34:43 +01:00
Kian-Meng Ang 629ebb426f
Fix typos (#3366)
Found via `codespell -S ./searx/translations,./searx/data,./searx/static -L ans,te,fo,doubleclick,tthe,dum`
2022-09-29 23:06:59 +02:00
br4nnigan a9dadda6f7 allow engines to override pretty_url and use this in bing to show meaningful urls 2022-09-28 20:49:51 +02:00
Adam Tauber 2222caec22 [enh] add omnom engine 2022-09-20 23:04:25 +02:00
Markus Heiser 1abecbc835 [fix] google - simplify XPath selectors to fetch more results
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-18 19:47:09 +02:00
Émilien Devos d656b340ee output format protobuf to HTML for google mobile 2022-09-18 19:44:48 +02:00
Elena Poelman ca27e91594
New search engine: IPFS search (#3218)
* Feat: initial support for the ipfs-search engine

* Feat: enable paging for ipfs search

* Make ipfs-search code more readable

* added support for images, music and video to the ipfs search engine

* FIX: redefined some variables that where redefining built-ins

* adjust code so it works on older python versions

* Feat: add support for time ranges
2022-09-07 22:13:19 +02:00
dependabot[bot] d86cb95560
Bump pylint from 2.14.5 to 2.15.0 (#3353)
* Bump pylint from 2.14.5 to 2.15.0

Bumps [pylint](https://github.com/PyCQA/pylint) from 2.14.5 to 2.15.0.
- [Release notes](https://github.com/PyCQA/pylint/releases)
- [Commits](https://github.com/PyCQA/pylint/compare/v2.14.5...v2.15.0)

---
updated-dependencies:
- dependency-name: pylint
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix code

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Noémi Ványi <sitbackandwait@gmail.com>
2022-09-06 22:23:34 +02:00
Noémi Ványi 5b50d7455a Do not consent to tracking when using google 2022-08-02 19:22:37 +02:00
dependabot[bot] a1c06cbb1b
Bump pycodestyle from 2.8.0 to 2.9.0 (#3320)
* Bump pycodestyle from 2.8.0 to 2.9.0

Bumps [pycodestyle](https://github.com/PyCQA/pycodestyle) from 2.8.0 to 2.9.0.
- [Release notes](https://github.com/PyCQA/pycodestyle/releases)
- [Changelog](https://github.com/PyCQA/pycodestyle/blob/main/CHANGES.txt)
- [Commits](https://github.com/PyCQA/pycodestyle/compare/2.8.0...2.9.0)

---
updated-dependencies:
- dependency-name: pycodestyle
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix mongodb

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Noémi Ványi <sitbackandwait@gmail.com>
2022-08-02 19:01:46 +02:00
Noémi Ványi 3e0c39eafa Fix tyop: online_dictionnary -> online_dictionary 2022-07-31 17:09:03 +02:00
Noémi Ványi 54697a8705 Fix online dictionaries 2022-07-30 21:54:24 +02:00
Noémi Ványi 05fe2ee093
pick engine fixes (#3306)
* [fix] google engine: results XPath

* [fix] google & youtube - set EU consent cookie

This change the previous bypass method for Google consent using
``ucbcb=1`` (6face215b8) to accept the consent using ``CONSENT=YES+``.

The youtube_noapi and google have a similar API, at least for the consent[1].

Get CONSENT cookie from google reguest::

    curl -i "https://www.google.com/search?q=time&tbm=isch" \
         -A "Mozilla/5.0 (X11; Linux i686; rv:102.0) Gecko/20100101 Firefox/102.0" \
         | grep -i consent
    ...
    location: https://consent.google.com/m?continue=https://www.google.com/search?q%3Dtime%26tbm%3Disch&gl=DE&m=0&pc=irp&uxe=eomtm&hl=en-US&src=1
    set-cookie: CONSENT=PENDING+936; expires=Wed, 24-Jul-2024 11:26:20 GMT; path=/; domain=.google.com; Secure
    ...

PENDING & YES [2]:

  Google change the way for consent about YouTube cookies agreement in EU
  countries. Instead of showing a popup in the website, YouTube redirects the
  user to a new webpage at consent.youtube.com domain ...  Fix for this is to
  put a cookie CONSENT with YES+ value for every YouTube request

[1] https://github.com/iv-org/invidious/pull/2207
[2] https://github.com/TeamNewPipe/NewPipeExtractor/issues/592

Closes: https://github.com/searxng/searxng/issues/1432

* [fix] sjp engine - convert enginename to a latin1 compliance name

The engine name is not only a *name* its also a identifier that is used in
logs, HTTP headers and more.  Unicode characters in the name of an engine could
cause various issues.

Closes: https://github.com/searxng/searxng/issues/1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [fix] engine tineye: handle 422 response of not supported img format

Closes: https://github.com/searxng/searxng/issues/1449
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* bypass google consent with ucbcb=1

* [mod] Adds Lingva translate engine

Add the lingva engine (which grabs data from google translate).  Results from
Lingva are added to the infobox results.

* openstreetmap engine: return the localized named.

For example: display "Tokyo" instead of "東京都" when the language is English.

* [fix] engines/openstreetmap.py typo: user_langage --> user_language

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* Wikidata engine: ignore dummy entities

* Wikidata engine: minor change of the SPARQL request

The engine can be slow especially when the query won't return any answer.
See https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Find_articles_in_Wikipedia_speaking_about_cheese_and_see_which_Wikibase_items_they_correspond_to

Co-authored-by: Léon Tiekötter <leon@tiekoetter.com>
Co-authored-by: Emilien Devos <contact@emiliendevos.be>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: Emilien Devos <github@emiliendevos.be>
Co-authored-by: ta <alt3753.7@gmail.com>
Co-authored-by: Alexandre Flament <alex@al-f.net>
2022-07-30 21:45:07 +02:00
Noémi Ványi 85034b49ef
Remove `httpx` and use `requests` instead (#3305)
## What does this PR do?

This PR prepares for removing `httpx`, and reverts back to `requests`.

## Why is this change important?

`httpx` hasn't proven itself to be faster or better than `requests`. On the other
hand it has caused issues on Windows.

=============================================
Please update your environment to use requests instead of httpx.
=============================================
2022-07-30 20:56:56 +02:00
james-still 210e59c68c
Add engine for Emojipedia (#3278) 2022-07-28 21:45:07 +02:00
Noémi Ványi 7bb499cb1e fix pylint error in bing engine 2022-07-01 13:12:21 +02:00
Adam Tauber a3ad9f9b34 [fix] use chrome ua to quickfix bing result urls - closes #3239 2022-06-06 14:34:56 +02:00
Noémi Ványi 2719fd2526
Pick pass cookies from searxng (#3252)
* [enh] Allow passing headers/cookies from settings.yml

Example:

   - engine: xpath
   - search_url: example.org
   - headers: {'example_header': 'example_header'}
   - cookies: {'safesearch': 'off'}

* [fix[ Update only cookies/headers

* [enh] XPath engine - add time range support

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [enh] XPath engine - add time safe-search support

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

Co-authored-by: Allen <64094914+allendema@users.noreply.github.com>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2022-06-06 00:18:33 +02:00
Noémi Ványi f00d9e0ec4
Pick minor fixes from searxng (#3251)
* [fix] Rename ccengine engine to openverse

The CC engine was merged with WordPress and renamed to Openverse

Source: https://wordpress.org/news/2021/05/welcome-to-openverse/

* [fix] ccengine engine - avoid unwanted redirects

api.openverse.engineering is a little picky and wants to have a trailing slash
in the path:

    /v1/images? -->/ v1/images/?

otherwise it redirects, here is the debug log:

    DEBUG   searx.network.openverse       : HTTP Request: GET https://api.openverse.engineering/v1/images?&page=1&page_size=20&format=json&q=foo "HTTP/2 301 Moved Permanently" (text/html; charset=utf-8)
    DEBUG   searx.network.openverse       : HTTP Request: GET https://api.openverse.engineering/v1/images/?&page=1&page_size=20&format=json&q=foo "HTTP/2 200 OK" (application/json)
    WARNING searx.engines.openverse       : ErrorContext('searx/search/processors/online.py', 105, 'count_error(', None, '1 redirects, maximum: 0', ('200', 'OK', 'api.openverse.engineering')) True

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [fix] FutureWarning from lxml

Just in case if content is None, the original code will skip extract_text(), and
just append the None value to 'content'. So just add allow_none=True, and this
will return None without raising a ValueError in extract_text().

* [enh] Add pagination to Brave

Also added ```&spellcheck=1``` because now it is disabled by default, not returning any ```suggestion_xpath```.

Co-authored-by: Léon Tiekötter <leon@tiekoetter.com>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: capric98 <42015599+capric98@users.noreply.github.com>
Co-authored-by: Allen <64094914+allendema@users.noreply.github.com>
2022-06-06 00:01:27 +02:00
liimee a3e41c3cd6
Add TVmaze engine (#3246) 2022-06-05 23:36:04 +02:00
Noémi Ványi f0b1c9bbcc
Updated version of "Ddg safe search" PR (#3247)
* fix safe search with ddg engine

* fix unused imports

* extract title from htmlextractor

Co-authored-by: Nivesh Krishna <nivesh@e.email>
2022-06-02 21:36:04 +02:00
Eric Zhang b7d91c9c95
yahoo engine - don't lump all search suggestions together (#3208) 2022-04-13 21:00:54 +02:00
Markus Heiser f231d79a5d [fix] engine: Semantic Scholar (Science) // rework & fix
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-05 20:59:11 +01:00
Noémi Ványi c56f2f1d6b Skip result in Semantic Scholar engine if URL is missing 2022-03-03 22:06:04 +01:00
Noémi Ványi 0669bfd7a5
Fix issues in network after updating httpx to 0.21.x (#3169)
* [mod] upgrade httpx 0.21.2

httpx 0.21.2 and httpcore 0.14.4 fix multiple issues:
* https://github.com/encode/httpx/releases/tag/0.21.2
* https://github.com/encode/httpcore/releases/tag/0.14.4

so most of the workarounds in searx.network have been removed.

* pick even more changes from searxng

Co-authored-by: Alexandre Flament <alex@al-f.net>
2022-02-28 22:05:20 +01:00
israelyago 3fd18ab51b
Fix digg engine (#3150) 2022-01-30 16:41:53 +01:00
Noémi Ványi a164585118 Add extra features to Gigablast engine:
* fast can be enabled to results are returned quicker
* collection can be configured
* search_type can be changed to images or news

Closes #3078
2022-01-22 19:14:45 +01:00
Allen 0c351ea364
[enh] Add Tineye reverse image search (#3040)
* [enh] Add Tineye reverse image search 

Other optional parametesr:

"&sort=crawl_date" can be appended to search_string to sort results by date.
"&domain=example.org" can be implemented to search_string to get results from just one domain.

Public instances could get relatively fast timed-out for 3600s.

* [enh] Add TIneye to settings.yml 

Check if that's the right shortcut.

* [mod] Fix checks

* [mod] Try to fix checks

* [mod] Use Four spaces for indentation

And set paging back to True

Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
2022-01-22 12:15:19 +01:00
Noémi Ványi 148090df12 Minor fixes to satisfy the linter 2022-01-21 17:59:10 +01:00
Alexandre Flament d592159cc5 [fix] startpage: workaround to use the startpage network
workaround for the issue #762
2022-01-21 17:59:10 +01:00
Markus Heiser 036d80ed20 [mod] starpage engine: add comment about Startpage's FFox add-on
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser a4bc089091 [fix] startpage engine: fetch CAPTCHA & issues related to PR-695
In case of CAPTCHA raise a SearxEngineCaptchaException and suspend for 7 days.
When get_sc_code() fails raise a SearxEngineResponseException and suspend for 7
days.

[1] https://github.com/searxng/searxng/pull/695

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser 1076d7e52e [fix] Get an actual `sc` argument from startpage's home page.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser a6184ac32c [pylint] Startpage engine
Fix remarks from pylint

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser 4750586fb0 [fix] startpage engine - avoid captcha
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:

1. some arguments has been removed and a new `sc` has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed

Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser 99128537a8 [fix] googel engine - "some results are invalids: invalid content"
Fix google issues listet in the `/stats?engine=google` and message::

    some results are invalids: invalid content

The log is::

    DEBUG   searx                         : result: invalid content: {'url': 'https://de.wikipedia.org/wiki/Foo', 'title': 'Foo - Wikipedia', 'content': None, 'engine': 'google'}
    WARNING searx.engines.google          : ErrorContext('searx/search/processors/abstract.py', 111, 'result_container.extend(self.engine_name, search_results)', None, 'some results are invalids: invalid content', ()) True

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-21 17:59:10 +01:00
Markus Heiser 26c92d5f50 [fix] google engine: remove adds and fix mobile_ui selector
1. Fix issue reported in comment [1]
2. Fix XPath selector for the response of google's mobile UI, reported in
   comment [2]

[1] https://github.com/searxng/searxng/pull/777#issuecomment-1015121322
[2] https://github.com/searxng/searxng/pull/777#issuecomment-1015236238

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-20 08:33:53 +01:00
Émilien Devos a2ec27696c Update XPath for Google engine 2022-01-19 23:03:36 +01:00
Noémi Ványi 179784068f
Bump pylint from 2.10.2 to 2.12.2 (#3124)
Bumps [pylint](https://github.com/PyCQA/pylint) from 2.10.2 to 2.12.2.
- [Release notes](https://github.com/PyCQA/pylint/releases)
- [Changelog](https://github.com/PyCQA/pylint/blob/main/ChangeLog)
- [Commits](https://github.com/PyCQA/pylint/compare/v2.10.2...v2.12.2)

---
updated-dependencies:
- dependency-name: pylint
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-15 20:23:09 +01:00
Dario Nuevo 1a18adcc16
New files engine: Prowlarr (#3118)
## What does this PR do?

Gives the user the possibility to search their own prowlarr instances.

Info: https://wiki.servarr.com/en/prowlarr
Github: https://github.com/Prowlarr/Prowlarr

## Why is this change important?

Prowlarr searchs multiple upstream search providers, thus allows to use that functionality through searx.
2022-01-15 19:18:15 +01:00
Andy Jones 3ddd0f8944
Update httpx and friends to 0.21.3 (#3121) 2022-01-15 19:16:10 +01:00
Noémi Ványi 82ac634070 make port configurable in MySQL engine
Closes #3117
2022-01-11 22:49:53 +01:00
Dario Nuevo 8f07442fb6
feature: new engine xpath_flex (#3119) 2022-01-11 22:44:19 +01:00
Finn 5dc886136b
[fix] Qwant: Remove extra q from URL (#3091)
Fixes #3090
2022-01-07 21:41:39 +01:00