
12 Commits

Author SHA1 Message Date
Ondřej Synáček ae700330b9 fix: do not attempt to parse non-existent FB pages
When a user enters something invalid (like the number 123), the request would still hit
the FB web page and return HTML. When parsing the DOM directly, I do a simple detection
by looking at the page title: when it contains "Content Not Found", the page is
skipped.

Non-existent web pages cannot be parsed with the LDJSON parser; there they show up
as null data.
2020-12-26 21:03:03 +01:00
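The title check from the commit above could look roughly like the sketch below. It is not the project's code. The function name `isContentNotFoundPage` is hypothetical. The sketch also assumes the FB error page's `<title>` contains the literal text "Content Not Found". It reads the title with a regular expression instead of a DOM API, so it also runs outside the browser.

```typescript
// Hypothetical helper (not from the repository): returns true when the downloaded
// HTML looks like the FB "Content Not Found" page instead of a real event page.
// Assumption: the error page's <title> contains the text "Content Not Found".
function isContentNotFoundPage(html: string): boolean {
  // Capture the text of the first <title> element, if there is one.
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (!match) {
    // No title at all: do not treat the page as "not found" based on the title.
    return false;
  }
  return match[1].includes("Content Not Found");
}
```

A parser would call this before extracting event data and skip the page when it returns `true`.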
Ondřej Synáček cfd939a668 normalize URL when parsing event number from HTML file 2020-07-23 17:23:04 +02:00
Ondřej Synáček a661e54524 fix invalid function signature for event data extraction
Additionally fix specs
2020-07-19 08:14:35 +02:00
Ondřej Synáček f577fb6385 implement same parsing logic on server and frontend
The server now downloads the HTML file via a new endpoint, but the parsing logic now runs in
the browser. The reason for this is to use the same code in both
environments.

If JavaScript is disabled, it's still possible to call the previous
endpoint and download the file from the server.
2020-07-19 08:14:35 +02:00
Ondřej Synáček 9da4c33ffd remove URL requirement for DOM parser 2020-07-19 08:14:34 +02:00
Ondřej Synáček 456eaa1fbc add specs for ICS retriever 2020-07-19 08:14:29 +02:00
Ondřej Synáček 8458ae0b69 add minimum duration for DOM parser 2020-07-19 08:14:23 +02:00
Ondřej Synáček db0ee12c4f add specs for parser utils 2020-07-15 20:54:24 +02:00
Ondřej Synáček 48e012bbc0 add specs for LD JSON parser 2020-07-15 20:54:24 +02:00
Ondřej Synáček 094f0bf242 add unit tests for DOM parser 2020-07-15 20:54:24 +02:00
Ondřej Synáček 5e08b56ef9 add util specs 2020-07-15 20:54:24 +02:00
Ondřej Synáček d9212f707b add specs for crawler service 2020-07-15 20:54:24 +02:00