Json Search Server
v0.1.0
This application provides a fuzzy search server for data stored in JSON format.
It was developed for use with a website built with Jekyll.
License
Copyright (C) 2019, Maxim Lihachev, <envrm@yandex.ru>
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation, version 3.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Algorithm
The logic of this application is simple:
- All fields in the file are split into separate words.
- Each unique word in the search string is compared to the words in the file.
- If the match is exact, the highest score is awarded.
- If the match is inexact, the Levenshtein distance is calculated and the word score is derived from it.
- Results are sorted in descending order of accuracy.
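The steps above can be sketched in a few lines of Python. This is an illustration only, not the server's actual code: the constant `EXACT_SCORE`, the penalty of 25 points per edit, and the function names are all assumptions made for the example.

```python
import re

EXACT_SCORE = 100  # hypothetical top score for an exact match


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def words(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def word_score(query_word, candidates):
    """Best score of one query word against all words of a record."""
    best = 0  # negative scores never replace this, so the floor is zero
    for candidate in candidates:
        if candidate == query_word:
            return EXACT_SCORE
        # Hypothetical penalty: 25 points per edit.
        best = max(best, EXACT_SCORE - 25 * levenshtein(query_word, candidate))
    return best


def search(query, records):
    """Score every record and return (record, score) pairs, best first."""
    query_words = set(words(query))  # each unique word is scored once
    results = []
    for record in records:
        record_words = words(" ".join(str(v) for v in record.values()))
        score = sum(word_score(w, record_words) for w in query_words)
        if score > 0:
            results.append((record, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

With these assumed constants, a record containing the exact word outranks a record that only contains a one-letter misspelling of it.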
How to use
Build
$ make build
or
$ make static
A Vagrantfile is included that starts a CentOS 7 VM. The VM is used to build a statically linked binary for the target server.
host:~$ vagrant up
host:~$ vagrant ssh
vm:~$ cd /vagrant
vm:~$ make static
The static executable is placed in the bin/ directory.
Run
$ make exec
json-search-server v0.1.0: seach server for Json
json-search-server [OPTIONS]
Common flags:
  -p --port=3000                 Search server port
  -l --logs=apache               apache | simple | json | disable | full
  -c --cached                    Store Json data into memory
  -j -f --json=data.json --file  Json file name
  -? --help                      Display help message
  -V --version                   Print version information
     --numeric-version           Print just the version number
Install
$ make install
Requests
Get Server Settings
$ curl -sq http://localhost:3000/info
{
  "cached": false,
  "logs": "full",
  "file": "sample.json",
  "port": 3000
}
Health Check
$ curl -sq http://localhost:3000/health
{
  "status": "ok",
  "message": "2019-06-19 09:03:26.402385 UTC"
}
$ chmod a-r sample.json
$ curl -sq http://localhost:3000/health
{
  "status": "fail",
  "message": "2019-06-19 09:05:32.950256 UTC sample.json: openFile: permission denied (Permission denied)"
}
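A monitoring script could interpret these responses roughly as follows. This is a sketch, not part of the project: the function names and the default timeout are made up for the example, and it assumes only the `status` field shown in the output above.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def is_healthy(payload):
    """Return True when a /health response body reports status "ok"."""
    try:
        return json.loads(payload).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as unhealthy.
        return False


def check(base_url, timeout=5.0):
    """Fetch /health from a running server; network errors count as unhealthy."""
    try:
        with urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as response:
            return is_healthy(response.read().decode("utf-8"))
    except URLError:
        return False
```

For example, `check("http://localhost:3000")` would return `False` once the data file becomes unreadable, or when the server cannot be reached at all.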
Search
$ curl -sq http://localhost:3000/search/article
$ curl -sq http://localhost:3000/search?query=article
[
  [
    {
      "url": "/tags/foo.html",
      "authors": "Author I, Author II",
      "content": "This is article about Foo and Bar.",
      "year": "1990",
      "title": "Page one"
    },
    1,
    -100
  ]
]
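Both request forms can also be issued from a script. The Python sketch below builds the two URLs and splits each result into the matched record and the numbers that follow it. The function names are invented for this example, and since the meaning of the trailing numbers is not documented here, they are passed through untouched.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def search_urls(base_url, query):
    """Build both request forms (path and query string) for one query."""
    base = base_url.rstrip("/")
    return (
        f"{base}/search/{quote(query, safe='')}",
        f"{base}/search?{urlencode({'query': query}, quote_via=quote)}",
    )


def parse_results(body):
    """Split each result into (record, remaining values)."""
    return [(item[0], item[1:]) for item in json.loads(body)]


def search(base_url, query, timeout=5.0):
    """Query a running server using the query-string form."""
    _, url = search_urls(base_url, query)
    with urlopen(url, timeout=timeout) as response:
        return parse_results(response.read().decode("utf-8"))
```

Against the sample response above, `parse_results` yields one pair: the page record and the list `[1, -100]`.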
Logs
Several log formats are available. Logging can be disabled completely by passing the argument --log disable.
--log apache (default)
127.0.0.1 - - [19/Jun/2019:12:14:16 +0300] "GET /info HTTP/1.1" 200 - "" "curl/7.54.0"
127.0.0.1 - - [19/Jun/2019:12:14:18 +0300] "GET /health HTTP/1.1" 200 - "" "curl/7.54.0"
--log simple
GET /health
Accept: */*
Status: 200 OK 0.00003s
GET /info
Accept: */*
Status: 200 OK 0.000018s
--log json
{"time":"19/Jun/2019:12:20:06 +0300","response":{"status":200,"size":null,"body":null},"request":{"httpVersion":"1.1","path":"/health","size":0,"body":"","durationMs":7.0e-2,"remoteHost":{"hostAddress":"127.0.0.1","port":63436},"headers":[["Host","localhost:3000"],["User-Agent","curl/7.54.0"],["Accept","*/*"]],"queryString":[],"method":"GET"}}
{"time":"19/Jun/2019:12:20:07 +0300","response":{"status":200,"size":null,"body":null},"request":{"httpVersion":"1.1","path":"/info","size":0,"body":"","durationMs":4.0e-2,"remoteHost":{"hostAddress":"127.0.0.1","port":63438},"headers":[["Host","localhost:3000"],["User-Agent","curl/7.54.0"],["Accept","*/*"]],"queryString":[],"method":"GET"}}
Or pretty-printed:
{
  "time": "19/Jun/2019:12:20:06 +0300",
  "response": {
    "status": 200,
    "size": null,
    "body": null
  },
  "request": {
    "httpVersion": "1.1",
    "path": "/health",
    "size": 0,
    "body": "",
    "durationMs": 0.07,
    "remoteHost": {
      "hostAddress": "127.0.0.1",
      "port": 63436
    },
    "headers": [
      [
        "Host",
        "localhost:3000"
      ],
      [
        "User-Agent",
        "curl/7.54.0"
      ],
      [
        "Accept",
        "*/*"
      ]
    ],
    "queryString": [],
    "method": "GET"
  }
}
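Because each line in this format is a standalone JSON object, the log is easy to process with a script. A minimal Python sketch, assuming only the fields visible in the sample lines above (the function names and the threshold parameter are invented for the example):

```python
import json


def summarize(line):
    """Return (method, path, status, duration in ms) for one JSON log line."""
    entry = json.loads(line)
    request = entry["request"]
    return (
        request["method"],
        request["path"],
        entry["response"]["status"],
        request["durationMs"],
    )


def slow_requests(lines, threshold_ms):
    """Yield summaries of requests that took longer than threshold_ms."""
    for line in lines:
        if line.strip():  # skip blank lines between entries
            summary = summarize(line)
            if summary[3] > threshold_ms:
                yield summary
```

For instance, `list(slow_requests(open("server.log"), 50))` would list requests slower than 50 ms, where `server.log` is a hypothetical file holding the captured output.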
--log full
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
"2019-06-19 09:05:22.226024 UTC"
"------------------------------------------------------------------------------------------------------------------------"
"Query: /health"
"------------------------------------------------------------------------------------------------------------------------"
Request
{ requestMethod = "GET"
, httpVersion = HTTP/1.1
, rawPathInfo = "/health"
, rawQueryString = ""
, requestHeaders =
[
( "Host"
, "localhost:3000"
)
,
( "User-Agent"
, "curl/7.54.0"
)
,
( "Accept"
, "*/*"
)
]
, isSecure = False
, remoteHost = 127.0.0.1:63212
, pathInfo = [ "health" ]
, queryString = []
, requestBody = <IO ByteString>
, vault = <Vault>
, requestBodyLength = KnownLength 0
, requestHeaderHost = Just "localhost:3000"
, requestHeaderRange = Nothing
}
Caching
All data can be held in RAM and served even if the file is not readable.
To enable this, pass the --cached argument.
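The difference between the two modes can be pictured with a small Python sketch. It illustrates the idea only and is not the server's implementation: the class `JsonSource` and its methods are hypothetical, and it assumes that in cached mode the file is read once at startup.

```python
import json


class JsonSource:
    """Serve records from a JSON file, optionally keeping them in memory."""

    def __init__(self, path, cached=False):
        self.path = path
        # With caching on, the file is read once, here, and never again.
        self._data = self._read() if cached else None

    def _read(self):
        with open(self.path, encoding="utf-8") as handle:
            return json.load(handle)

    def records(self):
        """Return the cached data if present, otherwise re-read the file."""
        return self._data if self._data is not None else self._read()
```

In this sketch, a cached source keeps answering after the file is removed or made unreadable, while an uncached source raises an `OSError` on the next request.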
NGINX
To run this service behind an nginx web server, the following configuration can be used:
upstream search_backend {
    server 127.0.0.1:3000;
}
server {
    listen 8080;
    server_name search.server www.search.server;
    # ...
    location / {
        add_header 'Access-Control-Allow-Origin' "$http_origin";
        add_header 'Access-Control-Allow-Methods' 'GET, POST';
        add_header 'Access-Control-Allow-Credentials' 'true';
        add_header 'Access-Control-Allow-Headers' 'User-Agent,Keep-Alive,Content-Type';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # X-Forwarded-Host carries the originally requested host, not the client address.
        proxy_set_header X-Forwarded-Host $host;
        proxy_pass http://search_backend$uri?$args;
    }
    # This prevents intruders from obtaining information
    # about the internal structure of the server.
    location /info {
        proxy_pass http://search_backend/health;
    }
}
Jekyll
A Jekyll template that generates the JSON file might look like this:
---
layout: none
search: none
---
{% assign all_pages = site.pages | where_exp: 'p', 'p.search != "none"' | where_exp: 'p', 'p.layout != "none"' %}
[
{% for p in all_pages %}
{% capture all_authors %}{{ p.authors | join: ',' }}, {{ p.translators | join: ',' }}, {{ p.editors | join: ',' }}{% endcapture %}
{
"title": "{{ p.title | split: '<br />' | join: ' ' | xml_escape }}{% if p.tag %} «{{ p.tag }}» {% endif %}",
"authors": "{{ p.authors | join: ',' }}",
"persons": "{{ all_authors }}",
"content": {{ p.content | strip_html | jsonify }},
"tags": "{{ p.tags | join: ', ' }}",
"year": "{{ p.year }}",
"url": "{{ p.url | xml_escape }}"
}
{% unless forloop.last %},{% endunless %}
{% endfor %}
]
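Since most values in this template are interpolated as plain strings rather than passed through `jsonify`, unusual characters in a page's front matter may produce invalid JSON. A short Python check run after the site build can catch that early. This is a sketch: the function name is invented, and the field list simply mirrors the template above.

```python
import json

# Field names taken from the template above.
EXPECTED_FIELDS = ("title", "authors", "persons", "content", "tags", "year", "url")


def check_index(text):
    """Return a list of problems found in the generated JSON text."""
    try:
        data = json.loads(text)
    except ValueError as error:
        return [f"invalid JSON: {error}"]
    if not isinstance(data, list):
        return ["top-level value is not an array"]
    problems = []
    for position, record in enumerate(data):
        if not isinstance(record, dict):
            problems.append(f"record {position}: not an object")
            continue
        for field in EXPECTED_FIELDS:
            if field not in record:
                problems.append(f"record {position}: missing {field!r}")
    return problems
```

An empty list means the file parsed and every record carries the expected fields; anything else is a human-readable list of what to fix.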