Commit 6c37430

Author: Hämäläinen, Mika K
2 parents: 599ed85 + 6d5b39d

21 files changed: +113495 -1038635 lines changed

.gitignore

+1
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,3 @@
11
*.*~
22
*.pyc
3+
.idea/*

.idea/.name

-1
This file was deleted.

.idea/misc.xml

-14
This file was deleted.

.idea/modules.xml

-8
This file was deleted.

.idea/syntaxmaker.iml

-8
This file was deleted.

.idea/vcs.xml

-6
This file was deleted.

.idea/workspace.xml

-507
This file was deleted.

LICENSE.txt

+1-1
@@ -1,4 +1,4 @@
1-
Copyright 2015 Mika Hämäläinen, University of Helsinki
1+
Copyright 2015-2017 Mika Hämäläinen
22

33
Licensed under the Apache License, Version 2.0 (the "License");
44
you may not use this file except in compliance with the License.

README.md

+19-3
@@ -1,4 +1,20 @@
1-
# Syntaxmaker
2-
A python NLG tool for Finnish
1+
Syntax maker
2+
=======
3+
The tool NLG tool for Finnish by [Mika Hämäläinen](https://mikakalevi.com)
34

4-
For readme, see the [Wiki](https://github.com/DiscoveryGroup/syntaxmaker/wiki)
5+
Syntax maker is the natural language generation tool for generating syntactically correct sentences in Finnish automatically. The tool is especially useful in the case of Finnish which has such a high diversity in its morphosyntax. All you need to know are the lemmas and their parts-of-speech and syntax maker will take care of the rest.
6+
7+
For instance, just throw in words `rantaleijona`, `uneksia`, `korkea` and `aalto` and you will get `rantaleijonat uneksivat korkeista aalloista`. So you will get the morphology right automatically! Don't believe me? [Just take a look at this tutorial to find out how.](https://github.com/mikahama/syntaxmaker/wiki/Creating-a-sentence,-the-basics)
8+
9+
10+
# Requirements
11+
1. This tool requires Omorfi, you can download the correct binary version from [http://mikakalevi.com/omorfi](http://mikakalevi.com/omorfi)
12+
2. HFST `pip install hfst` for more instructions, see my post about [HFST and Python](https://mikalikes.men/using-hfst-on-python/).
13+
14+
# Installing
15+
You do `pip install syntaxmaker` to install this library.
16+
After installing it, go to [Creating a sentence, the basics](https://github.com/DiscoveryGroup/syntaxmaker/wiki/Creating-a-sentence,-the-basics) for a quick start guide.
17+
18+
# More information?
19+
20+
Just go ahead and [take a look at the wiki](https://github.com/mikahama/syntaxmaker/wiki) or my [blog post about Syntax maker](https://mikalikes.men/create-finnish-sentences-computationally-in-python-nlg/).

adposition_tool.py

+1-1
@@ -1,5 +1,5 @@
11
# -*- coding: utf-8 -*-
2-
__author__ = 'mikahama'
2+
__author__ = 'Mika Hämäläinen'
33
import csv
44
import random
55
import os
+17-5
@@ -1,14 +1,26 @@
1-
An NLG tool for Finnish
2-
=======================
1+
Syntax Maker
2+
=============
3+
The tool NLG tool for Finnish by `Mika Hämäläinen <https://mikakalevi.com>`_
34

4-
syntax_maker is a script that tries to create syntactically correct Finnish sentences based on a hand written grammar and automatically learned information about the language.
5+
6+
Syntax maker is the natural language generation tool for generating syntactically correct sentences in Finnish automatically. The tool is especially useful in the case of Finnish which has such a high diversity in its morphosyntax. All you need to know are the lemmas and their parts-of-speech and syntax maker will take care of the rest.
7+
8+
For instance, just throw in words rantaleijona, uneksia, korkea and aalto and you will get rantaleijonat uneksivat korkeista aalloista. So you will get the morphology right automatically! Don't believe me? `Just take a look at this tutorial to find out how. <https://github.com/mikahama/syntaxmaker/wiki/Creating-a-sentence,-the-basics>`_
9+
10+
**Update:** Python 2 and Python 3 are both now supported!
11+
12+
============
13+
Installation
14+
============
515

616
**NOTE 1:** This tool requires Omorfi, you can download the correct binary version from http://mikakalevi.com/omorfi
717

8-
**NOTE 2:** You need to install libhfst and hfst from https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstCommandLineTools
18+
**NOTE 2:** If you have any issues with installing HFST, see `an HSFT tutorial
19+
<https://mikalikes.men/using-hfst-on-python/>`_.
920

1021
===========================
1122
How to use
1223
===========================
1324

14-
Start of by following this tutorial: https://github.com/DiscoveryGroup/syntaxmaker/wiki/Creating-a-sentence,-the-basics
25+
Start of by following this tutorial: https://github.com/mikahama/syntaxmaker/wiki/Creating-a-sentence,-the-basics . Or you can go ahead and `take a look at the wiki <https://github.com/mikahama/syntaxmaker/wiki>`_
26+
or my `blog post about Syntax maker <https://mikalikes.men/create-finnish-sentences-computationally-in-python-nlg/>`_

backup_of_build_scripts/MANIFEST.in

+2-2
@@ -5,6 +5,6 @@ include DESCRIPTION.rst
55

66
# If using Python 2.6 or less, then have to include package data, even though
77
# it's already declared in setup.py
8-
include verb_valences_new.bin
8+
include verb_valences_new.json
99
include data/postpositions.csv
10-
include data/prepositions.csv
10+
include data/prepositions.csv

backup_of_build_scripts/setup.py

+6-8
@@ -23,16 +23,16 @@
2323
# Versions should comply with PEP440. For a discussion on single-sourcing
2424
# the version across setup.py and the project code, see
2525
# https://packaging.python.org/en/latest/single_source_version.html
26-
version='1.0.2',
26+
version='1.1.0',
2727

2828
description='An NLG tool for Finnish',
2929
long_description=long_description,
3030

3131
# The project's main homepage.
32-
url='https://github.com/DiscoveryGroup/syntaxmaker/',
32+
url='https://mikakalevi.com/nlp/syntax-maker/',
3333

3434
# Author details
35-
author='Mika Hämäläinen, University of Helsinki',
35+
author='Mika Hämäläinen, Dept. of Modern Languages, University of Helsinki',
3636
author_email='[email protected]',
3737

3838
# Choose your license
@@ -51,12 +51,10 @@
5151
'Topic :: Text Processing',
5252
"Natural Language :: Finnish",
5353

54-
# Pick your license as you wish (should match "license" above)
55-
'License :: OSI Approved :: Apache Software License',
56-
5754
# Specify the Python versions you support here. In particular, ensure
5855
# that you indicate whether you support Python 2, Python 3 or both.
5956
'Programming Language :: Python :: 2',
57+
'Programming Language :: Python :: 3',
6058
'Programming Language :: Python :: 2.6',
6159
'Programming Language :: Python :: 2.7',
6260

@@ -74,7 +72,7 @@
7472
# your project is installed. For an analysis of "install_requires" vs pip's
7573
# requirements files see:
7674
# https://packaging.python.org/en/latest/requirements.html
77-
install_requires=[],
75+
install_requires=["hfst"],
7876

7977
# List additional groups of dependencies here (e.g. development
8078
# dependencies). You can install these using the following syntax,
@@ -86,7 +84,7 @@
8684
# installed, specify them here. If using Python 2.6 or less, then these
8785
# have to be included in MANIFEST.in as well.
8886
package_data={
89-
'syntaxmaker': ['verb_valences_new.bin', 'data/*.csv', '*.json'],
87+
'syntaxmaker': ['verb_valences_new.json', 'data/*.csv', '*.json'],
9088
},
9189

9290
# Although 'package_data' is the preferred approach, in some case you may

head.py

+2
@@ -1,3 +1,5 @@
1+
#encoding: utf-8
2+
__author__ = 'Mika Hämäläinen'
13
import inflector
24

35
class Head:

inflector.py

+12-4
@@ -1,9 +1,18 @@
11
# -*- coding: utf-8 -*-
2+
__author__ = 'Mika Hämäläinen'
23
import hfst
34
import os
45
import pronoun_tool
5-
from itertools import ifilterfalse as ffilter
6+
import sys
67

8+
if (sys.version_info > (3, 0)):
9+
# Python 3
10+
new_python = True
11+
from itertools import filterfalse as ffilter
12+
else:
13+
# Python 2
14+
new_python = False
15+
from itertools import ifilterfalse as ffilter
716

817
datadir = "/usr/local/share/hfst/fi/"
918
if os.name == 'nt':
@@ -25,10 +34,10 @@
2534

2635
def inflect(word, pos, args):
2736
for el in args:
28-
if type(args[el]) is unicode:
37+
if not new_python and type(args[el]) is unicode:
2938
args[el] = args[el].encode('utf-8')
3039

31-
if type(word) is unicode:
40+
if not new_python and type(word) is unicode:
3241
word = word.encode('utf-8')
3342
word = word.replace("|", "")
3443
if len(args) == 0:
@@ -190,7 +199,6 @@ def standard_nominal_inflection(noun, case, number):
190199
return noun
191200

192201
def new_generator(analysis):
193-
print analysis
194202
results = synthetiser.lookup(analysis)
195203
if len(results) != 0:
196204
word = results[0][0]
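
Note on the inflector.py change: the hunks above branch on `sys.version_info` to pick the right `itertools` name and to skip the `unicode` encoding step on Python 3. The same idea can be sketched in one place as follows. This is only an illustration under assumptions: `PY3`, `text_type`, `to_native_str` and `normalise_args` are hypothetical names that do not appear in the commit.

```python
import sys

# Compare the major version only; this is an alternative spelling of the
# `sys.version_info > (3, 0)` check used in the commit.
PY3 = sys.version_info[0] >= 3

if PY3:
    from itertools import filterfalse as ffilter
    text_type = str
else:
    from itertools import ifilterfalse as ffilter
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def to_native_str(value):
    """Hypothetical helper: encode unicode to UTF-8 bytes on Python 2 only."""
    if not PY3 and isinstance(value, text_type):
        return value.encode("utf-8")
    return value


def normalise_args(args):
    """Hypothetical helper: return a converted copy of the argument dict."""
    return {key: to_native_str(val) for key, val in args.items()}


# ffilter keeps the items for which the predicate is false (here: even numbers).
evens = list(ffilter(lambda number: number % 2, range(5)))
```

Unlike the loop in `inflect`, this sketch returns a new dict instead of overwriting the caller's `args` in place. Whether that matters depends on how callers reuse their dicts.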

phrase.py

+10-3
@@ -1,10 +1,17 @@
1-
__author__ = 'mikahama'
1+
#encoding: utf-8
2+
__author__ = 'Mika Hämäläinen'
23
from head import Head
34
import copy
4-
import re
5+
import re, sys
56

67
class Phrase:
78
def __init__(self, head, structure, morphology={}):
9+
if (sys.version_info > (3, 0)):
10+
# Python 3
11+
self.new_python = True
12+
else:
13+
# Python 2
14+
self.new_python = False
815
self.parent = None
916
self.head = Head(head, structure["head"])
1017
self.components = copy.deepcopy(structure["components"])
@@ -46,7 +53,7 @@ def to_string(self, received_governance = {}):
4653
string_representation = string_representation + " " + head_word
4754
else:
4855
phrase = self.components[item]
49-
if type(phrase) is str or type(phrase) is unicode:
56+
if type(phrase) is str or (not self.new_python and type(phrase) is unicode):
5057
#Data not set
5158
pass
5259
else:
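
Note on the phrase.py change: the hunk above stores a `new_python` flag on every `Phrase` instance so that `unicode` is only referenced on Python 2. One possible alternative, shown here as a sketch rather than as the author's approach, is a module-level tuple of string types checked with `isinstance`. The names `string_types`, `is_unset_component` and `FakePhrase` are hypothetical and exist only for this illustration.

```python
import sys

if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (str, unicode)  # noqa: F821 (this name only exists on Python 2)


def is_unset_component(component):
    """Hypothetical helper: a slot that still holds a string has no phrase set."""
    return isinstance(component, string_types)


class FakePhrase(object):
    """Stand-in for a real Phrase object, used only in this demonstration."""


components = {"subject": "NP", "dir_object": FakePhrase()}
unset = sorted(
    name for name, value in components.items() if is_unset_component(value)
)
```

Note that `isinstance` also accepts subclasses of `str`, whereas the `type(phrase) is str` comparison in the commit does not, so the two checks are not strictly equivalent.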

pronoun_tool.py

+1-4
@@ -1,8 +1,5 @@
11
# -*- coding: utf-8 -*-
2-
__author__ = 'mikahama'
3-
import pickle
4-
import random
5-
import os
2+
__author__ = 'mika hämäläinen'
63

74

85
pronouns = {"SG1" : "minä", "SG2" : "sinä", "SG3" : "se", "PL1" : "me", "PL2": "te", "PL3": "ne"}

syntax_maker.py

+9-6
@@ -1,4 +1,5 @@
11
# -*- coding: utf-8 -*-
2+
__author__ = 'Mika Hämäläinen'
23
import verb_valence
34
from phrase import Phrase
45
import json
@@ -246,14 +247,14 @@ def create_adposition_phrase(adposition, np):
246247
set_vp_mood_and_tense(vp, mood="POTN")
247248
248249
turn_vp_into_question(vp)
249-
print vp.to_string()
250+
print(vp.to_string())
250251
251252
np = create_phrase("NP", "kissa")
252253
pp = create_adposition_phrase("ilman", np)
253-
print pp.to_string()
254-
"""
254+
print(pp.to_string())
255+
256+
255257
256-
"""
257258
np1 = create_phrase("NP", "mies")
258259
relp = create_verb_pharse("katsoa")
259260
ppp = create_phrase("NP", "orava")
@@ -272,5 +273,7 @@ def create_adposition_phrase(adposition, np):
272273
273274
add_advlp_to_vp(vep, pp)
274275
275-
print vep
276-
"""
276+
print(vep)
277+
278+
279+
"""

verb_valence.py

+4-3
@@ -1,8 +1,9 @@
11
# -*- coding: utf-8 -*-
2-
import pickle
2+
__author__ = 'Mika Hämäläinen'
33
import os
44
import random
55
import json
6+
import codecs
67

78
valences = {}
89
direct_cases = {"Gen", "Par", "Ela", "Ill"}
@@ -14,8 +15,8 @@
1415

1516
def load_valences_from_bin():
1617
global valences
17-
valence_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'verb_valences_new.bin')
18-
valences = pickle.load(open(valence_path, "rb"))
18+
valence_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'verb_valences_new.json')
19+
valences = json.load(codecs.open(valence_path, "r", encoding="utf-8"))
1920

2021

2122
load_valences_from_bin()
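
Note on the verb_valence.py change: the hunk above replaces the pickled `verb_valences_new.bin` with `verb_valences_new.json`. The commit does not show how the data was converted, so the following is only a sketch of one way such a one-off conversion could look on Python 3. The function names, the temporary file names and the sample data are assumptions; the real structure of the valence file is not visible in this diff.

```python
import io
import json
import os
import pickle
import tempfile


def convert_pickle_to_json(pickle_path, json_path):
    """Hypothetical one-off conversion. Only unpickle files you trust."""
    with open(pickle_path, "rb") as source:
        data = pickle.load(source)
    with io.open(json_path, "w", encoding="utf-8") as target:
        # ensure_ascii=False keeps characters such as "ä" readable in the file.
        json.dump(data, target, ensure_ascii=False)


def load_valences(json_path):
    """Read the JSON file back, decoding it explicitly as UTF-8."""
    with io.open(json_path, "r", encoding="utf-8") as source:
        return json.load(source)


# Demonstration with made-up sample data (not the real valence format).
tmp_dir = tempfile.mkdtemp()
pickle_path = os.path.join(tmp_dir, "valences.bin")
json_path = os.path.join(tmp_dir, "valences.json")
sample = {"uneksia": {"Ela": 3}, "nähdä": {"Par": 2}}
with open(pickle_path, "wb") as handle:
    pickle.dump(sample, handle)

convert_pickle_to_json(pickle_path, json_path)
restored = load_valences(json_path)
```

JSON only supports string keys and has no set type, so data containing sets or non-string keys would need an extra conversion step before this round trip works. The `with` blocks also close the file handles, which the one-line `json.load(codecs.open(...))` call in the commit leaves to the garbage collector.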
