Add test files from uchardet #73

Merged: 1 commit, Nov 9, 2019

@rstm-sf (Collaborator) commented Oct 26, 2019

Adds some files from uchardet for the batch test.

The remaining files I either wasn't sure were worth adding, or couldn't add (those are highlighted in bold):

bd - ??? Cyrillic
ce - mac-centraleurope = null (not an encoding in .NET?)
da - iso-8859-1 = But was: "ISO-8859-15"; windows-1252 = ???
de - iso-8859-1, windows-1252 = ??? we
en - why ascii???
es - ???
et - iso-8859-15, windows-1252 = ???
fi - ???
fr - iso-8859-1, iso-8859-15, windows-1252 = ???
ga - ???
he - ???
hr - iso-8859-16 = null; other ???
hu - ???
it - ???
ja - utf-16le = fail; other ???
ko - ??? (but utf-16le)
lt - iso-8859-10 = null; other ???
lv - ???
mt - ???
pl - ???
pt - ???
ro - ???
ru - ??? (but iso-8859-5)
sk - ???
sl - ???
sv - ???
tr - iso-8859-3 ???
vi - viscii = null
zh - ???

@304NotModified self-assigned this Oct 26, 2019

@rstm-sf (Collaborator, Author) commented Nov 5, 2019

@304NotModified, how is it going?

@304NotModified (Member)

Hey! Thanks for the reminder. I'll try to look at this soon.

@304NotModified changed the title from "Add some files from uchardet for batch test" to "Add test files from uchardet" Nov 5, 2019

@304NotModified (Member)

> Others, I either did not understand whether it was worth adding, or they could not be added (such are highlighted in bold)

We need to figure this out before merging, don't we?

@rstm-sf (Collaborator, Author) commented Nov 6, 2019

> iso-8859-1, iso-8859-10, iso-8859-16, viscii

I couldn't find definitions for these encodings here.

> ce - mac-centraleurope = null (not encoding in .net?)

I guess that will be resolved by changing the name to x-mac-ce.

> da - iso-8859-1 = But was: "ISO-8859-15"

Sorry, I don't remember why I singled this one out. I think it will become clear quickly if I start over.

> en - why ascii???

Good question. The text of the test file gives food for thought.

> ja - utf-16le = fail

Created issue #72.

Issues should probably be created for the previous ones as well.
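On the `en - why ascii???` question: one plausible explanation, not confirmed in this thread, is that the English test file contains only 7-bit bytes. Such a file is valid ASCII and decodes to exactly the same text under every ASCII-compatible encoding, so a detector has no byte-level evidence to prefer `iso-8859-1` or `windows-1252` and may reasonably report ASCII. The Python sketch below checks that property; the helper `decodes_identically`, the candidate list, and the sample strings are illustrative assumptions, not part of CharsetDetector or uchardet.

```python
# Sketch: check whether a byte string is pure 7-bit ASCII and whether it
# decodes to the same text under several ASCII-compatible encodings.
# The candidate list and sample text are illustrative assumptions.

CANDIDATES = ["ascii", "iso-8859-1", "iso-8859-15", "cp1252", "utf-8"]


def decodes_identically(data: bytes, encodings=CANDIDATES) -> bool:
    """Return True if every candidate encoding yields the same text."""
    decoded = set()
    for enc in encodings:
        try:
            decoded.add(data.decode(enc))
        except UnicodeDecodeError:
            # This encoding cannot decode the bytes at all.
            return False
    return len(decoded) == 1


english = "The quick brown fox jumps over the lazy dog.\n".encode("ascii")
danish = "Rødgrød med fløde\n".encode("iso-8859-1")

print(english.isascii(), decodes_identically(english))  # True True
print(danish.isascii(), decodes_identically(danish))    # False False
```

For pure-ASCII input every candidate agrees, so "ASCII" is the narrowest correct answer; as soon as a byte ≥ 0x80 appears, the candidates start to diverge or fail.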

@rstm-sf (Collaborator, Author) commented Nov 6, 2019

Hmm, but iso-8859-1 does exist.

@rstm-sf (Collaborator, Author) commented Nov 6, 2019

> iso-8859-1

I overlooked it; it does in fact exist by default.

In that case:

> da - iso-8859-1 = But was: "ISO-8859-15"

Sounds like a mistake.
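For context on the `da` case: ISO-8859-1 and ISO-8859-15 assign the same characters to all but eight byte values, so Danish text that never uses those eight bytes (for example, contains no `€`) is byte-identical in both encodings. The Python sketch below uses the standard-library `iso-8859-1` and `iso-8859-15` codecs to list the differing bytes; the helper `differing_bytes` and the sample sentence are illustrative assumptions, and the sketch only shows the ambiguity, not how CharsetDetector or uchardet actually choose between the two.

```python
# Sketch: find the byte values that ISO-8859-1 and ISO-8859-15 decode
# differently, using Python's built-in codecs.


def differing_bytes(enc_a: str, enc_b: str):
    """Return (byte, char_in_a, char_in_b) for every single-byte mismatch."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        a, b = raw.decode(enc_a), raw.decode(enc_b)
        if a != b:
            diffs.append((value, a, b))
    return diffs


for value, a, b in differing_bytes("iso-8859-1", "iso-8859-15"):
    print(f"0x{value:02X}: {a!r} vs {b!r}")

# Illustrative Danish sample: it avoids all of those bytes,
# so both encodings produce exactly the same byte sequence for it.
sample = "Rødgrød med fløde og æbleskiver på Øresund"
print(sample.encode("iso-8859-1") == sample.encode("iso-8859-15"))  # True
```

When the test file contains none of the eight differing bytes, nothing in the data distinguishes the two encodings, which is why a result of ISO-8859-15 where ISO-8859-1 was expected is hard to judge from the bytes alone.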

@rstm-sf (Collaborator, Author) commented Nov 6, 2019

> Probably the previous ones should also create issues.

Done: #75, #76, #77

@304NotModified merged commit e79600a into CharsetDetector:master Nov 9, 2019

@304NotModified (Member)

Thanks! Merging this, as I think it's a nice improvement :)

@304NotModified added this to the 2.2.1 milestone Nov 9, 2019
@rstm-sf deleted the infra/add_test_file_from_uchardet branch January 12, 2020 08:26