Arabic sentence split on the Arabic comma #113

Open
@ymoslem

Description

Describe the bug
Arabic text is split into separate sentences at the Arabic comma (،), which is not a sentence boundary.

To Reproduce
Steps to reproduce the behavior:

import pysbd
text = "هذه تجربة، للغة العربية"
seg = pysbd.Segmenter(language="ar", clean=True)
print(seg.segment(text))

Output: ['هذه تجربة،', 'للغة العربية']

Expected behavior
The text should not be split on the Arabic comma.
Expected output: ['هذه تجربة، للغة العربية']

Additional context
I fixed it locally by editing pysbd/lang/arabic.py and deleting ، from SENTENCE_BOUNDARY_REGEX.
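
To illustrate why removing ، from the boundary character class changes the result, here is a minimal standalone sketch using only Python's re module. The two patterns and the split_sentences helper are hypothetical and simplified for illustration. They are not pysbd's actual SENTENCE_BOUNDARY_REGEX or its segmentation logic.

```python
import re

# Illustrative boundary patterns (assumptions, NOT pysbd's real regex).
# Each matches a run of non-terminator characters plus an optional terminator.
BOUNDARY_WITH_COMMA = re.compile(r"[^.!?؟،]+[.!?؟،]?")   # treats ، as a terminator
BOUNDARY_WITHOUT_COMMA = re.compile(r"[^.!?؟]+[.!?؟]?")  # ، is ordinary text


def split_sentences(text, pattern):
    """Return the stripped, non-empty matches of `pattern` in `text`."""
    return [m.strip() for m in pattern.findall(text) if m.strip()]


text = "هذه تجربة، للغة العربية"

# With ، in the character class, the text breaks at the comma.
with_comma = split_sentences(text, BOUNDARY_WITH_COMMA)

# Without it, the text stays in one piece.
without_comma = split_sentences(text, BOUNDARY_WITHOUT_COMMA)

print(with_comma)
print(without_comma)
```

In this sketch the first pattern reproduces the reported split into two segments, while the second keeps the text as a single segment, which matches the expected output above.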
