Race when running multiple instances of tika.parser.from_file()? #431

Open
@ember91

Hi!

Is there a race condition when running multiple tika.parser.from_file() calls in parallel using Python multiprocessing? As far as I can tell, the first call to from_file downloads the jar file and then starts the Java subprocess. If another process calls from_file after the first one has started downloading but before the server port comes up, weird things may happen, such as tika-server.jar being downloaded twice or the subprocess being started twice. Is this analysis right?

That said, I'm reading #337, and from that discussion it looks like this will work.
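
In case the race is real, here is a minimal sketch of how the one-time startup could be serialized from the caller's side, independent of what the library does internally. It assumes a POSIX system (it uses `fcntl.flock`), and `interprocess_lock`, `ensure_started`, the lock path and the marker path are all hypothetical names made up for this illustration; none of them are part of tika-python.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def interprocess_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path (POSIX only)."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def ensure_started(lock_path, marker_path, start):
    """Run start() at most once across processes.

    Returns True if this call ran start(), False if an earlier call
    had already completed it (hypothetical helper, not a tika API).
    """
    with interprocess_lock(lock_path):
        if os.path.exists(marker_path):
            return False
        start()
        # Record completion while still holding the lock, so a waiting
        # process sees the marker as soon as it acquires the lock.
        with open(marker_path, "w") as marker:
            marker.write(str(os.getpid()))
        return True


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    lock = os.path.join(workdir, "startup.lock")
    marker = os.path.join(workdir, "startup.done")
    # Stand-in for the real first call, e.g. a first from_file() on a small file.
    print(ensure_started(lock, marker, lambda: None))
    print(ensure_started(lock, marker, lambda: None))
```

Each worker would call `ensure_started` before its first `from_file`, passing a `start` callable that performs one initial `from_file` call, so only one process triggers the download and server startup while the others wait. Note that the marker file outlives the server in this sketch, so it would need to be removed (or replaced by a check that the port is actually up) between runs.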
