option has_header true is ignored

I tried to run a simple example with a CSV file that has a header row.

```csv
name,age
Alice,29
Bob,31
```

I created an external table as follows:

```java
context
    .sql("CREATE EXTERNAL TABLE test_table (name VARCHAR, age INT) STORED AS CSV LOCATION '/tmp/test/test.csv' OPTIONS ('has_header' 'true');")
    .thenComposeAsync(df -> df.collect(allocator))
    .join();
```

... and then executed the query:

```java
      context.sql("select * from test_table").thenComposeAsync(DataFrame::show).join();
```

As a result, I got the following exception:

```
Exception in thread "main" java.util.concurrent.CompletionException: java.lang.RuntimeException: Arrow error: Parser error: Error while parsing value age for column 1 at line 0
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1152)
	at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
Caused by: java.lang.RuntimeException: Arrow error: Parser error: Error while parsing value age for column 1 at line 0
	at org.apache.arrow.datafusion.DefaultDataFrame$RuntimeExceptionCallback.accept(DefaultDataFrame.java:127)
	at org.apache.arrow.datafusion.DefaultDataFrame$RuntimeExceptionCallback.accept(DefaultDataFrame.java:117)
	at org.apache.arrow.datafusion.DataFrames.showDataframe(Native Method)
	at org.apache.arrow.datafusion.DefaultDataFrame.show(DefaultDataFrame.java:70)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
	... 6 more
```

I also implemented my own `show()` method:

```java
  // Prints the schema, then every row of every batch; the reader is closed on exit.
  private static void show(ArrowReader reader) {
    try (reader) {
      VectorSchemaRoot root = reader.getVectorSchemaRoot();
      System.out.println(root.getSchema().getFields());
      while (reader.loadNextBatch()) {
        int n = root.getFieldVectors().size();
        // Column names and types of the current batch, joined with "|".
        System.out.println(
            root.getFieldVectors().stream()
                .map(v -> v.getField().getName() + ":" + v.getField().getFieldType().getType())
                .collect(Collectors.joining("|")));
        int rows = root.getRowCount();
        for (int r = 0; r < rows; r++) {
          for (int i = 0; i < n; i++) {
            FieldVector vector = root.getVector(i);
            System.out.print(vector.getObject(r) + " | ");
          }
          System.out.println();
        }
      }
    } catch (IOException e) {
      logger.warn("got IO Exception", e);
    }
  }
```

and used it as follows:


```java
      context
          .sql("select * from test_table")
          .thenComposeAsync(df -> df.collect(allocator))
          .thenAccept(ExampleMain::show)
          .join();
```


In this case the error message looks like this:

```
thread '<unnamed>' panicked at src/dataframe.rs:29:14:
failed to collect dataframe: ArrowError(ParseError("Error while parsing value age for column 1 at line 0"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5

```

Both examples work if the CSV file has no header, or if the `age` column is declared as `VARCHAR`. In the latter case the code runs, but it reads the header as the first row of data. Using `format.has_header` instead of `has_header` does not help.

Note that the same scenario works correctly for me with `datafusion-cli`. It looks like `OPTIONS ('has_header' 'true')` is simply ignored when running with datafusion-java. That is strange because, as far as I can see, datafusion-java is just a thin JNI wrapper over the native DataFusion API.
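To make it clearer how I read the error message, here is a minimal plain-Java sketch. It is not DataFusion code: `HeaderDemo` and `parseAges` are hypothetical names I made up, and the parsing is deliberately naive. It only models my assumption that, when the header flag is not honored, the header line is parsed as data and the string `age` fails to convert to an integer in column 1 at line 0.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderDemo {
  // Same content as the CSV file shown at the top of this report.
  static final String CSV = "name,age\nAlice,29\nBob,31\n";

  // Hypothetical helper: parses column 1 (age) as an int.
  // When hasHeader is true the first line is skipped; otherwise it is treated as data.
  static List<Integer> parseAges(String csv, boolean hasHeader) {
    String[] lines = csv.split("\n");
    List<Integer> ages = new ArrayList<>();
    for (int line = hasHeader ? 1 : 0; line < lines.length; line++) {
      String value = lines[line].split(",")[1];
      try {
        ages.add(Integer.parseInt(value));
      } catch (NumberFormatException e) {
        // Message shaped like the one in the report, for comparison only.
        throw new IllegalArgumentException(
            "Error while parsing value " + value + " for column 1 at line " + line, e);
      }
    }
    return ages;
  }

  public static void main(String[] args) {
    // Header honored: only the two data rows are parsed.
    System.out.println(parseAges(CSV, true));
    try {
      // Header ignored: the header line itself is parsed as a data row.
      parseAges(CSV, false);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the flag off, the sketch fails on the very first line, which matches the symptom I see: the failure disappears when the file has no header or when `age` is declared as `VARCHAR`.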

I am running on Ubuntu and using Java 21 (if it matters).


