ORC-2011: [C++] Fix Timezone to support legacy US TimeZone identifiers#2422
Closed
dongjoon-hyun wants to merge 1 commit into
Closed
ORC-2011: [C++] Fix Timezone to support legacy US TimeZone identifiers#2422dongjoon-hyun wants to merge 1 commit into
Timezone to support legacy US TimeZone identifiers#2422dongjoon-hyun wants to merge 1 commit into
Conversation
Timezone to support legacy US TimeZone identifiesTimezone to support legacy US TimeZone identifiers
218877a to
50a4ea9
Compare
50a4ea9 to
eed7462
Compare
eed7462 to
c32b470
Compare
Timezone to support legacy US TimeZone identifiersTimezone to support legacy US TimeZone identifiers
Member
Author
|
Could you review this when you have some time, @wgtmac ? |
dongjoon-hyun
commented
Sep 27, 2025
| return *(itr->second).get(); | ||
| } | ||
| timezoneCache[filename] = std::make_shared<LazyTimezone>(filename); | ||
| auto it = TZ_ALIASES.find(zone); |
Member
Author
There was a problem hiding this comment.
This overhead occurs once per zone because we fill timezoneCache only one time and reuse it forever.
Member
Author
|
I hope we can deliver this via Apache ORC 2.2.1 first for |
dongjoon-hyun
added a commit
that referenced
this pull request
Sep 27, 2025
…fiers ### What changes were proposed in this pull request? This PR aims to fix `Timezone` to support legacy `US` TimeZone identifiers. ### Why are the changes needed? Since `Ubuntu 24.04` and `Debian 13` doesn't provide old `/usr/share/zoneinfo/US/*` files, ORC C++ library fails with the following error by default. It's misleading because both recent `IANA timezone database` and `TZDIR` cannot resolve this issue. We had better provide a workaround via aliases. > C++ exception with description "Time zone file /usr/share/zoneinfo/US/Pacific does not exist. > Please install IANA time zone database and set TZDIR env." thrown in the test body. Although there are many legacy timezone identifies, this PR aims to focus on `US` issues. For the rest of the code, we can handle it later based on the usage. - https://data.iana.org/time-zones/tzdb/backward ### How was this patch tested? Pass the CIs and manually run a docker test without these lines. https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/ubuntu24/Dockerfile#L58 https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/debian13/Dockerfile#L40 I verified locally with the revised `Debian 13` image. ``` $ docker run -it --rm apache/orc-dev:debian13 ls -al /usr/share/zoneinfo/US ls: cannot access '/usr/share/zoneinfo/US': No such file or directory $ ./run-one.sh local x debian13 Started local run for ORC-2011 on debian13 at Fri Sep 26 21:54:25 PDT 2025 -- The C compiler identification is GNU 14.2.0 -- The CXX compiler identification is GNU 14.2.0 ... Test project /root/build Start 1: orc-test 1/9 Test #1: orc-test ......................... Passed 7.24 sec Start 2: java-test 2/9 Test #2: java-test ........................ Passed 110.33 sec Start 3: java-examples-test 3/9 Test #3: java-examples-test ............... Passed 0.37 sec Start 4: java-tools-test 4/9 Test #4: java-tools-test .................. Passed 0.06 sec Start 5: java-bench-gen-test 5/9 Test #5: java-bench-gen-test .............. Passed 0.71 sec Start 6: java-bench-scan-test 6/9 Test #6: java-bench-scan-test ............. Passed 0.66 sec Start 7: java-bench-hive-test 7/9 Test #7: java-bench-hive-test ............. Passed 11.14 sec Start 8: java-bench-spark-test 8/9 Test #8: java-bench-spark-test ............ Passed 214.61 sec Start 9: tool-test 9/9 Test #9: tool-test ........................ Passed 5.00 sec 100% tests passed, 0 tests failed out of 9 Total Test time (real) = 350.16 sec Built target test-out Finished debian13 at Fri Sep 26 22:06:39 PDT 2025 ``` Please note that the test coverage should be added separately. In other words, the docker images should be updated **selectively and gradually** after this PR because the images are shared among multiple ORC branches. Since `Debian 13` is added newly for `main` and `branch-2.2` only, I'm planning to update the following after merging this PR to have a test coverage for this feature. https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/debian13/Dockerfile#L40 ### Was this patch authored or co-authored using generative AI tooling? No. Closes #2422 from dongjoon-hyun/ORC-2011. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 3c89afe) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Member
Author
|
Merged to main/2.2. |
Member
|
Sorry for missing this and thanks for fixing it! @dongjoon-hyun |
Member
Author
|
Thank you so much for reviewing this, @wgtmac ! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR aims to fix
Timezoneto support legacyUSTimeZone identifiers.Why are the changes needed?
Since
Ubuntu 24.04andDebian 13doesn't provide old/usr/share/zoneinfo/US/*files, ORC C++ library fails with the following error by default. It's misleading because both recentIANA timezone databaseandTZDIRcannot resolve this issue. We had better provide a workaround via aliases.Although there are many legacy timezone identifies, this PR aims to focus on
USissues. For the rest of the code, we can handle it later based on the usage.How was this patch tested?
Pass the CIs and manually run a docker test without these lines.
orc/docker/ubuntu24/Dockerfile
Line 58 in fbea8e0
orc/docker/debian13/Dockerfile
Line 40 in fbea8e0
I verified locally with the revised
Debian 13image.Please note that the test coverage should be added separately. In other words, the docker images should be updated selectively and gradually after this PR because the images are shared among multiple ORC branches. Since
Debian 13is added newly formainandbranch-2.2only, I'm planning to update the following after merging this PR to have a test coverage for this feature.orc/docker/debian13/Dockerfile
Line 40 in fbea8e0
Was this patch authored or co-authored using generative AI tooling?
No.