
Releases: databrickslabs/ucx

v0.59.0

06 May 04:45
355b45a
  • Adds requirement for matching account groups to be created before assessment to the docs (#4017). The account group setup requirements have been clarified to ensure successful assessment and group migration workflows, mandating that account groups matching workspace local groups are created beforehand, which can be achieved manually or programmatically via various methods. The assessment workflow has been enhanced to retrieve workspace assets and securable objects from the Hive metastore for compatibility assessment with UC, storing the results in the inventory database for further analysis. Additionally, the documentation now stresses the necessity of running the validate-groups-membership command prior to initiating the group migration workflow, and recommends running the create-account-groups command beforehand if the required account groups do not already exist, to guarantee a seamless execution of the assessment and migration processes.
  • Fixed Service Principal instructions for installation (#3967). The installation requirements for UCX have been updated to reflect changes in Service Principal support, where it is no longer supported for workspace installations, but may be supported for account-level installations. As a result, account-level identity setup now requires connection via Service Principal with Account Admin and Workspace Admin privileges in all workspaces. All other installation requirements remain unchanged, including the need for a Databricks Premium or Enterprise workspace, network access to the Databricks Workspace and the Internet, a created Unity Catalog Metastore, and a PRO or Serverless SQL Warehouse for rendering reports. Additionally, users with external Hive Metastores, such as AWS Glue, must consult the relevant guide for specific instructions to ensure proper setup.
  • Fixed migrate tables when default catalog is set (#4012). The handling of the default catalog in the Hive metastore has been enhanced to ensure correct behavior when the default catalog is set. Specifically, the DESCRIBE SCHEMA EXTENDED and SHOW TBLPROPERTIES queries have been updated to include the hive_metastore prefix when fetching database descriptions and constructing table identifiers, respectively, unless the table is located in a mount point, in which case the delta prefix is used. This change addresses a previously reported issue with migrating tables when the default catalog is set, ensuring that table properties are correctly fetched and tables are properly identified. The update has been applied to multiple test cases, including those for skipping tables, upgraded tables, and mapping tables, to guarantee correct execution of queries with the default catalog name, which is essential when the default catalog is set to hive_metastore.
  • Limit crawl workflows task in assessment to workflows that ran in the last 30 days (#3963). The JobInfo class has been enhanced with a new last_run attribute to store the timestamp of the job's last run, allowing for better monitoring and assessment. The from_job method has been updated to initialize this attribute consistently. Additionally, the assess_workflows method now filters workflows to only include those that have run within the last 30 days, achieved through the introduction of a last_run_days parameter in the refresh_report method; a sketch of this time-based filtering follows this list. This parameter enables time-based filtering of job runs, and a new inner function lint_job_limited handles the filtering logic. The lint_job method has also been updated to accept the last_run_days parameter and check if a job has run within the specified time frame. Furthermore, a new test method test_workflow_linter_refresh_report_time_bound has been added to verify the correct functioning of the WorkflowLinter class when limited to recent workflow runs, ensuring that it produces the expected results and writes to the correct tables.
  • Pause migration progress workflow schedule (#3995). The migration progress workflow schedule is now paused by default, with its pause_status set to PAUSED, to prevent automatic execution and potential failures due to missing prerequisites. This change is driven by the experimental nature of the workflow, which may fail if a UCX catalog has not been created by the customer. To ensure successful execution, users are advised to unpause the workflow after running the create-ucx-catalog command, allowing them to control when the workflow runs and verify that necessary prerequisites are in place.
  • Warns instead of an error while finding an acc group in workspace (#4016). The behavior of the account group reflection functionality has been updated to handle duplicate groups more robustly. When encountering a group that already exists in the workspace, the function now logs a warning instead of an error, allowing it to continue executing uninterrupted. This change accommodates the introduction of nested account groups from workspace local groups, which can lead to groups being present in the workspace that are also being migrated. The warning message clearly indicates that the group is being skipped due to its existing presence in the workspace, providing transparency into the reflection process.
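
The time-based filtering described in the workflow-linting item above can be pictured with a minimal sketch; the simplified JobInfo record and the filter_recent_jobs helper below are illustrative assumptions rather than the exact UCX internals, and assume last_run is stored as epoch milliseconds:

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class JobInfo:
    job_id: int
    name: str
    last_run: int | None = None  # epoch milliseconds of the most recent run, if any


def filter_recent_jobs(jobs: list[JobInfo], last_run_days: int = 30) -> list[JobInfo]:
    """Keep only jobs whose most recent run falls within the last `last_run_days` days."""
    cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=last_run_days)
    cutoff_ms = int(cutoff.timestamp() * 1000)
    return [job for job in jobs if job.last_run is not None and job.last_run >= cutoff_ms]
```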

Contributors: @pritishpai, @FastLee

v0.58.0

16 Apr 19:10
bb89679
  • Added ability to create account groups from nested ws-local groups (#3818). The create_account_level_groups method has been added, enabling the creation of account level groups from workspace groups. This method retrieves valid workspace groups and recursively creates account level groups for each group, handling nested groups by checking if they already exist and creating them if necessary. The AccountGroupCreationContext dataclass is used to keep track of created, preexisting, and renamed groups. A new test function, test_create_account_level_groups_nested_groups, has been added to the test_account.py file to test the creation of account level groups from nested workspace-local groups. This function checks if the account level groups are created correctly, with the same members and membership as the corresponding workspace-local groups. The ComplexValue class has been modified to include the ref field, which references user objects, enabling the creation of account groups with members identified by their workspace-local user IDs. Integration tests have been added to verify the functionality of these changes.
  • Added error handling and tests for Workflow linter during pipeline fetch (#3819). The _register_pipeline_task method in the "jobs.py" file has been updated to handle cases where the pipeline does not exist, by yielding a DependencyProblem instance with an appropriate error message; a minimal sketch of this error handling follows this list. A new private method, "_register_pipeline_library", has been introduced to handle the registration of libraries present in the pipeline. Additionally, new unit tests and integration tests have been added to ensure that the Workflow linter properly handles cases where pipelines do not exist, and manual testing has been conducted to verify the feature. Overall, these changes improve the robustness and reliability of the Workflow linter by adding error handling and testing for edge cases during pipeline fetch.
  • Added hyperlinks to tables and order the rows by type, name (#3951). In this release, the Table Types widget has been updated to enhance the user experience. The table names in the widget are now clickable and serve as hyperlinks that redirect users to a specified URL with the table name as the link text and title. The rows in the widget are also reorganized by type and then by name, making it easier for users to locate the required table. Additionally, a new set of encodings has been added for the widget that specifies how fields should be displayed, including a link display type for the name field to indicate that it should be displayed as a hyperlink. These changes were implemented in response to issue #3259. A manually tested flag has been included in the commit, indicating that the changes have been tested, but unit and integration tests have not been added. A screenshot of the changes is also included in the commit.
  • Added links to compute summary widget (#3952). In this release, we have added links to the compute summary widget to enhance navigation and usability. The encodings spec in the spec object now includes overrides for a SQL file, which adds links to the cluster_id and cluster_name fields, opening them in a new tab with the respective cluster's details. Additionally, the finding and creator fields are now displayed as strings. These changes improve the user experience by providing direct access to cluster details from the compute summary widget. The associated issue #3260 has been resolved. Manual testing has confirmed that the changes work as expected.
  • Adds option to install UCX in offline mode (#3959). A new capability has been introduced to install the UCX library in offline mode, enabling software engineers to install UCX in environments with restricted Internet access. This offline installation process can be accomplished by installing UCX on a host with Internet access, zipping the installation, transferring the zip to the target host, and unzipping it. To ensure a successful installation, the Databricks CLI version must be v0.244.0 or higher. Additionally, this commit includes updated documentation detailing the offline installation process. This feature addresses issue #3418, making it easier for software engineers to install UCX in offline environments.
  • Fixed Assessment Excel Exporter (#3962). The open-source library has been updated with several new features to enhance its functionality. Firstly, we have implemented a new sorting algorithm that offers improved performance and flexibility for sorting large datasets. This algorithm includes customizable options for handling ties and can be easily integrated into existing codebases. Additionally, we have added support for asynchronous processing, allowing developers to execute time-consuming tasks in the background while maintaining application responsiveness. This feature includes a new API for managing asynchronous tasks and improved error handling for better reliability. Lastly, we have introduced a new configuration system that simplifies the process of setting up and customizing the library. This system includes a default configuration that covers most use cases and allows for easy overriding of specific settings. These new features are designed to provide developers with more powerful and flexible tools for working with the open-source library.
  • Fixed Assessment Exporter Notebook (#3829). In this commit, the Assessment Exporter Notebook has been updated to improve code maintainability and robustness. The main change is the adjustment of the Lakeview dashboard Assessment Main dashboard path to the new naming format, which is now determined dynamically to avoid hardcoded values. The path format has also been changed from string to Path object format. Additionally, a new method _process_id_columns has been added to process ID columns in the dataset, checking for any column with id in the name and wrapping them in quotes. These changes have been manually tested and improve the accuracy of the exported Excel file and the maintainability of the code, ensuring that the Assessment Main dashboard path is correct and up-to-date and the data is accurately represented in the exported file.
  • TECH DEBT Use right workspace api call for listing credentials (#3957). In this release, we have implemented a change in the list method of the credentials.py file located in the databricks/labs/ucx/aws directory, addressing issue #3571. The list method now utilizes the list_credentials method from the _ws.credentials object instead of the api_client for listing AWS credentials. This modification replaces the previous TODO comment with actual code, thereby improving code quality and reducing technical debt. The list_credentials method is a part of the Databricks workspace API, offering a more accurate and efficient approach to list AWS credentials, resulting in enhanced reliability and performance for the code responsible for managing AWS credentials.
  • [TECHDEBT] Remove unused code for _resolve_dbfs_root in MountCrawler (#3958). In this release, we have made improvements to the MountCrawler class by removing the unused code for the _resolve_dbfs_root method and its dependencies. This method was previously used to resolve the root location of a DBFS, but it has been deprecated in favor of a new API call. The removal of this unnecessary functionality simplifies the codebase and aligns it with our goal of creating a more streamlined and efficient system. Additionally, this release includes a fix for issue #3452. Rest assured that these changes will not affect the current functionality or behavior of the system and are intended to enhance the overall performance and maintainability of the codebase.
  • [Tech Debt] removing notfound if not required in test_install.py (#3826). In this release, we've made improvements to our test suite by removing the redundant notfound function in test_install.py, specifically from 'test_create_database', 'test_open_config', and 'test_save_config_ext_hms'. The notfound function previously raised a NotFound error, which has now been replaced with a more specific error message or behavior. This enhancement simplifies the codebase, reduces technical debt, and addresses issue #2700. Note that no new unit tests were added, but existing tests were updated to account for the removal of 'notfound'.
  • [Tech Debt] standardising the error message for required parameter in cli command (#3827). This release introduces changes to standardize error messages for required parameters in the databricks labs ucx CLI command, addressing tech debt and improving the user experience. Instead of raising a KeyError, the command now returns clear and consistent error messages when required parameters are missing. Specifically, the repair_run function handles the case when the --step parameter is not provided, and the move and alias functions handle missing --from_catalog, `...
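
As referenced in the Workflow linter item above, a hedged sketch of handling a missing pipeline during fetch is shown below; ws.pipelines.get and the NotFound error come from the Databricks SDK, while the plain dict stands in for UCX's DependencyProblem record:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound


def register_pipeline_task(ws: WorkspaceClient, pipeline_id: str):
    """Yield a problem record instead of crashing when the referenced pipeline no longer exists."""
    try:
        pipeline = ws.pipelines.get(pipeline_id)
    except NotFound:
        yield {"code": "pipeline-not-found", "message": f"Could not find pipeline: {pipeline_id}"}
        return
    # ...register the libraries attached to `pipeline` here...
```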

v0.57.0

05 Mar 03:47
d0bcfc5
  • Convert UCX job ids to int before passing to JobsCrawler (#3816). In this release, we have addressed issue #3722 and improved the robustness of the open-source library by modifying the jobs_crawler method to handle job IDs more effectively. Previously, job IDs were passed directly to the exclude_job_ids parameter, which could cause issues if they were not integers. To address this problem, we have updated the jobs_crawler method to convert all job IDs to integers using a list comprehension before passing them to the method; a sketch of this conversion and exclusion follows this list. This change ensures that only valid integer job IDs are used, thereby enhancing the reliability of the method. The commit includes a manual test to confirm the correct behavior of this modification.
  • Exclude UCX jobs from crawling (#3733). In this release, we have made modifications to the JobsCrawler and the existing assessment workflow to exclude UCX jobs from crawling, avoiding confusion for users when they appear in assessment reports. This change addresses issues #3656 and #3722, and is a follow-up to previous issue #3732. We have also incorporated updates from pull requests #3767 and #3759 to improve integration tests and linting. Additionally, a retry mechanism has been added to wait for grants to exist before crawling, addressing issue #3758. The changes include the addition of unit and integration tests to ensure the correctness of the modifications. A new exclude_job_ids parameter has been added to the JobsCrawler constructor, which is initialized with the list of UCX job IDs, ensuring that UCX jobs are not included in the assessment report. The _list_jobs method now excludes jobs based on the provided exclude_job_ids and include_job_ids arguments. The _crawl method now uses the _list_jobs method to list the jobs to be crawled. The _assess_jobs method has been updated to take into account the exclusion of specific job IDs. The test_grant_detail file, an integration test for the Hive Metastore grants functionality, has been updated to include a retry mechanism to wait for grants to exist before crawling and to check if the SELECT permission on ANY FILE is present in the grants.
  • Let WorkflowLinter.refresh_report lint jobs from JobsCrawler (#3732). In this release, the WorkflowLinter.refresh_report method has been updated to lint jobs from the JobsCrawler class, ensuring that only jobs within the scope of the crawler are processed. This change resolves issue #3662 and progresses issue #3722. The workflow linting code, the assessment workflow, and the JobsCrawler class have been modified. The JobsCrawler class now includes a snapshot method, which is used in the WorkflowLinter.refresh_report method to retrieve necessary data about jobs. Unit and integration tests have been updated correspondingly, with the integration test for workflows now verifying that all rows returned from a query to the workflow_problems table have a valid path field. The WorkflowLinter constructor now includes an instance of JobsCrawler, allowing for more targeted linting of jobs. The introduction of the JobsCrawler class enables more efficient and precise linting of jobs, improving the overall accuracy of workflow assessment.
  • Let dashboard name adhere to naming convention (#3789). In this release, the naming convention for dashboard names in the ucx library has been enforced, restricting them to alphanumeric characters, hyphens, and underscores. This change replaces any non-conforming characters in existing dashboard names with hyphens or underscores, addressing several issues (#3761 through #3788). A temporary fix has been added to the _create_dashboard method to ensure newly created dashboard names adhere to the new naming convention, indicated by a TODO comment. This release also resolves a test failure in a specific GitHub Actions run and addresses a total of 29 issues. The specifics of the modification made to the databricks labs install ucx command and the changes to existing functionality are not detailed, making it difficult to assess their scope. The commit includes the deletion of a file called 02_0_owner.filter.yml, and all changes have been manually tested.
  • Partial revert Let dashboard name adhere to naming convention (#3794). In this release, we have partially reverted a previous change to the migration progress dashboard, reintroducing the owner filter. This change was made in response to feedback from users who found the previous modification to the dashboard less intuitive. The new owner filter has been defined in a new file, '02_0_owner.filter.yml', which includes the title, column name, type, and width of the filter. To ensure proper functionality, this change requires the release of lsql after merging. The change has been thoroughly tested to guarantee its correct operation and to provide the best possible user experience.
  • Partial revert Let dashboard name adhere to naming convention (#3795). In this release, we have partially reversed a previous change that enforced a naming convention for dashboard names, allowing the use of special characters such as spaces and brackets again. The _create_dashboard method in the install.py file and the _name method in the mixins.py file have been updated to reflect this change, affecting the migration progress dashboard. The display_name attribute of the metadata object has been updated to use the original format, which may include special characters. The reference variable has also been updated accordingly. The functions created_job_tasks and created_job have been updated to use the new naming convention when retrieving installation jobs with specific names. These changes have been manually tested and the tests have been verified to work correctly after the reversion. This change is related to issues #3799, #3789, and reverts commit 048bc8f.
  • Put back dashboard names (#3808). In the lsql release v0.16.0, the naming convention for dashboards has been updated to support non-alphanumeric characters in the dashboard names. This change modifies the _create_dashboard function in install.py and the _name method in mixins.py to create dashboard names with a format like [UCX] assessment (Main), which includes parent and child folder names. This update addresses issues reported in tickets #3797 and #3790, and partially reverses previous changes made in commits 4017a25 and 834ef14. The functionality of other methods remains unchanged. With this release, the created_job_tasks and created_job functions now accept dashboard names with non-alphanumeric characters as input.
  • Updated databricks-labs-lsql requirement from <0.15,>=0.14.0 to >=0.14.0,<0.17 (#3801). In this update, we have updated the required version of the databricks-labs-lsql package from a version greater than or equal to 0.14.0 and less than 0.15 to a version greater than or equal to 0.14.0 and less than 0.17. This change allows for the use of the latest version of the package, which includes various bug fixes and dependency updates. The package is utilized in the acceptance tests that are run as part of the CI/CD pipeline. With this update, the acceptance tests can now be executed using the most recent version of the package, resulting in enhanced functionality and reliability.
  • Updated databricks-sdk requirement from <0.42,>=0.40 to >=0.44,<0.45 (#3686). In this release, we have updated the version requirement for the databricks-sdk package to be greater than or equal to 0.44.0 and less than 0.45.0. This update allows for the use of the latest version of the databricks-sdk, which includes new methods, fields, and bug fixes. For instance, the get_message_query_result_by_attachment method has been added for the w.genie.workspace_level_service, and several fields such as review_state, reviews, and runner_collaborators have been removed for the databricks.sdk.service.clean_rooms.CleanRoomAssetNotebook object. Additionally, the securable_kind field has been removed for various objects such as CatalogInfo and ConnectionInfo. We recommend thoroughly testing this update to ensure compatibility with your project. The release notes for versions 0.44.0 and 0.43.0 can be found in the commit history. Please note that there are several backward-incompatible changes listed in the changelog for bot...
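
As referenced in the JobsCrawler items above, a minimal sketch of coercing job ids to int and excluding UCX's own jobs follows; the function name and signature are illustrative, not the actual JobsCrawler API:

```python
def jobs_to_crawl(all_job_ids: list[int], ucx_job_ids: list[int | str]) -> list[int]:
    """Drop UCX's own jobs from the crawl scope, coercing their ids to int first."""
    exclude_job_ids = {int(job_id) for job_id in ucx_job_ids}  # ids stored in state may be strings
    return [job_id for job_id in all_job_ids if job_id not in exclude_job_ids]


print(jobs_to_crawl([101, 102, 103], ["102"]))  # [101, 103]
```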

v0.56.0

25 Feb 03:53
05c2d6a
  • Added documentation to use Delta Live Tables migration (#3587). In this documentation update, we introduce a new section for migrating Delta Live Table pipelines to the Unity Catalog as part of the migration process. This workflow allows for the original and cloned pipelines to run independently after the cloned pipeline reaches the RUNNING state. The update includes an example of stopping and renaming an existing HMS DLT pipeline, and creating a new cloned pipeline. Additionally, known issues and limitations are outlined, such as supported streaming sources, maintenance pausing, and querying by timestamp. To streamline the migration process, the migrate-dlt-pipelines command is introduced with optional parameters for including or excluding specific pipeline IDs. This feature is intended for developers and administrators managing data pipelines and handling table aliasing issues. Relevant user documentation has been added and the changes have been manually tested.
  • Added support for MSSQL and POSTGRESQL to HMS Federation (#3701). In this enhancement, the open-source library now supports Microsoft SQL Server (MSSQL) and PostgreSQL databases in the Hive Metastore Federation (HMS Federation) feature. This update introduces classes for handling external Hive Metastore instances and their versions, and refactors a regex pattern for better support of various JDBC URL formats. A new supported_databases_port class variable is added to map supported databases to default ports, allowing the code to handle SQL Server's distinct default port. Additionally, a supported_hms_versions class variable is created, outlining supported Hive Metastore versions. The _external_hms method is updated to extract HMS version information more accurately, and the _split_jdbc_url method is refactored for better URL format compatibility and parameter extraction. The test file test_federation.py has been updated with new unit tests for external catalog creation with MSSQL and PostgreSQL, further enhancing compatibility with various databases and expanding HMS Federation's capabilities.
  • Added the CLI command for migrating DLT pipelines (#3579). A new CLI command, "migrate-dlt-pipelines," has been added for migrating DLT pipelines from HMS to UC using the DLT Migration API. This command allows users to include or exclude specific pipeline IDs during migration using the --include-pipeline-ids and --exclude-pipeline-ids flags, respectively. The change impacts the PipelinesMigrator class, which has been updated to accept and use these new parameters. Currently, there is no information available about testing, but the changes are expected to be manually tested and accompanied by corresponding unit and integration tests in the future. The changes are isolated to the PipelinesMigrator class and related functionality, with no impact on existing methods or functionality.
  • Addressed Bug with Dashboard migration (#3663). In this release, the _crawl method in dashboards.py has been enhanced to exclude SDK dashboards that lack IDs during the dashboard migration process. This modification enhances migration efficiency by avoiding unnecessary processing of incomplete dashboards. Additionally, the _list_dashboards method now includes a check for dashboards with no IDs while iterating through the dashboards_iterator. If a dashboard with no ID is found, the method fetches the dashboard details using the _get_dashboard method and adds them to the dashboards list, ensuring proper processing. Furthermore, a bug fix for issue #3663 has been implemented in the RedashDashboardCrawler class in assessment/test_dashboards.py. The get method has been added as a side effect to the WorkspaceClient mock's dashboards attribute, enabling the retrieval of individual dashboard objects by their IDs. This modification ensures that the RedashDashboardCrawler can correctly retrieve and process dashboard objects from the WorkspaceClient mock, preventing errors due to missing dashboard objects.
  • Broaden safe read text caught exception scope (#3705). In this release, the safe_read_text function has been enhanced to handle a broader range of exceptions that may occur while reading a text file, including OSError and UnicodeError, making it more robust and safe; a sketch of this broader exception handling follows this list. The function previously caught only specific exceptions such as FileNotFoundError, UnicodeDecodeError, and PermissionError. Additionally, the codebase has been improved with updated unit tests, ensuring that the new functionality works correctly. The linting parts of the code have also been updated, enhancing the readability and maintainability of the project for other software engineers. The safe_read_text method in the source_code module is covered by several new test cases designed to ensure that it handles edge cases correctly, such as when the file does not exist, when the path is a directory, or when an OSError occurs. These changes make the open-source library more reliable and robust for various use cases.
  • Case sensitive/insensitive table validation (#3580). In this release, the library has been updated to enable more flexible and customizable metadata comparison for tables. A case sensitive flag has been introduced for metadata comparison, which allows for consideration or ignoring of column name case during validation. The TableMetadataRetriever abstract base class now includes a new parameter column_name_transformer in the get_metadata method, which is a callable that can be used to transform column names as needed for comparison. Additionally, a new case_sensitive parameter has been added to the StandardSchemaComparator constructor to determine whether column names should be compared case sensitively or not. A new parametrized test function test_schema_comparison_case has also been included to ensure that this functionality works as expected. These changes provide users with more control over the metadata comparison process and improve the library's handling of cases where column names in the source and target tables may have different cases.
  • Catch AttributeError in InferredValue._safe_infer_internal (#3684). In this release, we have implemented a change to the _safe_infer_internal method in the InferredValue class to catch AttributeError. This change addresses an issue in the Astroid library reported in their GitHub repository (pylint-dev/astroid#2683) and resolves issue #3659 in our project. By handling AttributeError during the inference process, we have made the code more robust and safer. When an exception occurs, an error message is logged with debug-level logging, and the method yields the Uninferable sentinel value to indicate that inference failed for the node. This enhancement strengthens the source code linting code through value inference in our open-source library.
  • Document to run validate-groups-membership before groups migration, not after (#3631). In this release, we have updated the order of executing the validate-groups-membership command in the group migration process. Previously, the command was recommended to be run after the groups migration, but it has been updated to be executed before the migration. This change ensures that the groups have the correct membership and the number of groups and users in the workspace and account are the same before migration, providing an extra level of safety. Additionally, we have updated the remove-workspace-local-backup-groups command to remove workspace-level backup groups and their permissions only after confirming the successful migration of all groups. We have also updated the spelling of the validate-group-membership command to validate-groups-membership in a documentation file. This release is aimed at software engineers who are adopting the project and looking to migrate their groups to the account level.
  • Extend code migration progress documentation (#3588). In this documentation update, we have added two new sections, Code Migration and "Final details," to the open-source library's migration process documentation. The Code Migration section provides a detailed walkthrough of the steps to migrate code after completing table migration and data reconciliation, including using the linter to investigate compatibility issues and linted workspace resources. The "linter advices" provide codes and messages on detected issues and resolution methods. The migrated code can then be prioritized and tracked using the migration-progress dashboard, and migrated using the migrate- commands. The Final details section outlines the steps to take once code migration is complete, including running the cluster-remap command to remap clusters to be Unity Catalog compatible. This update resolves issue #2231 and includes updated user documentation, with new methods for linting and migrating local code, managing dashboard migrations, and syncing workspace information. Additional commands for creating and validating table mappings, migrating locations, and assigning metastores are also included, with the aim of improving the code migration process by providing more detailed documentation and new commands for managing the migration.
  • Fixed Skip/Unskip sch...
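
As referenced in the safe_read_text item above, a hedged sketch of the broader exception handling is shown below; the exact signature in UCX may differ:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def safe_read_text(path: Path, size: int = -1) -> str | None:
    """Read a text file, returning None on any I/O or decoding problem instead of raising."""
    try:
        with path.open("r", encoding="utf-8") as f:
            return f.read(size)
    except (OSError, UnicodeError) as e:
        # OSError covers FileNotFoundError, PermissionError and IsADirectoryError;
        # UnicodeError covers UnicodeDecodeError.
        logger.warning(f"Could not read file: {path}", exc_info=e)
        return None
```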

v0.55.0

24 Jan 15:36
c3ad142
  • Introducing UCX docs! (#3458). In this release, we introduced the new documentation for UCX; you can find it here: https://databrickslabs.github.io/ucx/
  • Hosted Runner for release (#3532). In this release, we have made improvements to the release job's security and control by moving the release.yml file to a new location within a hosted runner group labeled "linux-ubuntu-latest." This change ensures that the release job now runs in a protected runner group, enhancing the overall security and reliability of the release process. The job's environment remains set to "release," and it retains the same authentication and artifact signing permissions as before the move, ensuring a seamless transition while improving the security and control of the release process.

Contributors: @sundarshankar89, @renardeinside

v0.54.0

23 Jan 22:16
ebe97e0
  • Implement disposition field in SQL backend (#3477). This commit adds a query_statement_disposition configuration option for the SQL backend in the UCX tool, allowing users to specify the disposition of SQL statements during assessment results export and preventing failures when dealing with large workspaces and a large number of findings. The new configuration option is added to the config.yml file and used by the SqlBackend definition. The databricks labs install ucx and databricks labs ucx export-assessment commands have been modified to support this new functionality. A new Disposition enum has been added to the databricks.sdk.service.sql module. This change resolves issue #3447 and is related to pull request #3455. The functionality has been manually tested.

  • AWS role issue with external locations pointing to the root of a storage account (#3510). The AWSResources class in the aws.py file has been updated to enhance the regular expression pattern for matching S3 bucket names, now including an optional group for trailing slashes and any subsequent characters. This allows for recognition of external locations pointing to the root of a storage account, addressing issue #3505. The access.py file within the AWS module has also been updated, introducing a new path variable and updating a for loop condition to accurately identify missing paths in external locations referencing the root of a storage account. New unit tests have been added to tests/unit/aws/test_access.py, including a test_uc_roles_create_all_roles method that checks the creation of all possible UC roles when none exist and external locations with and without folders. Additionally, the backend fixture has been updated to include a new external location s3://BUCKET4, and various tests have been updated to incorporate this location and handle errors appropriately.
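
A hedged sketch of the kind of pattern described above, matching both the root of a bucket and a sub-path; the regex and helper below are illustrative, not the exact expression used in AWSResources:

```python
import re

# Bucket name followed by an optional "/" and anything after it, so both
# "s3://bucket" (the root of the storage account) and "s3://bucket/folder" match.
_S3_LOCATION = re.compile(r"^s3a?://([^/]+)(/.*)?$")


def bucket_and_prefix(location: str) -> tuple[str, str] | None:
    match = _S3_LOCATION.match(location)
    if not match:
        return None
    return match.group(1), match.group(2) or "/"


print(bucket_and_prefix("s3://BUCKET4"))         # ('BUCKET4', '/')
print(bucket_and_prefix("s3://BUCKET4/folder"))  # ('BUCKET4', '/folder')
```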

  • Added assert to make sure installation is finished before re-installation (#3546). In this release, we have added an assertion to ensure that the installation process is completed before attempting to reinstall, addressing a previous issue where the reinstallation was starting before the first installation was finished, causing a warning to not be raised and resulting in a test failure. We have introduced a new function wait_for_installation_to_finish(), which retries loading the installation if it is not found, with a timeout of 2 minutes. This function is utilized in the test_compare_remote_local_install_versions test to ensure that the installation is finished before proceeding. Furthermore, we have extracted the warning message to a variable error_message for better readability. This change enhances the reliability of the installation process.
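
A minimal sketch of the polling behaviour described above, assuming a placeholder load_installation callable that raises the SDK's NotFound error until the installation exists; the real test helper may be implemented differently:

```python
import time

from databricks.sdk.errors import NotFound


def wait_for_installation_to_finish(load_installation, timeout: float = 120.0, interval: float = 5.0):
    """Poll `load_installation` until it stops raising NotFound, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while True:
        try:
            return load_installation()
        except NotFound:
            if time.time() >= deadline:
                raise
            time.sleep(interval)
```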

  • Added dashboards to migration progress dashboard (#3314). This commit introduces significant updates to the migration progress dashboard, adding dashboards, linting resources, and modifying existing components. The changes include a new dashboard displaying the number of dashboards pending migration, with the data sourced from the ucx_catalog.multiworkspace.objects_snapshot table. The existing 'Migration [main]' dashboard has been updated, and unit and integration tests have been adapted accordingly. The commit also renames several SQL files, updates the percentage UDF, grant, job, cluster, table, and pipeline migration progress queries, and resolves linting compatibility issues related to Unity Catalog. The changes depend on issue #3424, progress issue #3045, and break up issue #3112. The new dashboard aims to enhance the migration process and ensure a smooth transition to the Unity Catalog.

  • Added history log encoder for dashboards (#3424). A new history log encoder for dashboards has been added, addressing issues #3368 and #3369, and modifying the existing experimental-migration-progress workflow. This update includes the addition of the DashboardOwnership class, used to generate ownership information for dashboards, and the DashboardProgressEncoder class, responsible for encoding progress data related to dashboards. The new functionality is tested through manual, unit, and integration testing. In the Table class, the from_table_info and from_historical_data methods have been added, allowing for the creation of Table instances from TableInfo objects and historical data dictionaries with more flexibility and safety. The test_tables.py file in the integration/progress directory has also been updated to include a new test function for checking table failures. These changes improve the tracking and management of dashboard IDs, enhance user name retrieval, and ensure the accurate determination of object ownership.

  • Create specific failure for Python syntax error while parsing with Astroid (#3498). This commit enhances the Python linting functionality in our open-source library by introducing a specific failure message, python-parse-error, for syntax errors encountered during code parsing using Astroid. Previously, a generic system-error message was used, which has been renamed to maintain consistency with the existing sql-parse-error message. This change provides clearer failure indicators and includes more detailed information about the error location. Additionally, modifications to Python linting-related code, unit test additions, and updates to the README guide users on handling these new error types have been implemented. A new method, Tree.maybe_parse(), has been introduced to parse Python code and detect syntax errors, ensuring more precise error handling for users.
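
A hedged sketch of detecting a Python syntax error with Astroid and reporting it as a python-parse-error failure; the dict stands in for UCX's failure object and this is not the exact Tree.maybe_parse() implementation:

```python
import astroid
from astroid.exceptions import AstroidSyntaxError


def maybe_parse(code: str):
    """Return (tree, failure): exactly one of the two is None."""
    try:
        return astroid.parse(code), None
    except AstroidSyntaxError as e:
        failure = {
            "code": "python-parse-error",
            "message": f"Failed to parse code due to invalid syntax: {e}",
        }
        return None, failure


tree, failure = maybe_parse("def broken(:\n    pass")
print(failure["code"] if failure else "parsed ok")  # python-parse-error
```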

  • DBR 16 and later support (#3481). This pull request introduces support for Databricks Runtime (DBR) 16 and later in the code that converts Hive Metastore (HMS) tables to external tables within the migrate-tables workflow. The changes include the addition of a new static method _get_entity_storage_locations to handle the new entityStorageLocations property in DBR16 and the modification of the _convert_hms_table_to_external method to account for this property. Additionally, the run_workflow function in the assessment workflow now has the skip_job_wait parameter set to True, which allows the workflow to continue running even if a job within it fails. The changes have been manually tested for DBR16, verified in a staging environment, and existing integration tests have been run for DBR 15. The diff also includes updates to the test_table_migration_convert_manged_to_external method to skip job waiting during testing, enabling the test to run successfully on DBR 16.
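
A minimal sketch of reading the DBR 16+ entityStorageLocations property defensively; the helper name mirrors the description above, but the actual implementation may differ:

```python
class _TableMetadataOnDbr15:
    """Stand-in for table metadata returned by older runtimes (no entityStorageLocations)."""


def get_entity_storage_locations(table_metadata):
    """Return the DBR 16+ `entityStorageLocations` property when present, else None."""
    return getattr(table_metadata, "entityStorageLocations", None)


print(get_entity_storage_locations(_TableMetadataOnDbr15()))  # None on DBR 15 and below
```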

  • Delete stale code: NotebookLinter._load_source_from_run_cell (#3529). In this update, we have removed the stale code NotebookLinter._load_source_from_run_cell, which was responsible for loading the source code from a run cell in a notebook. This change is a part of the ongoing effort to address issue #3514 and enhances the overall codebase. Additionally, we have modified the existing databricks labs ucx lint-local-code command to update the code linting functionality. We have conducted manual testing to ensure that the changes function as intended and have added and modified several unit tests. The _load_source_from_run_cell method is no longer needed, as it was part of a deprecated functionality. The modifications to the databricks labs ucx lint-local-code command impact the way code linting is performed, ultimately improving the efficiency and maintainability of the codebase.

  • Exclude ucx dashboards from Lakeview dashboard crawler (#3450). In this release, we have enhanced the lakeview_crawler method in the open-source library to exclude Ucx dashboards and prevent false positives. This has been achieved by adding a new optional argument, exclude_dashboard_ids, to the init method, which takes a list of dashboard IDs to exclude from the crawler. The _crawl method has been updated to skip dashboards whose IDs match the ones in the exclude_dashboard_ids list. The change includes unit tests and manual testing to ensure proper functionality and has been verified on the staging environment. These updates improve the accuracy and reliability of the dashboard crawler, providing better results for software engineers utilizing this library.

  • Fixed issue in installing UCX on UC enabled workspace (#3501). This PR introduces changes to the ClusterPolicyInstaller class, updating the spark_version policy definition from a fixed value to an allowlist with a default value. This resolves an issue where, when UC is enabled on a workspace, the cluster definition takes on single_user and user_isolation values instead of Legacy_Single_User and 'Legacy_Table_ACL'. The job definition is also updated to use the default value when not explicitly provided. These changes improve compatibility with UC-enabled workspaces, ensuring the correct values for spark_version in the cluster definition. The PR includes updates to unit tests and installation tests, addressing issue #3420.

  • Fixed typo in workflow name (in error message) (#3491). This PR (Pull Request) addresses a minor typo in the error message displayed by the validate_groups_permissions method in the workflows.py file. The typo occurred in the workflow name mentioned in the error message, where groups was incorrectly spelled as "group". The corrected workflow name in the message is now validate-groups-permissions. This change does not introduce any new methods or modify any existing functionality, but instead focuses on enhancing the...


v0.53.1

30 Dec 16:57
a77ca8b
  • Removed packaging package dependency (#3469). In this release, we have removed the dependency on the packaging package in the open-source library to address a release issue. The import statements for "packaging.version.Version" and "packaging.version.InvalidVersion" have been removed. The function _external_hms in the federation.py file has been updated to retrieve the Hive Metastore version using the "spark.sql.hive.metastore.version" configuration key and validate it using a regular expression pattern. If the version is not valid, the function logs an informational message and returns None. This change modifies the Hive Metastore version validation logic and improves the overall reliability and maintainability of the library.
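
A hedged sketch of the version validation described above; the regular expression is illustrative and may not be the exact pattern used in federation.py:

```python
import re

# Accept versions such as "2.3", "2.3.9" or "3.1.0"; anything else is treated as invalid.
_HMS_VERSION = re.compile(r"^\d+\.\d+(\.\d+)?$")


def parse_hms_version(spark_conf: dict[str, str]) -> str | None:
    version = spark_conf.get("spark.sql.hive.metastore.version", "")
    if not _HMS_VERSION.match(version):
        return None  # the real code logs an informational message before returning None
    return version


print(parse_hms_version({"spark.sql.hive.metastore.version": "3.1.0"}))          # 3.1.0
print(parse_hms_version({"spark.sql.hive.metastore.version": "not-a-version"}))  # None
```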

Contributors: @FastLee

v0.53.0

23 Dec 18:28
dcfe27e
  • Added dashboard crawlers (#3397). The open-source library has been updated with new dashboard crawlers for the assessment workflow, Redash migration, and QueryLinter. These crawlers are responsible for crawling and persisting dashboards, as well as migrating or reverting them during Redash migration. They also lint the queries of the crawled dashboards using QueryLinter. This change resolves issues #3366 and #3367, and progresses #2854. The 'databricks labs ucx {migrate-dbsql-dashboards|revert-dbsql-dashboards}' command and the assessment workflow have been modified to incorporate these new features. Unit tests and integration tests have been added to ensure proper functionality of the new dashboard crawlers. Additionally, two new tables, $inventory.redash_dashboards and $inventory.lakeview_dashboards, have been introduced to hold a list of all Redash or Lakeview dashboards and are used by the QueryLinter and Redash migration. These changes improve the assessment, migration, and linting processes for dashboards in the library.
  • DBFS Root Support for HMS Federation (#3425). The commit DBFS Root Support for HMS Federation introduces changes to support the DBFS root location for HMS federation. A new method, external_locations_with_root, is added to the ExternalLocations class to return a list of external locations including the DBFS root location. This method is used in various functions and test cases, such as test_create_uber_principal_no_storage, test_create_uc_role_multiple_raises_error, test_create_uc_no_roles, test_save_spn_permissions, and test_create_access_connectors_for_storage_accounts, to ensure that the DBFS root location is correctly identified and tested in different scenarios. Additionally, the external_locations.snapshot.return_value is changed to external_locations.external_locations_with_root.return_value in test functions test_create_federated_catalog and test_already_existing_connection to retrieve a list of external locations including the DBFS root location. This commit closes issue #3406, which was related to this functionality. Overall, these changes improve the handling and testing of DBFS root location in HMS federation.
  • Log message as error when legacy permissions API is enabled/disabled depending on the workflow ran (#3443). In this release, logging behavior has been updated in several methods in the 'workflows.py' file. When the use_legacy_permission_migration configuration is set to False and specific conditions are met, error messages are now logged instead of info messages for the methods 'verify_metastore_attached', 'rename_workspace_local_groups', 'reflect_account_groups_on_workspace', 'apply_permissions_to_account_groups', 'apply_permissions', and 'validate_groups_permissions'. This change is intended to address issue #3388 and provides clearer guidance to users when the legacy permissions API is not functioning as expected. Users will now see an error message advising them to run the migrate-groups job or set use_legacy_permission_migration to True in the config.yml file. These updates will help ensure smoother workflow runs and more accurate logging for better troubleshooting.
  • MySQL External HMS Support for HMS Federation (#3385). This commit adds support for MySQL-based Hive Metastore (HMS) in HMS Federation, enhances the CLI for creating a federated catalog, and improves external HMS functionality. It introduces a new parameter enable_hms_federation in the Locations class constructor, allowing users to enable or disable MySQL-based HMS federation. The external_locations method in application.py now accepts enable_hms_federation as a parameter, enabling more granular control of the federation feature. Additionally, the CLI for creating a federated catalog has been updated to accept a prompts parameter, providing more flexibility. The commit also introduces a new dataclass ExternalHmsInfo for external HMS connection information and updates the HiveMetastoreFederationEnabler and HiveMetastoreFederation classes to support non-Glue external metastores. Furthermore, it adds methods to handle the creation of a Federated Catalog from the command-line interface, split JDBC URLs, and manage external connections and permissions.
  • Skip listing built-in catalogs to update table migration process (#3464). In this release, the migration process for updating tables in the Hive Metastore has been optimized with the introduction of the TableMigrationStatusRefresher class, which inherits from CrawlerBase. This new class includes modifications to the _iter_schemas method, which now filters out built-in catalogs and schemas when listing catalogs and schemas, thereby skipping unnecessary processing during the table migration process; a sketch of this catalog filtering follows this list. Additionally, the get_seen_tables method has been updated to include checks for schema.name and schema.catalog_name, and the _crawl and _try_fetch methods have been modified to reflect changes in the TableMigrationStatus constructor. These changes aim to improve the efficiency and performance of the migration process by skipping built-in catalogs and schemas. The release also includes modifications to the existing migrate-tables workflow and adds unit tests that demonstrate the exclusion of built-in catalogs during the table migration status update process. The test case utilizes the CatalogInfoSecurableKind enumeration to specify the kind of catalog and verifies that the seen tables only include the non-builtin catalogs.
  • Updated databricks-sdk requirement from <0.39,>=0.38 to >=0.39,<0.40 (#3434). In this release, the requirement for the databricks-sdk package has been updated in the pyproject.toml file to be strictly greater than or equal to 0.39 and less than 0.40, allowing for the use of the latest version of the package while preventing the use of versions above 0.40. This change is based on the release notes and changelog for version 0.39 of the package, which includes bug fixes, internal changes, and API changes such as the addition of the cleanrooms package, delete() method for workspace-level services, and fields for various request and response objects. The commit history for the package is also provided. Dependabot has been configured to resolve any conflicts with this PR and can be manually triggered to perform various actions as needed. Additionally, Dependabot can be used to ignore specific dependency versions or close the PR.
  • Updated databricks-sdk requirement from <0.40,>=0.39 to >=0.39,<0.41 (#3456). In this pull request, the version range of the databricks-sdk dependency has been updated from '<0.40,>=0.39' to '>=0.39,<0.41', allowing the use of the latest version of the databricks-sdk while ensuring that it is less than 0.41. The pull request also includes release notes detailing the API changes in version 0.40.0, such as the addition of new fields to various compute, dashboard, job, and pipeline services. A changelog is provided, outlining the bug fixes, internal changes, new features, and improvements in versions 0.39.0, 0.40.0, and 0.38.0. A list of commits is also included, showing the development progress of these versions.
  • Use LTS Databricks runtime version (#3459). This release introduces a change in the Databricks runtime version to a Long-Term Support (LTS) release to address issues encountered during the migration to external tables. The previous runtime version caused the convert to external table migration strategy to fail, and this change serves as a temporary solution. The migrate-tables workflow has been modified, and existing integration tests have been reused to ensure functionality. The test_job_cluster_policy function now uses the LTS version instead of the latest version, ensuring a specified Spark version for the cluster policy. The function also checks for matching node type ID, Spark version, and necessary resources. However, users may still encounter problems with the latest UCX release. The _convert_hms_table_to_external method in the table_migrate.py file has been updated to return a boolean value, with a new TODO comment about a possible failure with Databricks runtime 16.0 due to a JDK update.
  • Use CREATE_FOREIGN_CATALOG instead of CREATE_FOREIGN_SECURABLE with HMS federation enablement commands (#3309). A change has been made to update the databricks-sdk dependency version from >=0.38,<0.39 to >=0.39 in the pyproject.toml file, which may affect the project's functionality related to the databricks-sdk library. In the Hive Metastore Federation codebase, CREATE_FOREIGN_CATALOG is now used instead of CREATE_FOREIGN_SECURABLE for HMS federation enablement commands, aligned with issue #3308. The _add_missing_permissions_if_needed method has been updated to check for CREATE_FOREIGN_SECURABLE instead of CREATE_FOREIGN_CATALOG when granting permissions. Additionally, a unit test file for HiveMetastore Federation has ...
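
As referenced in the built-in catalog item above, a hedged sketch of skipping built-in catalogs while listing is shown below; it filters on CatalogType for simplicity, whereas the UCX code keys off the securable kind as noted:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import CatalogType


def iter_non_builtin_catalogs(ws: WorkspaceClient):
    """Yield only regular catalogs, skipping built-ins such as system and Delta Sharing catalogs."""
    for catalog in ws.catalogs.list():
        if catalog.catalog_type in (CatalogType.SYSTEM_CATALOG, CatalogType.DELTASHARING_CATALOG):
            continue
        yield catalog
```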

v0.52.0

12 Dec 14:42
136c536
  • Added handling for Databricks errors during workspace listings in the table migration status refresher (#3378). In this release, we have implemented changes to enhance error handling and improve the stability of the table migration status refresher in the open-source library. We have resolved issue #3262, which addressed Databricks errors during workspace listings. The assessment workflow has been updated, and new unit tests have been added to ensure proper error handling. The changes include the import of DatabricksError from the databricks.sdk.errors module and the addition of a new method _iter_catalogs to list catalogs with error handling for DatabricksError; a sketch of this catalog-listing error handling follows this list. The _iter_schemas method now replaces _ws.catalogs.list() with self._iter_catalogs(), also including error handling for DatabricksError. Furthermore, new unit tests have been developed to check the logging of the TableMigration class when listing tables in the Databricks workspace, focusing on handling errors during catalog, schema, and table listings. These changes improve the library's robustness and ensure that it can gracefully handle errors during the table migration status refresher process.
  • Convert READ_METADATA to UC BROWSE permission for tables, views and database (#3403). The uc_grant_sql method in the grants.py file has been modified to convert READ_METADATA permissions to BROWSE permissions for tables, views, and databases. This change involves adding new entries to the dictionary used to map permission types to their corresponding UC actions and has been manually tested. The behavior of the grant_loader function in the hive_metastore module has also been modified to change the action type of a grant from READ_METADATA to EXECUTE for a specific case. Additionally, the test_grants.py unit test file has been updated to include a new test case that verifies the conversion of READ_METADATA to BROWSE for a grant on a database and handles the conversion of READ_METADATA permission to UC BROWSE for a new udf="function" parameter. These changes resolve issue #2023 and have been tested through manual testing and unit tests. No new methods have been added, and existing functionality has been changed in a limited scope. No new unit or integration tests have been added as it is assumed that the existing tests will continue to pass after these changes have been made.
  • Migrates Pipelines crawled during the assessment phase (#2778). A new utility class, PipelineMigrator, has been introduced in this release to facilitate the migration of Delta Live Tables (DLT) pipelines. This class is used in a new workflow that tests pipeline migration, which involves cloning DLT pipelines in the assessment phase with specific configurations to a new Unity Catalog (UC) pipeline. The migration can be skipped for certain pipelines by specifying their pipeline IDs in a list. Three test scenarios, each with different pipeline specifications, are defined to ensure the proper functioning of the migration process under various conditions. The class and the migration process are thoroughly tested with manual testing, unit tests, and integration tests, with no reliance on a staging environment. The migration process takes into account the WorkspaceClient, WorkspaceContext, AccountClient, and a flag for running the command as a collection. The PipelinesMigrator class uses a PipelinesCrawler and JobsCrawler to perform the migration and ensures better functionality for the users with additional parameters. The commit also introduces a new command, migrate_dlt_pipelines, to the CLI of the ucx package, which helps migrate DLT pipelines. The migration process is tested using a mock installation, unit tests, and integration tests. The tests cover the scenario where the installation has two jobs, test and 'assessment', with job IDs 123 and 456 respectively. The state of the installation is recorded in a state.json file. A configuration file pipeline_mapping.csv is used to map the source pipeline ID to the target catalog, schema, pipeline, and workspace names.
  • Removed try-except around verifying the migration progress prerequisites in the migrate-tables cli command (#3439). In the latest release, the ucx package's migrate-tables CLI command has changed how it handles the progress tracking prerequisites. The try-except block surrounding the verification has been removed, and the RuntimeWarning is now propagated, providing a more specific and helpful error message. If the prerequisites are not met, the verify method raises an exception and the migration does not proceed. This change improves the accuracy of error messages for users and ensures that the prerequisites for migration are actually met before tables are migrated. The tests for migrate_tables have been updated accordingly, including a new test case, test_migrate_tables_errors_out_before_assessment, which checks that the migration does not proceed when the verification fails. This change affects the existing databricks labs ucx migrate-tables command and brings improved precision and reliability to the migration process.
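A minimal sketch of the resulting control flow, with illustrative names; the actual verifier and command wiring in ucx differ in detail.

```python
def migrate_tables_command(verifier, table_migrator) -> None:
    # Previously, a try-except here swallowed the RuntimeWarning raised by the verifier
    # and only logged it, so the migration could continue with unmet prerequisites.
    # Now the exception propagates and the command stops before any tables are migrated.
    verifier.verify()  # raises if the assessment/prerequisites are not in place
    table_migrator.migrate_tables()
```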
  • Removed redundant internal methods from create_account_group (#3395). In this change, the create_account_group function's internal methods have been removed, and its signature has been modified to retrieve workspace IDs from AccountWorkspaces._workspaces() instead of passing them as a parameter. This resolves issue #3170 and improves code efficiency by removing unnecessary parameters and methods. The AccountWorkspaces class now accepts a list of workspace IDs upon instantiation, enhancing code readability and eliminating redundancy. The function has been tested with unit tests, ensuring it creates a group if it doesn't exist, throws an exception if the group already exists, filters system groups, and handles cases where a group already has the required number of members in a workspace. These changes simplify the codebase, eliminate redundancy, and improve the maintainability of the project.
  • Updated sqlglot requirement from <25.33,>=25.5.0 to >=25.5.0,<25.34 (#3407). In this release, the sqlglot requirement has been relaxed to allow releases up to 25.33.x, where the previous range stopped at 25.32.x. This update allows us to use the latest version of sqlglot, which includes various bug fixes and new features. In v25.33.0, there were two breaking changes: the TIMESTAMP data type now maps to Type.TIMESTAMPTZ, and the NEXT keyword is now treated as a function keyword. Several new features were also introduced, including support for generated columns in PostgreSQL and the ability to preserve tables in the replace_table method. Additionally, there were several bug fixes, including fixes for issues related to BigQuery, Presto, and Spark. The v25.32.1 release contained two bug fixes related to BigQuery and one bug fix related to Presto. Furthermore, v25.32.0 had three breaking changes: support for ATTACH/DETACH statements, tokenization of hints as comments, and a fix to datetime coercion in the canonicalize rule. This release also introduced new features, such as support for TO_TIMESTAMP* variants in Snowflake and improved error messages in the Redshift transpiler. Lastly, there were several bug fixes, including fixes for issues related to SQL Server, MySQL, and PostgreSQL.
  • Updated sqlglot requirement from <25.33,>=25.5.0 to >=25.5.0,<25.35 (#3413). In this release, the sqlglot dependency range has been widened from >=25.5.0,<25.34 to >=25.5.0,<25.35, so the latest sqlglot release is now allowed. That version includes a breaking change to the alias expansion of USING STRUCT fields (also listed as an optimization) and adds support for generated columns in PostgreSQL. Additionally, two bug fixes were implemented, addressing proper consumption of dashed table parts and removal of parentheses from CURRENT_USER in Presto. The update also includes a fix to make TIMESTAMP map to Type.TIMESTAMPTZ, a fix to parse DEFAULT in a VALUES clause into a Var, and changes to the BigQuery and Snowflake dialects to improve transpilation and JSONPathTokenizer leniency. The commit message includes a reference to issue [#3413](https://github.com/databrickslabs/ucx/issues/3413) and a link to the sqlglot changelog for further reference.
  • Updated sqlglot requirement from <25.35,>=25.5.0 to >=25.5.0,<26.1 (#3433). In this release, we have updated the required version of the sqlglot library to a range that includes version 25.5.0 but excludes version 26.1. This change is crucial due to the breaking changes introduced in sqlglot v26.0.0 that are not yet compatible with our project. The commit message includes the changelog for sqlglot v26.0.0, which highlights the breaking changes, new features, bug fixes, and other modifications in this version. Additionally, the commit includes a list of commits merged into the sqlglot repository for a comprehensive understanding of the changes. As a software engineer, I recommend approving this change to maintain compatibility with sqlglot. However, I advise thorough testing to ensure the updated version does n...
Read more

v0.51.0

02 Dec 20:39
b422e78
Compare
Choose a tag to compare
  • Added assign-owner-group command (#3111). The Databricks Labs UCX tool now includes a new assign-owner-group command, allowing users to assign an owner group for the workspace. This group is designated as the owner of all migrated tables and views, providing better control and organization of resources. The command can be executed in the context of a specific workspace or across multiple workspaces. The implementation adds new classes, methods, and attributes in several files, such as cli.py, config.py, and groups.py, enhancing ownership management functionality. The assign-owner-group command replaces the functionality proposed in issue #3075 and addresses issue #2890, ensuring proper schema ownership and handling of crawled grants. Developers should be aware that running the migrate-tables workflow will assign the new owner group for the Hive Metastore instance in the workspace installation.
  • Added opencensus to known list (#3052). In this release, we have added OpenCensus to the list of known libraries in our configuration file. OpenCensus is a popular set of tools for distributed tracing and monitoring, and its inclusion lets UCX recognize code that uses OpenCensus during assessment and code linting. This change does not affect existing functionality; it only adds a new entry for OpenCensus to the configuration file.
  • Added default owner group selection to the installer (#3370). A new class, AccountGroupLookup, has been added to select the default owner group during installation, following up on #3111. This class uses the workspace client to determine the candidate owner groups and a pick_owner_group method to prompt the user for a selection when necessary. The ownership selection process has been improved with a check in the installer's _static_owner method to determine whether the current user is part of the default owner group. The GroupManager class has been updated to use the new AccountGroupLookup class and its pick_owner_group and validate_owner_group methods. A new variable, default_owner_group, is introduced in the ConfigureGroups class to configure groups during installation based on user input. The installer now includes a unit test, test_configure_with_default_owner_group, which demonstrates how the expected workspace configuration values are set when a default owner group is specified during installation.
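A rough sketch of what such an owner-group lookup might look like, assuming a blueprint-style Prompts helper; the class and method names come from the entry above, but the bodies and signatures here are illustrative only.

```python
from databricks.sdk import WorkspaceClient


class AccountGroupLookup:
    """Illustrative selection of a default owner group during installation."""

    def __init__(self, ws: WorkspaceClient):
        self._ws = ws

    def _group_names(self) -> list[str]:
        # Account groups visible from the workspace; system-group filtering omitted for brevity.
        return [g.display_name for g in self._ws.groups.list() if g.display_name]

    def pick_owner_group(self, prompts) -> str | None:
        """Ask the installing user which group should own migrated tables and views."""
        names = self._group_names()
        if not names:
            return None
        return prompts.choice("Select the default owner group", names)
```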
  • Added handling for non UTF-8 encoded notebook error explicitly (#3376). A new enhancement has been implemented to address the issue of non-UTF-8 encoded notebooks failing to load by introducing explicit error handling for this case. A UnicodeDecodeError exception is now caught and logged as a warning, while the notebook is skipped and returned as None. This change is implemented in the load_dependency method in the loaders.py file, which is a part of the assessment workflow. Additionally, a new unit test has been added to verify the behavior of this change, and the assessment workflow has been updated accordingly. The new test function in test_loaders.py checks for different types of exceptions, specifically PermissionError and UnicodeDecodeError, ensuring that the system can handle notebooks with non-UTF-8 encoding gracefully. This enhancement resolves issue #3374, thereby improving the overall robustness of the application.
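The pattern described above can be sketched as follows; the function name and messages are illustrative, while the real handling lives in the load_dependency method in loaders.py.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def read_notebook_source(path: Path) -> str | None:
    """Skip notebooks that cannot be read instead of failing the whole assessment."""
    try:
        return path.read_text(encoding="utf-8")
    except PermissionError as e:
        logger.warning(f"Permission denied while reading notebook {path}", exc_info=e)
        return None
    except UnicodeDecodeError as e:
        logger.warning(f"Notebook {path} is not valid UTF-8, skipping it", exc_info=e)
        return None
```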
  • Added migration progress documentation (#3333). In this release, we have updated the migration-progress-experimental workflow to track the migration progress of a subset of inventory tables related to workspace resources being migrated to Unity Catalog (UC). The workflow updates the inventory tables and tracks the migration progress in the UCX catalog tables. To use this workflow, users must attach a UC metastore to the workspace, create a UCX catalog, and ensure that the assessment job has run successfully. The Migration Progress section in the documentation has been updated with a new markdown file that provides details about the migration progress, including a migration progress dashboard and an experimental migration progress workflow that generates historical records of inventory objects relevant to the migration. These records are stored in the UCX catalog in UC, which contains a historical table with information about the object type, object ID, data, failures, owner, and UCX version. The migration process also tracks dangling Hive or workspace objects that are not referenced by business resources, and the progress is persisted in the UCX catalog, allowing for cross-workspace tracking of migration progress.
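As an illustration of the historical records mentioned above, a row might carry fields along these lines; the field names follow the description in this entry, but the types and exact column names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class HistoricalRecord:
    """Illustrative shape of a row in the UCX catalog's historical table."""
    object_type: str        # e.g. "Table", "Job", "Pipeline"
    object_id: list[str]    # identifier of the inventory object
    data: dict[str, str]    # serialized snapshot of the object at the time of the run
    failures: list[str]     # outstanding findings blocking migration of the object
    owner: str              # resolved owner of the object
    ucx_version: str        # version of UCX that produced the record
```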
  • Added note about running assessment once (#3398). In this release, we clarify that the UCX assessment workflow is intended to be executed only once: repeated runs do not update existing results. The README has been updated with a note stating that the assessment workflow is a one-time process, along with instructions on how to refresh the inventory and findings by uninstalling and reinstalling UCX. This keeps the inventory and findings for a workspace up-to-date and accurate. Users of the assessment workflow should take note of this change and follow the updated instructions.
  • Allowing skipping TACLs migration during table migration (#3384). A new optional flag, "skip_tacl_migration", has been added to the configuration file, providing users with more flexibility during migration. This flag controls whether the migration of table access control lists (TACLs) is skipped during table migration. It can be set when creating catalogs and schemas, as well as when migrating tables or using the migrate_grants method in application.py. Additionally, the install.py file now includes a new variable, skip_tacl_migration, which can be set to True during the installation process to skip TACL migration. New test cases have been added to verify the behavior of skipping TACL migration during grants management and table migration. These changes give users more flexibility when managing table migrations and TACL operations in their infrastructure, addressing issues #3384 and #3042.
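A minimal sketch of how such a flag might be consulted; the config class below is an assumption for illustration, and only the skip_tacl_migration field comes from this entry.

```python
from dataclasses import dataclass


@dataclass
class MigrationConfig:
    inventory_database: str
    skip_tacl_migration: bool = False  # when True, grants/ACLs are not migrated with the tables


def migrate_acls_if_enabled(config: MigrationConfig, migrate_grants) -> None:
    """Apply TACL migration only when it has not been explicitly skipped."""
    if config.skip_tacl_migration:
        return
    migrate_grants()
```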
  • Bump databricks-sdk and databricks-labs-lsql dependencies (#3332). In this update, the databricks-sdk and databricks-labs-lsql dependencies are upgraded to versions 0.38 and 0.14.0, respectively. The databricks-sdk update addresses conflicts, bug fixes, and introduces new API additions and changes, notably impacting methods like create(), execute_message_query(), and others in workspace-level services. While databricks-labs-lsql updates ensure compatibility, its changelog and specific commits are not provided. This pull request also includes ignore conditions for the databricks-sdk dependency to prevent future Dependabot requests. It is strongly advised to rigorously test these updates to avoid any compatibility issues or breaking changes with the existing codebase. This pull request mirrors another (#3329), resolving integration CI issues that prevented the original from merging.
  • Explain failures when cluster encounters Py4J error (#3318). In this release, we have made significant improvements to the error handling mechanism in our open-source library. Specifically, we have addressed issue #3318, which involved handling failures when the cluster encounters Py4J errors in the databricks/labs/ucx/hive_metastore/tables.py file. We have added code to raise noisy failures instead of swallowing the error with a warning when a Py4J error occurs. The functions _all_databases() and _list_tables() have been updated to check if the error message contains "py4j.security.Py4JSecurityException", and if so, log an error message with instructions to update or reinstall UCX. If the error message does not contain "py4j.security.Py4JSecurityException", the functions log a warning message and return an empty list. These changes also resolve the linked issue #3271. The functionality has been thoroughly tested and verified on the labs environment. These improvements provide more informative error messages and enhance the overall reliability of our library.
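The behavior described above can be sketched roughly as follows, assuming a Spark session; the real functions are _all_databases() and _list_tables() in databricks/labs/ucx/hive_metastore/tables.py and may differ in detail.

```python
import logging

logger = logging.getLogger(__name__)


def list_databases(spark) -> list:
    """List Hive metastore databases, surfacing Py4J security errors loudly."""
    try:
        return spark.sql("SHOW DATABASES").collect()
    except Exception as e:  # noqa: BLE001 - the underlying error type varies by cluster setup
        if "py4j.security.Py4JSecurityException" in str(e):
            logger.error(
                "Py4J security exception while scanning the Hive metastore; "
                "update or reinstall UCX so the assessment cluster is configured correctly"
            )
            raise
        logger.warning(f"Failed to list databases: {e}")
        return []
```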
  • Rearranged job summary dashboard columns and make job_name clickable (#3311). In this update, the job summary dashboard columns have been improved and the need for the 30_3_job_details.sql file, which contained a SQL query for selecting job details from the inventory.jobs table, has been eliminated. The dashboard columns have been rearranged, and the job_name column is now clickable, providing easy access to job details via the corresponding job ID. The changes include modifying the...
Read more