Lucene.NET Exception on Azure - no segments* file found in AzureDirectory

2022-06-01, Kontext Big Data Forum

Issue context

Environment

.NET 6.0

Lucene.Net 4.8.0-beta00016

Lucene.Net.QueryParser 4.8.0-beta00016

Lucene.Net.Store.Azure 4.8.0-beta16 (I upgraded to beta16 manually to be consistent)

Exception

When querying the index stored in Azure Blob Storage after modification, the following exception is thrown:

Error occurred while searching keywords: Lucene.Net.Index.IndexNotFoundException: no segments* file found in AzureDirectory

More details about the issue

When the index is created for the first time, the following files are generated:

[Screenshot: 20220601112541-image.png]

Querying the index at this point works without any issues.

A second run modifies the index: it deletes all documents, re-adds them, and adds one new document. The index directory in Blob Storage then looks like the following screenshot:

[Screenshot: 20220601112944-image.png]
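The second run described above can be sketched roughly as follows. This is only an illustration, not the code from the original project: the `ReindexSketch` class, the `Rebuild` helper, and the `id`/`body` field names are assumptions, and any Lucene.NET directory (for example a `RAMDirectory`) can stand in for the AzureDirectory instance.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Util;
// Alias avoids a clash with System.IO.Directory when implicit usings are enabled.
using LuceneDirectory = Lucene.Net.Store.Directory;

public static class ReindexSketch
{
    // Builds a document with two hypothetical fields, "id" and "body".
    private static Document MakeDoc(string id, string body)
    {
        return new Document
        {
            new StringField("id", id, Field.Store.YES),
            new TextField("body", body, Field.Store.YES)
        };
    }

    // Deletes every document, re-adds the given ones, commits,
    // and returns the number of live documents a fresh reader sees.
    public static int Rebuild(LuceneDirectory directory, params (string Id, string Body)[] docs)
    {
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
            using (var writer = new IndexWriter(directory, config))
            {
                writer.DeleteAll();
                foreach (var (id, body) in docs)
                {
                    writer.AddDocument(MakeDoc(id, body));
                }
                // Commit writes a new segments_N file and refreshes segments.gen.
                writer.Commit();
            }
        }

        using (var reader = DirectoryReader.Open(directory))
        {
            return reader.NumDocs;
        }
    }
}
```

Calling `Rebuild` once with two documents and then again with the same two plus a third mirrors the two runs in this post. Each commit rewrites the segments files, which is why a directory implementation has to handle replacing files that already exist.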

When querying the index, the exception is thrown:

Error occurred while searching keywords: Lucene.Net.Index.IndexNotFoundException: no segments* file found in AzureDirectory@39b5dcc lockFactory=NativeFSLockFactory@: files: []
         at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
         at Lucene.Net.Index.StandardDirectoryReader.Open(Directory directory, IndexCommit commit, Int32 termInfosIndexDivisor)
         at Lucene.Net.Index.DirectoryReader.Open(Directory directory)
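The `files: []` at the end of the message appears to be the directory listing Lucene saw, which suggests the reader found no files at all rather than only a missing segments file. As a defensive measure while investigating, the search code could check for a visible commit before opening a reader. The sketch below is an assumption about how that might look; `IndexGuardSketch` and `TryCountDocs` are hypothetical names, and it does not fix the underlying upload problem.

```csharp
using Lucene.Net.Index;
// Alias avoids a clash with System.IO.Directory when implicit usings are enabled.
using LuceneDirectory = Lucene.Net.Store.Directory;

public static class IndexGuardSketch
{
    // Returns the number of live documents, or -1 when the directory
    // contains no segments file that a reader could open.
    public static int TryCountDocs(LuceneDirectory directory)
    {
        if (!DirectoryReader.IndexExists(directory))
        {
            return -1;
        }

        using (var reader = DirectoryReader.Open(directory))
        {
            return reader.NumDocs;
        }
    }
}
```

With a guard like this, a search request can report that the index is unavailable instead of failing with `IndexNotFoundException`.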

Root issue

The segments.gen file was deleted when the index was modified, and the subsequent upload failed because the file already existed in Azure Blob Storage. To fix this problem, we just need to set the overwrite parameter of the upload call to true.
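In isolation, the overwrite behaviour looks roughly like this with the Azure.Storage.Blobs client. The `BlobUploadSketch` class and the `UploadIndexFile` helper are hypothetical and not part of AzureDirectory; the sketch assumes a `BlobContainerClient` has already been created for the index container.

```csharp
using System.IO;
using Azure.Storage.Blobs;

public static class BlobUploadSketch
{
    // Hypothetical helper: uploads one index file, replacing any existing blob
    // with the same name (for example segments.gen from a previous run).
    public static void UploadIndexFile(BlobContainerClient container, string name, Stream content)
    {
        BlobClient blob = container.GetBlobClient(name);

        // Without overwrite: true, Upload throws a RequestFailedException
        // when a blob with this name already exists.
        blob.Upload(content, overwrite: true);
    }
}
```

Passing `overwrite: true` makes the upload safe to repeat, which matters here because Lucene rewrites files such as segments.gen on every commit.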

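When I looked into the AzureDirectory library I am using, I found that it has not implemented all the functions properly. Below is the Dispose method that uploads the file, with overwrite: true passed to Upload: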

protected override void Dispose(bool disposing)
{
    // Serialize access to the cached file while it is flushed and uploaded.
    _fileMutex.WaitOne();
    try
    {
        // Make sure everything is written out to the local cache.
        _indexOutput.Flush();

        long originalLength = _indexOutput.Length;
        _indexOutput.Dispose();

        using (var blobStream = new StreamInput(CacheDirectory.OpenInput(_name, IOContext.DEFAULT)))
        {
            // Push the cached file up to the cloud, replacing any existing blob.
            _blob.Upload(blobStream, overwrite: true);

            // Set the metadata with the original index file properties.
            //_blob.SetMetadata();

            Debug.WriteLine($"{_azureDirectory.Name} PUT {blobStream.Length} bytes to {_name} in cloud");
        }

#if FULLDEBUG
        Debug.WriteLine($"{_azureDirectory.Name} CLOSED WRITESTREAM {_name}");
#endif
        // Clean up.
        _indexOutput = null;
        _blobContainer = null;
        _blob = null;
        GC.SuppressFinalize(this);
    }
    finally
    {
        _fileMutex.ReleaseMutex();
    }
}