[help] tesseract output mostly null #5

![image](https://github.com/Soumi7/Table_Data_Extraction/assets/49953642/bdd98285-a013-4860-a345-32721572940d)

I can split the table into separate cells, but the OCR output from pytesseract is mostly empty. Could you help me improve the output?

```python
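import cv2
import matplotlib.pyplot as plt
import pytesseract

# `rows` (lists of (x, y, w, h) cell boxes) and `bitnot` (the binarized
# table image) come from the earlier table-splitting step, not shown here.
text = []
basewidth = 300  # target width in px for each cell before OCR
count = 1        # subplot index

for row in rows:
    row_text = '|'
    for cell in row:
        (x, y, w, h) = cell
        roi = bitnot[y:y+h, x:x+w]  # crop one cell
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 1))
        # pad the crop with a 2 px constant-value border
        border = cv2.copyMakeBorder(roi, 2, 2, 2, 2, cv2.BORDER_CONSTANT, value=[255, 255])
        # upscale 2x, then dilate once and erode twice with the 2x1 kernel
        resizing = cv2.resize(border, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
        dilation = cv2.dilate(resizing, kernel, iterations=1)
        erosion = cv2.erode(dilation, kernel, iterations=2)

        # rescale to `basewidth` px wide, keeping the aspect ratio
        height, width = erosion.shape[:2]
        wpercent = basewidth / float(width)
        hsize = int(float(height) * wpercent)
        text_img = cv2.resize(erosion, (basewidth, hsize))
        # invert the colours of the image that is passed to Tesseract
        text_img = cv2.bitwise_not(text_img)

        # 4x3 grid, so there is only room for 12 cells;
        # note that this plots `erosion`, not `text_img` (the image actually OCR'd)
        plt.subplot(4, 3, count)
        plt.imshow(erosion, cmap='gray')
        count += 1

        out = pytesseract.image_to_string(text_img)
        # if len(out) == 0:
        #     out = pytesseract.image_to_string(erosion)
        row_text += out + '|'

    text.append(row_text)

plt.show()
print(text)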
```
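Three things in the loop above may account for the empty strings. First, `text_img` goes through `cv2.bitwise_not` right before `image_to_string`; if `bitnot` holds dark text on a light background (it is built in an earlier step that is not shown), Tesseract receives light text on a dark background, while the Tesseract documentation recommends dark text on a light background for 4.x. Second, the subplot displays `erosion`, not `text_img`, so the plotted cells are not what Tesseract actually sees. Third, `image_to_string` defaults to automatic page segmentation (`--psm 3`), which tends to find nothing in a small single-cell crop; `--psm 7` (treat the image as a single text line) or `--psm 6` (a single uniform block of text) usually suits table cells better.

Below is a minimal sketch of a per-cell pipeline built on those points, not a verified fix for this particular image. The helper names `prepare_cell`, `ocr_cell` and `format_row` are hypothetical, the dark-mean polarity check is an assumed heuristic, and the `pad` and `scale` defaults are guesses to tune. Preprocessing uses plain NumPy so it can be checked without Tesseract installed; `pytesseract` is only imported when `ocr_cell` is called.

```python
import numpy as np


def prepare_cell(cell: np.ndarray, pad: int = 10, scale: int = 3) -> np.ndarray:
    """Return an 8-bit grayscale crop with dark text on a white background,
    padded with white and upscaled by an integer factor."""
    img = np.asarray(cell, dtype=np.uint8)  # assumes an 8-bit input image
    if img.ndim == 3:
        # average the colour channels into a single grayscale channel
        img = img.mean(axis=2).astype(np.uint8)
    # Assumed heuristic: a text cell is mostly background, so a dark mean
    # suggests light-on-dark text; invert it to dark-on-light for Tesseract.
    if img.mean() < 127:
        img = 255 - img
    # White padding so glyphs do not touch the crop edge.
    img = np.pad(img, pad, mode="constant", constant_values=255)
    # Nearest-neighbour upscale by repeating every row and column `scale` times.
    img = np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)
    return img


def ocr_cell(cell: np.ndarray, psm: int = 7) -> str:
    """OCR one table cell; --psm 7 treats the image as a single text line."""
    import pytesseract  # lazy import: prepare_cell works without Tesseract

    raw = pytesseract.image_to_string(prepare_cell(cell), config=f"--psm {psm}")
    return " ".join(raw.split())  # drop newlines and the trailing form feed


def format_row(cell_texts) -> str:
    """Join cleaned cell strings as |a|b|c|, like row_text in the loop."""
    return "|" + "|".join(" ".join(t.split()) for t in cell_texts) + "|"
```

In the loop above, the per-cell work would shrink to something like `row_cells.append(ocr_cell(bitnot[y:y+h, x:x+w]))` (with `row_cells` a hypothetical per-row list), followed by `text.append(format_row(row_cells))` after each row. Plotting `prepare_cell(roi)` instead of `erosion` shows exactly what Tesseract receives. Nearest-neighbour upscaling via `np.repeat` keeps the sketch NumPy-only; `cv2.resize` with `INTER_CUBIC`, as in the original loop, is a reasonable alternative.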
