-
Notifications
You must be signed in to change notification settings - Fork 0
Pdf text phase1 the framework #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: origin/master
Are you sure you want to change the base?
Conversation
…amework' into pdf-text-phase1-the-framework
…phase1-the-framework
aoloe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi oliver
generally speaking, the code looks good.
most of my comments are about small details.
but i could not find any "active" code, code that does something. are you sure that you included all the files?
that was not very important for my review, since i will have to let jean review most of the review on the code that directly works on the pdf (he has much more experience than me on the field).
you can edit the branch you used for this branch to amend this pull request until you're ready for a further review.
|
thanks,
it seems like you only got the diff for the header file, there should have
been a lot more content in there. I'll try and find out what's going on.
…On Mon, 8 Jun 2020 at 10:26, ale rimoldi ***@***.***> wrote:
***@***.**** commented on this pull request.
hi oliver
generally speaking, the code looks good.
most of my comments are about small details.
but i could not find any "active" code, code that does something. are you
sure that you included all the files?
that was not very important for my review, since i will have to let jean
review most of the review on the code that directly works on the pdf (he
has much more experience than me on the field).
you can edit the branch you used for this branch to amend this pull
request until you're ready for a further review.
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
QStringList *m_importedColors;
};
+/* PDF TextBox Framework */
+/*
+* Holds all the dtails for each glyph in the text imported from the pdf file.
+*
+*/
+struct PdfGlyph {
+ QPointF position; // Absolute glyph coords
+ double dx; // X advance value
+ double dy; // Y advance value
+ double rise; // Text rise parameter
IMO rise needs a better documentation... on the other side dx and dy do no
really need one
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
QStringList *m_importedColors;
};
+/* PDF TextBox Framework */
+/*
+* Holds all the dtails for each glyph in the text imported from the pdf file.
+*
+*/
+struct PdfGlyph {
+ QPointF position; // Absolute glyph coords
+ double dx; // X advance value
+ double dy; // Y advance value
+ double rise; // Text rise parameter
+ QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
not sure this comment is needed. using char* is imo not an alternative,
but a possible performance enhancement, which is imo only ok if you're 100%
sure that the exception is needed and you can prove that.
on the other side: you say that code contains a character as utf-8, so
why not using QChar and call the variable character? then, you don't need
to document that code contains a character.
my suggestion:
QChar character; // UTF8
finally, since the struct is called PdfGlyph, make sure that this
variable contains a character and not a glyph... i did not check that.
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
QStringList *m_importedColors;
};
+/* PDF TextBox Framework */
+/*
+* Holds all the dtails for each glyph in the text imported from the pdf file.
+*
+*/
+struct PdfGlyph {
+ QPointF position; // Absolute glyph coords
+ double dx; // X advance value
+ double dy; // Y advance value
+ double rise; // Text rise parameter
+ QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
+ bool is_space;
if space is not just , it's the righte place to say what you consider
being a space.
------------------------------
In scribus/plugins/import/pdf/pdfimportoptions.cpp
<#37 (comment)>:
> @@ -71,6 +76,8 @@ void PdfImportOptions::setUpOptions(const QString& fileName, int actPage, int nu
ui->cropGroup->setVisible(cropPossible);
ui->cropGroup->setChecked(cropPossible);
ui->cropBox->setCurrentIndex(3); // Use CropBox by default
+ ui->textAsVectors->setChecked(true);
even if it's probably not finished yet, i like the option of being able to
import both the vector and text version.
at the end, one of both will probably need to go a separate layer (when
both are active)
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> +*
+*/
+struct PdfGlyph {
+ QPointF position; // Absolute glyph coords
+ double dx; // X advance value
+ double dy; // Y advance value
+ double rise; // Text rise parameter
+ QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
+ bool is_space;
+};
+
+
+class TextRegionLine
+{
+public:
+ qreal maxHeight = -1;
i'm not sure that initializing to -1 is a good idea.
it looks like you want to initialize with invalid values... which is
probably not the best way to do it.
imo, the initial value should be valid and hint to what the variable will
contain.
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> +*
+*/
+struct PdfGlyph {
+ QPointF position; // Absolute glyph coords
+ double dx; // X advance value
+ double dy; // Y advance value
+ double rise; // Text rise parameter
+ QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
+ bool is_space;
+};
+
+
+class TextRegionLine
+{
+public:
+ qreal maxHeight = -1;
(in the future we will have optional but for now, i fear that you should
write your algorithm in a way that it can run with values that are always
valid)
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> +
+class TextRegionLine
+{
+public:
+ qreal maxHeight = -1;
+ qreal modeHeigth = -1;
+ qreal width = -1;
+ int glyphIndex = -1;
+ QPointF baseOrigin = QPointF(-1, -1);
+ std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
+
+};
+
+class TextRegion {
+public:
+ enum FRAMEWORKLINETESTS {
if you don't plan to use other enums in this class, that's fine, otherwise
i would use a class enum to give a scope to the enuzm values.
imo the enum name should be camelcase, starting with an uppercase F, not
all uppercase. it's a type...
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> + qreal lineSpacing = -1;
+ std::vector<TextRegionLine> textRegionLines = std::vector<TextRegionLine>();
+ qreal maxWidth = -1;
+ QPointF lineBaseXY = QPointF(-1, -1); //updated with the best match left value from all the textRegionLines and the best bottom value from the textRegionLines.segments;
+ QPointF lastXY = QPointF(-1, -1);
+ static bool coLinera(qreal a, qreal b);
+ bool closeToX(qreal x1, qreal x2);
+ bool closeToY(qreal y1, qreal y2);
+ bool adjunctLesser(qreal testY, qreal lastY, qreal baseY);
+ bool adjunctGreater(qreal testY, qreal lastY, qreal baseY);
+ TextRegion::FRAMEWORKLINETESTS linearTest(QPointF point, bool xInLimits, bool yInLimits);
+ TextRegion::FRAMEWORKLINETESTS isRegionConcurrent(QPointF newPoint);
+ TextRegion::FRAMEWORKLINETESTS moveToPoint(QPointF newPoint);
+ TextRegion::FRAMEWORKLINETESTS addGlyphAtPoint(QPointF newGlyphPoint, PdfGlyph new_glyph);
+ void renderToTextFrame(PageItem* textNode, ParagraphStyle& pStyle);
+ std::vector<PdfGlyph> glyphs; //this may replace some of the other settings or it may not, certainly not font as text gets flushed if the font changes
always prefix this kind of comments with your name (or abbreviation) and
maybe add a date (even if blame can tell us)
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> @@ -232,6 +299,7 @@ class SlaOutputDev : public OutputDev
void endMarkedContent(GfxState *state) override;
void markPoint(POPPLER_CONST char *name) override;
void markPoint(POPPLER_CONST char *name, Dict *properties) override;
+
empty line...
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> @@ -273,6 +342,21 @@ class SlaOutputDev : public OutputDev
void type3D0(GfxState * /*state*/, double /*wx*/, double /*wy*/) override;
void type3D1(GfxState * /*state*/, double /*wx*/, double /*wy*/, double /*llx*/, double /*lly*/, double /*urx*/, double /*ury*/) override;
+ //text as text
+
+ AddCharInterface* addChar = nullptr;
+
+ enum ADDCHARMODE {
see above
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> +
+class TextRegionLine
+{
+public:
+ qreal maxHeight = -1;
+ qreal modeHeigth = -1;
+ qreal width = -1;
+ int glyphIndex = -1;
+ QPointF baseOrigin = QPointF(-1, -1);
+ std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
+
+};
+
+class TextRegion {
+public:
+ enum FRAMEWORKLINETESTS {
ah, and the Scribus styles requires the { on tis own line...
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> +};
+
+
+class TextRegionLine
+{
+public:
+ qreal maxHeight = -1;
+ qreal modeHeigth = -1;
+ qreal width = -1;
+ int glyphIndex = -1;
+ QPointF baseOrigin = QPointF(-1, -1);
+ std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
+
+};
+
+class TextRegion {
that {....
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
QStringList *m_importedColors;
};
+/* PDF TextBox Framework */
+/*
+* Holds all the dtails for each glyph in the text imported from the pdf file.
+*
+*/
+struct PdfGlyph {
the { on its own line...
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> @@ -306,6 +390,13 @@ class SlaOutputDev : public OutputDev
void createImageFrame(QImage& image, GfxState *state, int numColorComponents);
+ //PDF Textbox framework
if it's a framework, can it be enclosed in a class?
------------------------------
In scribus/plugins/import/pdf/slaoutput.h
<#37 (comment)>:
> @@ -371,6 +462,63 @@ class SlaOutputDev : public OutputDev
QHash<QString, QList<int> > m_radioMap;
QHash<int, PageItem*> m_radioButtons;
int m_actPage;
+
+ //PDF Textbox framework
+ std::vector<TextRegion> m_textRegions = std::vector<TextRegion>();
+};
+
+class AddFirstChar : public AddCharInterface
the implementations of Add*Char are not in use yet, but i wonder if you
could not just define them as function of the same class.
without seeing their usage it and the future plans it's hard to say, but
fom the signature it's hard to say.
also:
- the implementation of the multiple addChar() seems to be missing
- if you stick to the multiple classes, i'm not sure that they should
be defined in slaouput.h, but it's hard to say without seeing the big
picture.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#37 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AJALTFQTLZGRXRTH2GTAZFLRVSVFNANCNFSM4NXHU6JA>
.
|
|
also, I've already fixed some of the things you have commented on.
…On Tue, 9 Jun 2020 at 03:27, NASA Jeff ***@***.***> wrote:
thanks,
it seems like you only got the diff for the header file, there should have
been a lot more content in there. I'll try and find out what's going on.
On Mon, 8 Jun 2020 at 10:26, ale rimoldi ***@***.***> wrote:
> ***@***.**** commented on this pull request.
>
> hi oliver
>
> generally speaking, the code looks good.
>
> most of my comments are about small details.
>
> but i could not find any "active" code, code that does something. are you
> sure that you included all the files?
>
> that was not very important for my review, since i will have to let jean
> review most of the review on the code that directly works on the pdf (he
> has much more experience than me on the field).
>
> you can edit the branch you used for this branch to amend this pull
> request until you're ready for a further review.
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
> QStringList *m_importedColors;
> };
>
> +/* PDF TextBox Framework */
> +/*
> +* Holds all the dtails for each glyph in the text imported from the pdf file.
> +*
> +*/
> +struct PdfGlyph {
> + QPointF position; // Absolute glyph coords
> + double dx; // X advance value
> + double dy; // Y advance value
> + double rise; // Text rise parameter
>
> IMO rise needs a better documentation... on the other side dx and dy do
> no really need one
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
> QStringList *m_importedColors;
> };
>
> +/* PDF TextBox Framework */
> +/*
> +* Holds all the dtails for each glyph in the text imported from the pdf file.
> +*
> +*/
> +struct PdfGlyph {
> + QPointF position; // Absolute glyph coords
> + double dx; // X advance value
> + double dy; // Y advance value
> + double rise; // Text rise parameter
> + QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
>
> not sure this comment is needed. using char* is imo not an alternative,
> but a possible performance enhancement, which is imo only ok if you're 100%
> sure that the exception is needed and you can prove that.
>
> on the other side: you say that code contains a character as utf-8, so
> why not using QChar and call the variable character? then, you don't
> need to document that code contains a character.
>
> my suggestion:
>
> QChar character; // UTF8
>
> finally, since the struct is called PdfGlyph, make sure that this
> variable contains a character and not a glyph... i did not check that.
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
> QStringList *m_importedColors;
> };
>
> +/* PDF TextBox Framework */
> +/*
> +* Holds all the dtails for each glyph in the text imported from the pdf file.
> +*
> +*/
> +struct PdfGlyph {
> + QPointF position; // Absolute glyph coords
> + double dx; // X advance value
> + double dy; // Y advance value
> + double rise; // Text rise parameter
> + QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
> + bool is_space;
>
> if space is not just , it's the righte place to say what you consider
> being a space.
> ------------------------------
>
> In scribus/plugins/import/pdf/pdfimportoptions.cpp
> <#37 (comment)>:
>
> > @@ -71,6 +76,8 @@ void PdfImportOptions::setUpOptions(const QString& fileName, int actPage, int nu
> ui->cropGroup->setVisible(cropPossible);
> ui->cropGroup->setChecked(cropPossible);
> ui->cropBox->setCurrentIndex(3); // Use CropBox by default
> + ui->textAsVectors->setChecked(true);
>
> even if it's probably not finished yet, i like the option of being able
> to import both the vector and text version.
>
> at the end, one of both will probably need to go a separate layer (when
> both are active)
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > +*
> +*/
> +struct PdfGlyph {
> + QPointF position; // Absolute glyph coords
> + double dx; // X advance value
> + double dy; // Y advance value
> + double rise; // Text rise parameter
> + QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
> + bool is_space;
> +};
> +
> +
> +class TextRegionLine
> +{
> +public:
> + qreal maxHeight = -1;
>
> i'm not sure that initializing to -1 is a good idea.
>
> it looks like you want to initialize with invalid values... which is
> probably not the best way to do it.
>
> imo, the initial value should be valid and hint to what the variable will
> contain.
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > +*
> +*/
> +struct PdfGlyph {
> + QPointF position; // Absolute glyph coords
> + double dx; // X advance value
> + double dy; // Y advance value
> + double rise; // Text rise parameter
> + QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
> + bool is_space;
> +};
> +
> +
> +class TextRegionLine
> +{
> +public:
> + qreal maxHeight = -1;
>
> (in the future we will have optional but for now, i fear that you should
> write your algorithm in a way that it can run with values that are always
> valid)
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > +
> +class TextRegionLine
> +{
> +public:
> + qreal maxHeight = -1;
> + qreal modeHeigth = -1;
> + qreal width = -1;
> + int glyphIndex = -1;
> + QPointF baseOrigin = QPointF(-1, -1);
> + std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
> +
> +};
> +
> +class TextRegion {
> +public:
> + enum FRAMEWORKLINETESTS {
>
> if you don't plan to use other enums in this class, that's fine,
> otherwise i would use a class enum to give a scope to the enuzm values.
>
> imo the enum name should be camelcase, starting with an uppercase F, not
> all uppercase. it's a type...
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > + qreal lineSpacing = -1;
> + std::vector<TextRegionLine> textRegionLines = std::vector<TextRegionLine>();
> + qreal maxWidth = -1;
> + QPointF lineBaseXY = QPointF(-1, -1); //updated with the best match left value from all the textRegionLines and the best bottom value from the textRegionLines.segments;
> + QPointF lastXY = QPointF(-1, -1);
> + static bool coLinera(qreal a, qreal b);
> + bool closeToX(qreal x1, qreal x2);
> + bool closeToY(qreal y1, qreal y2);
> + bool adjunctLesser(qreal testY, qreal lastY, qreal baseY);
> + bool adjunctGreater(qreal testY, qreal lastY, qreal baseY);
> + TextRegion::FRAMEWORKLINETESTS linearTest(QPointF point, bool xInLimits, bool yInLimits);
> + TextRegion::FRAMEWORKLINETESTS isRegionConcurrent(QPointF newPoint);
> + TextRegion::FRAMEWORKLINETESTS moveToPoint(QPointF newPoint);
> + TextRegion::FRAMEWORKLINETESTS addGlyphAtPoint(QPointF newGlyphPoint, PdfGlyph new_glyph);
> + void renderToTextFrame(PageItem* textNode, ParagraphStyle& pStyle);
> + std::vector<PdfGlyph> glyphs; //this may replace some of the other settings or it may not, certainly not font as text gets flushed if the font changes
>
> always prefix this kind of comments with your name (or abbreviation) and
> maybe add a date (even if blame can tell us)
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > @@ -232,6 +299,7 @@ class SlaOutputDev : public OutputDev
> void endMarkedContent(GfxState *state) override;
> void markPoint(POPPLER_CONST char *name) override;
> void markPoint(POPPLER_CONST char *name, Dict *properties) override;
> +
>
> empty line...
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > @@ -273,6 +342,21 @@ class SlaOutputDev : public OutputDev
> void type3D0(GfxState * /*state*/, double /*wx*/, double /*wy*/) override;
> void type3D1(GfxState * /*state*/, double /*wx*/, double /*wy*/, double /*llx*/, double /*lly*/, double /*urx*/, double /*ury*/) override;
>
> + //text as text
> +
> + AddCharInterface* addChar = nullptr;
> +
> + enum ADDCHARMODE {
>
> see above
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > +
> +class TextRegionLine
> +{
> +public:
> + qreal maxHeight = -1;
> + qreal modeHeigth = -1;
> + qreal width = -1;
> + int glyphIndex = -1;
> + QPointF baseOrigin = QPointF(-1, -1);
> + std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
> +
> +};
> +
> +class TextRegion {
> +public:
> + enum FRAMEWORKLINETESTS {
>
> ah, and the Scribus styles requires the { on tis own line...
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > +};
> +
> +
> +class TextRegionLine
> +{
> +public:
> + qreal maxHeight = -1;
> + qreal modeHeigth = -1;
> + qreal width = -1;
> + int glyphIndex = -1;
> + QPointF baseOrigin = QPointF(-1, -1);
> + std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
> +
> +};
> +
> +class TextRegion {
>
> that {....
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
> QStringList *m_importedColors;
> };
>
> +/* PDF TextBox Framework */
> +/*
> +* Holds all the dtails for each glyph in the text imported from the pdf file.
> +*
> +*/
> +struct PdfGlyph {
>
> the { on its own line...
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > @@ -306,6 +390,13 @@ class SlaOutputDev : public OutputDev
>
> void createImageFrame(QImage& image, GfxState *state, int numColorComponents);
>
> + //PDF Textbox framework
>
> if it's a framework, can it be enclosed in a class?
> ------------------------------
>
> In scribus/plugins/import/pdf/slaoutput.h
> <#37 (comment)>:
>
> > @@ -371,6 +462,63 @@ class SlaOutputDev : public OutputDev
> QHash<QString, QList<int> > m_radioMap;
> QHash<int, PageItem*> m_radioButtons;
> int m_actPage;
> +
> + //PDF Textbox framework
> + std::vector<TextRegion> m_textRegions = std::vector<TextRegion>();
> +};
> +
> +class AddFirstChar : public AddCharInterface
>
> the implementations of Add*Char are not in use yet, but i wonder if you
> could not just define them as function of the same class.
>
> without seeing their usage it and the future plans it's hard to say, but
> fom the signature it's hard to say.
>
> also:
>
> - the implementation of the multiple addChar() seems to be missing
> - if you stick to the multiple classes, i'm not sure that they should
> be defined in slaouput.h, but it's hard to say without seeing the big
> picture.
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> <#37 (review)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AJALTFQTLZGRXRTH2GTAZFLRVSVFNANCNFSM4NXHU6JA>
> .
>
|
|
I'm seeing content in the cpp files in this pull request:
scribusproject/scribus#132
I didn't realise the vs2019 stuff would also make it into the pull request
but you can look at the individual patches to avoid looking at that.
…On Tue, 9 Jun 2020 at 03:28, NASA Jeff ***@***.***> wrote:
also, I've already fixed some of the things you have commented on.
On Tue, 9 Jun 2020 at 03:27, NASA Jeff ***@***.***> wrote:
> thanks,
> it seems like you only got the diff for the header file, there should
> have been a lot more content in there. I'll try and find out what's going
> on.
>
> On Mon, 8 Jun 2020 at 10:26, ale rimoldi ***@***.***>
> wrote:
>
>> ***@***.**** commented on this pull request.
>>
>> hi oliver
>>
>> generally speaking, the code looks good.
>>
>> most of my comments are about small details.
>>
>> but i could not find any "active" code, code that does something. are
>> you sure that you included all the files?
>>
>> that was not very important for my review, since i will have to let jean
>> review most of the review on the code that directly works on the pdf (he
>> has much more experience than me on the field).
>>
>> you can edit the branch you used for this branch to amend this pull
>> request until you're ready for a further review.
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
>> QStringList *m_importedColors;
>> };
>>
>> +/* PDF TextBox Framework */
>> +/*
>> +* Holds all the dtails for each glyph in the text imported from the pdf file.
>> +*
>> +*/
>> +struct PdfGlyph {
>> + QPointF position; // Absolute glyph coords
>> + double dx; // X advance value
>> + double dy; // Y advance value
>> + double rise; // Text rise parameter
>>
>> IMO rise needs a better documentation... on the other side dx and dy do
>> no really need one
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
>> QStringList *m_importedColors;
>> };
>>
>> +/* PDF TextBox Framework */
>> +/*
>> +* Holds all the dtails for each glyph in the text imported from the pdf file.
>> +*
>> +*/
>> +struct PdfGlyph {
>> + QPointF position; // Absolute glyph coords
>> + double dx; // X advance value
>> + double dy; // Y advance value
>> + double rise; // Text rise parameter
>> + QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
>>
>> not sure this comment is needed. using char* is imo not an alternative,
>> but a possible performance enhancement, which is imo only ok if you're 100%
>> sure that the exception is needed and you can prove that.
>>
>> on the other side: you say that code contains a character as utf-8, so
>> why not using QChar and call the variable character? then, you don't
>> need to document that code contains a character.
>>
>> my suggestion:
>>
>> QChar character; // UTF8
>>
>> finally, since the struct is called PdfGlyph, make sure that this
>> variable contains a character and not a glyph... i did not check that.
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
>> QStringList *m_importedColors;
>> };
>>
>> +/* PDF TextBox Framework */
>> +/*
>> +* Holds all the dtails for each glyph in the text imported from the pdf file.
>> +*
>> +*/
>> +struct PdfGlyph {
>> + QPointF position; // Absolute glyph coords
>> + double dx; // X advance value
>> + double dy; // Y advance value
>> + double rise; // Text rise parameter
>> + QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
>> + bool is_space;
>>
>> if space is not just , it's the righte place to say what you consider
>> being a space.
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/pdfimportoptions.cpp
>> <#37 (comment)>:
>>
>> > @@ -71,6 +76,8 @@ void PdfImportOptions::setUpOptions(const QString& fileName, int actPage, int nu
>> ui->cropGroup->setVisible(cropPossible);
>> ui->cropGroup->setChecked(cropPossible);
>> ui->cropBox->setCurrentIndex(3); // Use CropBox by default
>> + ui->textAsVectors->setChecked(true);
>>
>> even if it's probably not finished yet, i like the option of being able
>> to import both the vector and text version.
>>
>> at the end, one of both will probably need to go a separate layer (when
>> both are active)
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > +*
>> +*/
>> +struct PdfGlyph {
>> + QPointF position; // Absolute glyph coords
>> + double dx; // X advance value
>> + double dy; // Y advance value
>> + double rise; // Text rise parameter
>> + QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
>> + bool is_space;
>> +};
>> +
>> +
>> +class TextRegionLine
>> +{
>> +public:
>> + qreal maxHeight = -1;
>>
>> i'm not sure that initializing to -1 is a good idea.
>>
>> it looks like you want to initialize with invalid values... which is
>> probably not the best way to do it.
>>
>> imo, the initial value should be valid and hint to what the variable
>> will contain.
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > +*
>> +*/
>> +struct PdfGlyph {
>> + QPointF position; // Absolute glyph coords
>> + double dx; // X advance value
>> + double dy; // Y advance value
>> + double rise; // Text rise parameter
>> + QString code; // UTF-16 coded character but we only store and use UTF-8, the slternstive is const char * for utf8 so far as qt is concerned
>> + bool is_space;
>> +};
>> +
>> +
>> +class TextRegionLine
>> +{
>> +public:
>> + qreal maxHeight = -1;
>>
>> (in the future we will have optional but for now, i fear that you should
>> write your algorithm in a way that it can run with values that are always
>> valid)
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > +
>> +class TextRegionLine
>> +{
>> +public:
>> + qreal maxHeight = -1;
>> + qreal modeHeigth = -1;
>> + qreal width = -1;
>> + int glyphIndex = -1;
>> + QPointF baseOrigin = QPointF(-1, -1);
>> + std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
>> +
>> +};
>> +
>> +class TextRegion {
>> +public:
>> + enum FRAMEWORKLINETESTS {
>>
>> if you don't plan to use other enums in this class, that's fine,
>> otherwise i would use a class enum to give a scope to the enuzm values.
>>
>> imo the enum name should be camelcase, starting with an uppercase F, not
>> all uppercase. it's a type...
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > + qreal lineSpacing = -1;
>> + std::vector<TextRegionLine> textRegionLines = std::vector<TextRegionLine>();
>> + qreal maxWidth = -1;
>> + QPointF lineBaseXY = QPointF(-1, -1); //updated with the best match left value from all the textRegionLines and the best bottom value from the textRegionLines.segments;
>> + QPointF lastXY = QPointF(-1, -1);
>> + static bool coLinera(qreal a, qreal b);
>> + bool closeToX(qreal x1, qreal x2);
>> + bool closeToY(qreal y1, qreal y2);
>> + bool adjunctLesser(qreal testY, qreal lastY, qreal baseY);
>> + bool adjunctGreater(qreal testY, qreal lastY, qreal baseY);
>> + TextRegion::FRAMEWORKLINETESTS linearTest(QPointF point, bool xInLimits, bool yInLimits);
>> + TextRegion::FRAMEWORKLINETESTS isRegionConcurrent(QPointF newPoint);
>> + TextRegion::FRAMEWORKLINETESTS moveToPoint(QPointF newPoint);
>> + TextRegion::FRAMEWORKLINETESTS addGlyphAtPoint(QPointF newGlyphPoint, PdfGlyph new_glyph);
>> + void renderToTextFrame(PageItem* textNode, ParagraphStyle& pStyle);
>> + std::vector<PdfGlyph> glyphs; //this may replace some of the other settings or it may not, certainly not font as text gets flushed if the font changes
>>
>> always prefix this kind of comments with your name (or abbreviation) and
>> maybe add a date (even if blame can tell us)
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > @@ -232,6 +299,7 @@ class SlaOutputDev : public OutputDev
>> void endMarkedContent(GfxState *state) override;
>> void markPoint(POPPLER_CONST char *name) override;
>> void markPoint(POPPLER_CONST char *name, Dict *properties) override;
>> +
>>
>> empty line...
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > @@ -273,6 +342,21 @@ class SlaOutputDev : public OutputDev
>> void type3D0(GfxState * /*state*/, double /*wx*/, double /*wy*/) override;
>> void type3D1(GfxState * /*state*/, double /*wx*/, double /*wy*/, double /*llx*/, double /*lly*/, double /*urx*/, double /*ury*/) override;
>>
>> + //text as text
>> +
>> + AddCharInterface* addChar = nullptr;
>> +
>> + enum ADDCHARMODE {
>>
>> see above
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > +
>> +class TextRegionLine
>> +{
>> +public:
>> + qreal maxHeight = -1;
>> + qreal modeHeigth = -1;
>> + qreal width = -1;
>> + int glyphIndex = -1;
>> + QPointF baseOrigin = QPointF(-1, -1);
>> + std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
>> +
>> +};
>> +
>> +class TextRegion {
>> +public:
>> + enum FRAMEWORKLINETESTS {
>>
>> ah, and the Scribus styles requires the { on tis own line...
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > +};
>> +
>> +
>> +class TextRegionLine
>> +{
>> +public:
>> + qreal maxHeight = -1;
>> + qreal modeHeigth = -1;
>> + qreal width = -1;
>> + int glyphIndex = -1;
>> + QPointF baseOrigin = QPointF(-1, -1);
>> + std::vector<TextRegionLine> segments = std::vector<TextRegionLine>();
>> +
>> +};
>> +
>> +class TextRegion {
>>
>> that {....
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > @@ -153,6 +153,73 @@ class AnoOutputDev : public OutputDev
>> QStringList *m_importedColors;
>> };
>>
>> +/* PDF TextBox Framework */
>> +/*
>> +* Holds all the dtails for each glyph in the text imported from the pdf file.
>> +*
>> +*/
>> +struct PdfGlyph {
>>
>> the { on its own line...
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > @@ -306,6 +390,13 @@ class SlaOutputDev : public OutputDev
>>
>> void createImageFrame(QImage& image, GfxState *state, int numColorComponents);
>>
>> + //PDF Textbox framework
>>
>> if it's a framework, can it be enclosed in a class?
>> ------------------------------
>>
>> In scribus/plugins/import/pdf/slaoutput.h
>> <#37 (comment)>:
>>
>> > @@ -371,6 +462,63 @@ class SlaOutputDev : public OutputDev
>> QHash<QString, QList<int> > m_radioMap;
>> QHash<int, PageItem*> m_radioButtons;
>> int m_actPage;
>> +
>> + //PDF Textbox framework
>> + std::vector<TextRegion> m_textRegions = std::vector<TextRegion>();
>> +};
>> +
>> +class AddFirstChar : public AddCharInterface
>>
>> the implementations of Add*Char are not in use yet, but i wonder if you
>> could not just define them as function of the same class.
>>
>> without seeing their usage it and the future plans it's hard to say, but
>> fom the signature it's hard to say.
>>
>> also:
>>
>> - the implementation of the multiple addChar() seems to be missing
>> - if you stick to the multiple classes, i'm not sure that they
>> should be defined in slaouput.h, but it's hard to say without seeing
>> the big picture.
>>
>> —
>> You are receiving this because you authored the thread.
>> Reply to this email directly, view it on GitHub
>> <#37 (review)>,
>> or unsubscribe
>> <https://github.com/notifications/unsubscribe-auth/AJALTFQTLZGRXRTH2GTAZFLRVSVFNANCNFSM4NXHU6JA>
>> .
>>
>
|
|
hi oliver, since i started the review in here, i would prefer to finish it here too. can you please make a (or multiple) commit with the missing files (and, of course, without the VS specific files). the goal is also that at the end you can use the PR to create the patch for scribus... without having to hack around. |
|
and, of course, you don't need my ok before you submit your patch to the bug tracker. i only want to help you to create a patch that has more chances to be accepted by jean (and craig). |
move the enums into their own class
replace addChar classes with dynamically mapped textFrame function pointers
fix a couple of potential overflows that were picked upo as compiler wqarnings
fix { should be on a new line for new and some existing code
fix regression, don't set lastxy if were adding the first char
makje addchar private and implement a public caller not that we have setCharMode
fixed regression bug that caused textimport to crash.
all the changes following the review: still todo function pointer tidy up and code formatting of irrelivant code put in a seperate patch
rename textFramework and tidyup AaddChar function poiinters. one last pass at using function pointers as it id upto 20% faster and no slower than using switch
aoloe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it already feels much better!
i have commented inline and have also two general remarks:
1
don't put so many comments and TODO / FIXME in the the code. put them in a comment before the beginning of the function as a summary of your future plans.
and make sure that all of them are still needed AND correctly spelled. and that commented out code is really waiting to be reactivated / changed n the future.
currently, the code is hard to read and the reviewer gets the impression that the code does not work at all.
2
most of your code should probably go into a separate file, not in slaoutput. pdftextimport.h|cpp might be a good name.
it would probably also make it easier for jean to accept it, since the changes would be well confined.
…tching newline comments, tidyups and seperate pdftextrecognition has regersion in matching newline that was probably introduced when I implemented ale, if else simplification without properly checking it first.
… on invariants fixed layout regression, this was caused by basing code on an outdated version of the positioning result returned by linearTest. , made qDebug output conditional based on a precompiler defin, added niotes on invariants
collating pdf text and displaying it in scribus textframes