TransWikia.com

Unwanted page break in multilanguage document

TeX - LaTeX Asked on December 1, 2021

I am writing a thesis that contains Arabic and English text. The thesis is complete, but I am getting unwanted page breaks on several pages, each leaving almost a quarter of the page blank at the bottom. The breaks occur (1) between items of the enumerate environment and (2) in the middle of paragraphs, where the text jumps to a new page at a line break.

\documentclass[12pt]{report}
% margins
\usepackage[top=1.0in, bottom=1.0in, left=1.5in, right=1.0in]{geometry}
%\usepackage[top=1.0in, bottom=1.2in, left=1.7in, right=1.0in]{geometry}
% double spacing
\usepackage{setspace}
\doublespacing
\usepackage[table]{xcolor}
% graphics and subfigures
\usepackage{graphicx}
\usepackage[captionskip=8pt, nearskip=10pt]{subfig}
\usepackage{float}
\usepackage{lscape}
\usepackage{tikz}
\usepackage{multirow}
\usepackage[english]{babel}
\usepackage{arabtex}
\usepackage{utf8}
\setcode{utf8}
\usepackage[UTF8]{ctex}
% citations
\usepackage{cite}
% abbreviations
\usepackage{nomencl}
\makenomenclature
% math
\usepackage{amsmath}
\usepackage{amssymb}
%\usepackage{enumerate}
%\usepackage{enumitem}
\usepackage{float}
\usepackage{lmodern}
\usepackage{array}
\usepackage{longtable}
% algorithms
\usepackage{algorithm}
\usepackage{algorithmic}
\parindent=0pt
%\renewcommand{\listalgorithmname}{LIST OF ALGORITHMS}
%\renewcommand{\thealgorithm}{\thechapter.\arabic{algorithm}}
%\newcommand{\listofalgorithmsbreak}{\addtocontents{loa}{\protect\vspace{8pt}}}
\usepackage{titlesec}
\setcounter{secnumdepth}{4}
%\setcounter{tocdepth}{4}
\titleformat{\paragraph}
{\normalfont\normalsize\bfseries}{\theparagraph}{1em}{}
\titlespacing*{\paragraph}{0pt}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}
%definitions
\newtheorem{myDef}{Definition}
\newtheorem{myEx}{Example}
%\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
% capitalize names of sections
\def\contentsname{TABLE OF CONTENTS}
\def\listtablename{LIST OF TABLES}
\def\listfigurename{LIST OF FIGURES}
\def\bibname{REFERENCES}
% section headers
%
% http://www.latex-community.org/forum/viewtopic.php+f=5&t=1270
%
%\makeatletter
%
%\renewcommand*\@makechapterhead[1]{%
%  \vspace*{50\p@}%
%  {\parindent \z@ \centering \normalfont
%    \huge\bfseries
%    \ifnum \c@secnumdepth >\m@ne
%         \thechapter.\space
%    \fi
%    #1\par\nobreak
%    \vskip 36\p@
%  }}
%\makeatother
%
%\makeatletter
%\renewcommand*\@makeschapterhead[1]{%
%  \vspace*{50\p@}%
%  {\parindent \z@ \centering \normalfont
%    \huge\bfseries
%    \ifnum \c@secnumdepth >\m@ne
%    \fi
%    #1\par\nobreak
%    \vskip 36\p@
%  }}
%\makeatother
\definecolor{orange1}{RGB}{250,50,10}
\begin{document}
\begin{enumerate}
\item[A-] Tokenization: tokenization process is among the widely adopted methods in sentiment analysis projects where the text is divided into individual token of words (terms) \cite{Awajan2018}, characters and phrases \cite{Mansour2017} in the process of sentence segmentation \cite{Awajan2018}.
\item[B-] Case Normalization: in case normalization, the entire sentences or documents are converted into lowercase or vice versa \cite{Joshi2014}. Moreover, in Arabic language, there are several characters that could come in various shapes (for instance, Taa Marboutah “\RL{ة}” and Haa Marboutah “\RL{ه}”). Consequently, this stage addresses the normalization of the characters’ spelling \cite{Awajan2018}. In English language, if all the characters in a certain term are in capital, this could reflect a strong emotion or sentiment \cite{Mansour2017}.
\item[C-] Exclusion of Foreign Letters: in English language, that data that does not include the letters of {A-Z, a-z} should be excluded \cite{Awajan2018}.
\item[D-] Stop Word Removal: in many cases the stop words are useless for the processing. Hence, they are discarded because it saves space and time \cite{Awajan2018}. Moreover, an example of Arabic stop words is ‘fe’,’lan’,’kan’ \RL{في, لن, كان}. The benefits from removing these words are to enhance effectiveness, enhance response time and decrease index space. However, a one unified list of stop words that should be deleted does not exist yet \cite{Mansour2017}.
\item[E-] Handling Negation: through the use of special words (such as not, no, never and so forth) the sentiment polarity is transformed from negative to positive or vice-versa \cite{Awajan2018}.
\item[F-] Acronyms Expansion: the acronyms are expanded to their original terms via a dictionary of acronym \cite{Awajan2018}.
\item[G-] Spelling Checking: the reviews and social media include data that has various mistakes in spelling as extra or missing letters which should be corrected \cite{Mansour2017}.
\item[H-] Replacing Characters: the benefit gained from replacing the characters forms with different forms is to enhance the prediction accuracy. For instance, in Arabic language, there is some differences in the letters as follows \cite{Mansour2017}:
\item[-] Hamza: the hamza letter “\RL{ء}” is interchangeable based on the position and the word. It contains .\RL{ئ, ؤ, أ}
\item[-] Ta Marboutah is some times used interchangeably with the ha \RL{ه}.
\item[-] Alef: there are multiple forms of this character in Arabic language that are interchangeably used, which are \RL{ا, أ, إ}. Moreover, Alef Maqsourah \RL{ى} is some times mistakenly written as \RL{ي}.
\item[I-] Identify or Delete Punctuations: deletion of punctuations that include commas and stop words because they are not needed to identify the polarity of the text. In some situations, the punctuations could reflect the polarity of the text, for instance, the question mark that reflect perplexity or the exclamation mark that reflect a strong feeling \cite{Mansour2017} (anger, delight, wonder and so forth \cite{Maryland}) in the text.
\item[J-] Stemming or Lemmatizing: stemming process targets reducing related tokens to be of one sort. The common stemming process includes the identification and removal of suffixes, prefixes and inappropriate pluralization \cite{Joshi2014}. There is light stemming, statistical stemmer, root-based stemming and a hybrid technique \cite{Awajan2018}. On the other hand, lemmatization works through identifying the basic form “lemma” to every inflected term form in a given sentence. The advantages of lemmatization match those of stemming \cite{Laurikkala2004}. For instance, the term inflations as going, goes and gone will be stemmed to “go” but the mapping of the term “went” will not be to “go”. On the other hand, the term “went” will be lemmatized as the lemma “go”.  Furthermore, the following explanation illustrates stemming and lemmatizing \cite{Jivani2011}:
\item[K-] Stemming: introduces, introducing, introduction \begin{tikzpicture}
\draw[->,>=stealth,line width = 3pt] (0,0) -- (1.5,0);
\end{tikzpicture}  introduc \\
Goes, going, gone \begin{tikzpicture}
\draw[->,>=stealth,line width = 3pt] (0,0) -- (1.5,0);
\end{tikzpicture} go \\
Lemmatizing: introduces, introducing, introduction \begin{tikzpicture}
\draw[->,>=stealth,line width = 3pt] (0,0) -- (1.5,0);
\end{tikzpicture} introduce\\
Goes, going, gone \begin{tikzpicture}
\draw[->,>=stealth,line width = 3pt] (0,0) -- (1.5,0);
\end{tikzpicture} went, go
\item[L-] Filtering: through deleting irrelevant data such as emoticons, special words, repeated letters, URL links, user names and so forth \cite{Awajan2018}.
\end{enumerate}
\end{document}

One Answer

I have found a workaround: wrapping the text in a minipage at the places where the unwanted page break occurs. This is obviously not the best solution, though. If anyone has a better one, please share it.

Answered by awan on December 1, 2021
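
The answer does not show where the minipage was placed, so the following is only a minimal sketch of one way the workaround might look, assuming the minipage is wrapped around the body of a single enumerate item. The item texts are placeholders, and the choice of `[t]` alignment and `\linewidth` width is an assumption, not something taken from the answer.

```latex
\documentclass[12pt]{report}
\usepackage{setspace}
\doublespacing
\begin{document}
\begin{enumerate}
\item[A-] Tokenization: placeholder text for an item that is left untouched.
% Assumed placement: the minipage wraps the whole body of one item.
% [t] aligns the first line of the box with the item label, and
% \linewidth is the text width available inside the list.
\item[B-]
\begin{minipage}[t]{\linewidth}
Case Normalization: placeholder text for the item that was being
pushed to the next page. Everything inside the minipage is set as
one box and is kept together on a single page.
\end{minipage}
\item[C-] Exclusion of Foreign Letters: placeholder text.
\end{enumerate}
\end{document}
```

A minipage cannot be split across pages, so this only suits items short enough to fit on one page; a long item wrapped this way will move to the next page as a whole and can leave even more blank space behind it.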
