QGIS How are data points of the same value classified for Equal Count (Quantile)?

Geographic Information Systems Asked by tdgsysjrgv on January 5, 2021

For example, I have 100 data points and am classifying them using the Equal Count (Quantile) method.

I want to have 5 classes, and in the ideal situation I will have 100/5 = 20 values in each class. However, I have 35 data values which are 0.

Would QGIS classify all 35 zeros into one class, or split them into 2 classes at random?

I tried this on one dataset and noticed that all the zeros were grouped together. Is that the case every time?

I couldn’t find any documentation on how QGIS splits the data for Equal Count.

I tried to read the source code at https://github.com/qgis/QGIS/blob/master/src/core/classification/qgsclassificationmethod.cpp to gain some insight. I can see how the breaks are generated, but I can’t find where the values are assigned to the separate classes. Can someone tell me where that happens and how that part of the code works?

2 Answers

Here's some insight: https://issues.qgis.org/issues/21451

But in short, items with the same value need to be assigned the same rank, meaning sometimes you'll get differing numbers in each quantile. Think about it this way. If you have 2 first-place teams there won't be any second-place team.
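To make the first-place analogy concrete, here is a minimal Python sketch of that kind of ranking, where tied values share a rank and the following rank is skipped. The function name `competition_rank` and the sample scores are made up for illustration; this is not the code from the issue or from the repository linked below.

```python
def competition_rank(values):
    """Rank values so that ties share a rank and the next rank is skipped."""
    ordered = sorted(values)
    # The rank of a value is the 1-based position of its first occurrence
    # in the sorted list, i.e. 1 + the number of strictly smaller values.
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


scores = [0, 0, 3, 5, 5, 9]  # hypothetical data with two ties
ranks = competition_rank(scores)
print(ranks)  # [1, 1, 3, 4, 4, 6]
```

Both zeros take rank 1 and no value gets rank 2, which is why the groups built from such ranks can end up with unequal sizes.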

For my own use, I created a processing algorithm for this that adds and populates a ranking field. I've been meaning to add this to a GitHub repo, and just have. You can find it in its ragged glory here:

https://github.com/davidlgalt/locitools/

Disclaimer: When it comes to PyQGIS & GitHub I am still on the uphill side of the learning curve.

Correct answer by David Galt on January 5, 2021

It looks like the behavior you saw should be reproduced every time. I believe the code you found (and linked) is general and is used for every type of classification that QGIS makes available in the Symbology tab. It relies on the calculateBreaks() method, which is defined separately for each type of classification.

Here is the link to the equal count (quantile) implementation of calculateBreaks(): https://github.com/qgis/QGIS/blob/master/src/core/classification/qgsclassificationquantile.cpp

The breaks are generated by a fixed formula in the code linked above. The code you found then assigns each data point to the class between two consecutive breaks. Tied values always fall between the same pair of breaks, so they are all sorted into the same class.

I'm not totally sure what the formula is doing line by line, or I'd give a better explanation, but it seems to essentially follow the steps described on Statistics How To here: https://www.statisticshowto.com/quantile-definition-find-easy-steps/
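The two-step idea (compute breaks first, then drop each value between them) can be tried out with a small Python sketch on the dataset from the question: 100 values, 35 of them zero, 5 classes. Two things here are assumptions rather than facts taken from the QGIS source: the breaks use plain linear interpolation between neighbouring sorted values, and a value equal to a break goes into the class that ends at that break. The names `quantile_breaks` and `classify` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def quantile_breaks(values, n_classes):
    """Upper break of each class, by linear interpolation (assumed formula)."""
    ordered = sorted(values)
    if not ordered:
        return []
    n = len(ordered)
    breaks = []
    for i in range(1, n_classes):
        q = i / n_classes            # fraction of the data below this break
        a = q * (n - 1)              # fractional index into the sorted values
        lower = int(a)
        r = a - lower                # interpolation weight
        upper = min(lower + 1, n - 1)
        breaks.append((1 - r) * ordered[lower] + r * ordered[upper])
    breaks.append(ordered[-1])       # the last class ends at the maximum
    return breaks


def classify(value, breaks):
    """Index of the first class whose upper break is >= value (assumed rule)."""
    return bisect_left(breaks, value)


# Dataset from the question: 35 zeros plus 65 distinct positive values.
data = [0] * 35 + list(range(1, 66))
breaks = quantile_breaks(data, 5)
counts = Counter(classify(v, breaks) for v in data)
class_sizes = [counts[k] for k in range(5)]
print(class_sizes)  # [35, 5, 20, 20, 20]
```

Under these assumptions the first break is 0, so every zero lands in the first class and the second class is left with only 5 values. Classification depends only on a value's position relative to the breaks, so identical values cannot be split across classes, whatever the exact break formula is.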

Answered by Randcelot on January 5, 2021
