сортировка массива целых чисел лексикографически C++

Question

сортировка массива целых чисел лексикографически C++

Я хочу отсортировать большой массив целых чисел (скажем, 1 миллон элементов) лексикографически.

пример:

input [] = { 100, 21 , 22 , 99 , 1  , 927 }
sorted[] = { 1  , 100, 21 , 22 , 927, 99  }

Я сделал это, используя самый простой способ:

преобразование всех чисел в строки (очень дорого, потому что это займет огромную память)
использовать std:sort С strcmp как функция сравнения
преобразование строк в целые числа

есть ли лучший способ, чем этот?

17

arrays c++ lexicographic sorting

автор: dyp

11 ответов

автор: Oswald · Accepted Answer · 2013-10-25 12:03:59

использовать std::sort() С соответствующей функции сравнения. Это сокращает требования к памяти.

функция сравнения может использовать n % 10, n / 10 % 10, n / 100 % 10 etc. для доступа к отдельным цифрам (для положительных целых чисел; отрицательные числа работают немного по-разному).

автор: Lightness Races in Orbit · Accepted Answer · 2017-05-23 12:09:20

чтобы обеспечить любой заказ сортировки, вы можете предоставить компаратор std::sort. В этом случае это будет несколько сложно, используя логарифмы для проверки отдельных цифр вашего номера в базе 10.

вот пример - комментарии inline описывают, что происходит.

#include <iostream>
#include <algorithm>
#include <cmath>
#include <cassert>

int main() {
    int input[] { 100, 21, 22, 99, 1, 927, -50, -24, -160 };

    /**
     * Sorts the array lexicographically.
     * 
     * The trick is that we have to compare digits left-to-right
     * (considering typical Latin decimal notation) and that each of
     * two numbers to compare may have a different number of digits.
     * 
     * This is very efficient in storage space, but inefficient in
     * execution time; an approach that pre-visits each element and
     * stores a translated representation will at least double your
     * storage requirements (possibly a problem with large inputs)
     * but require only a single translation of each element.
     */
    std::sort(
        std::begin(input),
        std::end(input),
        [](int lhs, int rhs) -> bool {
            // Returns true if lhs < rhs
            // Returns false otherwise
            const auto BASE      = 10;
            const bool LHS_FIRST = true;
            const bool RHS_FIRST = false;
            const bool EQUAL     = false;


            // There's no point in doing anything at all
            // if both inputs are the same; strict-weak
            // ordering requires that we return `false`
            // in this case.
            if (lhs == rhs) {
                return EQUAL;
            }

            // Compensate for sign
            if (lhs < 0 && rhs < 0) {
                // When both are negative, sign on its own yields
                // no clear ordering between the two arguments.
                // 
                // Remove the sign and continue as for positive
                // numbers.
                lhs *= -1;
                rhs *= -1;
            }
            else if (lhs < 0) {
                // When the LHS is negative but the RHS is not,
                // consider the LHS "first" always as we wish to
                // prioritise the leading '-'.
                return LHS_FIRST;
            }
            else if (rhs < 0) {
                // When the RHS is negative but the LHS is not,
                // consider the RHS "first" always as we wish to
                // prioritise the leading '-'.
                return RHS_FIRST;
            }

            // Counting the number of digits in both the LHS and RHS
            // arguments is *almost* trivial.
            const auto lhs_digits = (
                lhs == 0
                ? 1
                : std::ceil(std::log(lhs+1)/std::log(BASE))
            );

            const auto rhs_digits = (
                rhs == 0
                ? 1
                : std::ceil(std::log(rhs+1)/std::log(BASE))
            );

            // Now we loop through the positions, left-to-right,
            // calculating the digit at these positions for each
            // input, and comparing them numerically. The
            // lexicographic nature of the sorting comes from the
            // fact that we are doing this per-digit comparison
            // rather than considering the input value as a whole.
            const auto max_pos = std::max(lhs_digits, rhs_digits);
            for (auto pos = 0; pos < max_pos; pos++) {
                if (lhs_digits - pos == 0) {
                    // Ran out of digits on the LHS;
                    // prioritise the shorter input
                    return LHS_FIRST;
                }
                else if (rhs_digits - pos == 0) {
                    // Ran out of digits on the RHS;
                    // prioritise the shorter input
                    return RHS_FIRST;
                }
                else {
                    const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
                    const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;

                    if (lhs_x < rhs_x)
                        return LHS_FIRST;
                    else if (rhs_x < lhs_x)
                        return RHS_FIRST;
                }
            }

            // If we reached the end and everything still
            // matches up, then something probably went wrong
            // as I'd have expected to catch this in the tests
            // for equality.
            assert("Unknown case encountered");
        }
    );

    std::cout << '{';
    for (auto x : input)
        std::cout << x << ", ";
    std::cout << '}';

    // Output: {-160, -24, -50, 1, 100, 21, 22, 927, 99, }
}

демо

есть более быстрые способы рассчитать количество цифр в числе, но выше вы начнете.

автор: dyp · Accepted Answer · 2017-05-23 12:21:48

вот еще один алгоритм, который выполняет некоторые вычисления, прежде чем сортировка. Кажется, это довольно быстро, несмотря на дополнительное копирование (см. сравнения).

Примечание:

он поддерживает только положительные целые числа
in поддерживает только целые числа std::numeric_limits<int>::max()/10

_{Б. Н. Вы можете оптимизировать count_digits и my_pow10; см., например,три подсказки оптимизации для C++ от Андрея Александреску и любой способ быстрее, чем pow () для вычисления целочисленной мощности 10 В C++?}

помощники:

#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <iostream>
#include <algorithm>
#include <limits>
#include <iterator>

// non-optimized version
int count_digits(int p) // returns `0` for `p == 0`
{
    int res = 0;
    for(; p != 0; ++res)
    {
        p /= 10;
    }
    return res;
}

// non-optimized version
int my_pow10(unsigned exp)
{
    int res = 1;
    for(; exp != 0; --exp)
    {
        res *= 10;
    }
    return res;
}

(Примечание - не на месте):

// helper to provide integers with the same number of digits
template<class T, class U>
std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits)
{
    auto const digits = count_digits(p);
    // append zeros so that `l` has `maxDigits` digits
    auto const l = static_cast<T>( p  * my_pow10(maxDigits-digits) );
    return {l, p};
}

template<class RaIt>
using pair_vec
    = std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type,
                            typename std::iterator_traits<RaIt>::value_type>>;

template<class RaIt>
pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end)
{
    if(p_beg == p_end) return {};

    auto max = *std::max_element(p_beg, p_end);
    auto maxDigits = count_digits(max);

    pair_vec<RaIt> result;
    result.reserve( std::distance(p_beg, p_end) );

    for(auto i = p_beg; i != p_end; ++i)
        result.push_back( lexicographic_pair_helper(*i, maxDigits) );

    using value_type = typename pair_vec<RaIt>::value_type;

    std::sort(begin(result), end(result),
              [](value_type const& l, value_type const& r)
              {
                  if(l.first < r.first) return true;
                  if(l.first > r.first) return false;
                  return l.second < r.second; }
             );

    return result;
}

пример использования:

int main()
{
    std::vector<int> input = { 100, 21 , 22 , 99 , 1  , 927 };
    // generate some numbers
    /*{
        constexpr int number_of_elements = 1E6;
        std::random_device rd;
        std::mt19937 gen( rd() );
        std::uniform_int_distribution<>
            dist(0, std::numeric_limits<int>::max()/10);
        for(int i = 0; i < number_of_elements; ++i)
            input.push_back( dist(gen) );
    }*/

    std::cout << "unsorted: ";
    for(auto const& e : input) std::cout << e << ", ";
    std::cout << "\n\n";


    auto sorted = lexicographic_sort(begin(input), end(input));

    std::cout << "sorted: ";
    for(auto const& e : sorted) std::cout << e.second << ", ";
    std::cout << "\n\n";
}

автор: DyP · Accepted Answer · 2013-10-26 21:39:56

вот сообщество wiki для сравнения решений. Я взял код ним и сделал его легко расширяемым. Не стесняйтесь добавлять свои решения и результаты.

образец запускает старый медленный компьютер (3 GB RAM, Core2Duo U9400) с g++4.9 @ -O3 -march=native:

number of elements: 1e+03
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "dyp":
    duration: 0 ms and 301 microseconds
    comparison to reference solution: exact match
solution "Nim":
    duration: 2 ms and 160 microseconds
    comparison to reference solution: exact match
solution "nyarlathotep":
    duration: 8 ms and 126 microseconds
    comparison to reference solution: exact match
solution "notbad":
    duration: 1 ms and 102 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 2 ms and 550 microseconds
    comparison to reference solution: exact match
solution "Lightness Races in Orbit":
    duration: 17 ms and 469 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 1 ms and 92 microseconds
    comparison to reference solution: exact match

==========================================================

number of elements: 1e+04
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "nyarlathotep":
    duration: 109 ms and 712 microseconds
    comparison to reference solution: exact match
solution "Lightness Races in Orbit":
    duration: 272 ms and 819 microseconds
    comparison to reference solution: exact match
solution "dyp":
    duration: 1 ms and 748 microseconds
    comparison to reference solution: exact match
solution "notbad":
    duration: 16 ms and 115 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 15 ms and 10 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 33 ms and 301 microseconds
    comparison to reference solution: exact match
solution "Nim":
    duration: 17 ms and 83 microseconds
    comparison to reference solution: exact match

==========================================================

number of elements: 1e+05
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "Nim":
    duration: 217 ms and 4 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 199 ms and 505 microseconds
    comparison to reference solution: exact match
solution "dyp":
    duration: 20 ms and 330 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 415 ms and 477 microseconds
    comparison to reference solution: exact match
solution "Lightness Races in Orbit":
    duration: 3955 ms and 58 microseconds
    comparison to reference solution: exact match
solution "notbad":
    duration: 215 ms and 259 microseconds
    comparison to reference solution: exact match
solution "nyarlathotep":
    duration: 1341 ms and 46 microseconds
    comparison to reference solution: mismatch found

==========================================================

number of elements: 1e+06
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "Lightness Races in Orbit":
    duration: 52861 ms and 314 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 4757 ms and 608 microseconds
    comparison to reference solution: exact match
solution "nyarlathotep":
    duration: 15654 ms and 195 microseconds
    comparison to reference solution: mismatch found
solution "dyp":
    duration: 233 ms and 779 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 2181 ms and 634 microseconds
    comparison to reference solution: exact match
solution "Nim":
    duration: 2539 ms and 9 microseconds
    comparison to reference solution: exact match
solution "notbad":
    duration: 2675 ms and 362 microseconds
    comparison to reference solution: exact match

==========================================================

number of elements: 1e+07
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "notbad":
    duration: 33425 ms and 423 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 26000 ms and 398 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 56206 ms and 359 microseconds
    comparison to reference solution: exact match
solution "Lightness Races in Orbit":
    duration: 658540 ms and 342 microseconds
    comparison to reference solution: exact match
solution "nyarlathotep":
    duration: 187064 ms and 518 microseconds
    comparison to reference solution: mismatch found
solution "Nim":
    duration: 30519 ms and 227 microseconds
    comparison to reference solution: exact match
solution "dyp":
    duration: 2624 ms and 644 microseconds
    comparison to reference solution: exact match

алгоритмы должны быть структурами с шаблонами операторов вызова функций, которые поддерживают интерфейс:

template<class RaIt> operator()(RaIt begin, RaIt end);

копия входных данных предоставляется в качестве параметра, ожидается, что алгоритм предоставит результат в том же диапазоне (например, сортировка на месте).

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <cassert>
#include <chrono>
#include <cstring>
#include <climits>
#include <functional>
#include <cstdlib>
#include <iomanip>

using duration_t = decltype(  std::chrono::high_resolution_clock::now()
                            - std::chrono::high_resolution_clock::now());

template<class T>
struct result_t
{
    std::vector<T> numbers;
    duration_t duration;
    char const* name;
};

template<class RaIt, class F>
result_t<typename std::iterator_traits<RaIt>::value_type>
apply_algorithm(RaIt p_beg, RaIt p_end, F f, char const* name)
{
    using value_type = typename std::iterator_traits<RaIt>::value_type;

    std::vector<value_type> inplace(p_beg, p_end);

    auto start = std::chrono::high_resolution_clock::now();

    f(begin(inplace), end(inplace));

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = end - start;

    return {std::move(inplace), duration, name};
}

// non-optimized version
int count_digits(int p) // returns `0` for `p == 0`
{
        int res = 0;
        for(; p != 0; ++res)
        {
            p /= 10;
        }
        return res;
}

// non-optimized version
int my_pow10(unsigned exp)
{
        int res = 1;
        for(; exp != 0; --exp)
        {
            res *= 10;
        }
        return res;
}


// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// paste algorithms here
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


int main(int argc, char** argv)
{
    using integer_t = int;
    constexpr integer_t dist_min = 0;
    constexpr integer_t dist_max = std::numeric_limits<integer_t>::max()/10;
    constexpr std::size_t default_number_of_elements = 1E6;

    const std::size_t number_of_elements = argc>1 ? std::atoll(argv[1]) :
                                           default_number_of_elements;
    std::cout << "number of elements: ";
    std::cout << std::scientific << std::setprecision(0);
    std::cout << (double)number_of_elements << "\n";
    std::cout << /*std::defaultfloat <<*/ std::setprecision(6);
    std::cout.unsetf(std::ios_base::floatfield);

    std::cout << "size of integer type: " << sizeof(integer_t) << "\n\n";

    std::vector<integer_t> input;
    {
        input.reserve(number_of_elements);

        std::random_device rd;
        std::mt19937 gen( rd() );
        std::uniform_int_distribution<> dist(dist_min, dist_max);

        for(std::size_t i = 0; i < number_of_elements; ++i)
            input.push_back( dist(gen) );
    }

    auto b = begin(input);
    auto e = end(input);

    using res_t = result_t<integer_t>;
    std::vector< std::function<res_t()> > algorithms;

    #define MAKE_BINDER(B, E, ALGO, NAME) \
        std::bind( &apply_algorithm<decltype(B),decltype(ALGO)>, \
                   B,E,ALGO,NAME )
    constexpr auto lightness_name = "Lightness Races in Orbit";
    algorithms.push_back( MAKE_BINDER(b, e, lightness(), lightness_name) );
    algorithms.push_back( MAKE_BINDER(b, e, dyp(), "dyp") );
    algorithms.push_back( MAKE_BINDER(b, e, nim(), "Nim") );
    algorithms.push_back( MAKE_BINDER(b, e, pts(), "pts") );
    algorithms.push_back( MAKE_BINDER(b, e, epost(), "Eric Postpischil") );
    algorithms.push_back( MAKE_BINDER(b, e, nyar(), "nyarlathotep") );
    algorithms.push_back( MAKE_BINDER(b, e, notbad(), "notbad") );

    {
        std::srand( std::random_device()() );
        std::random_shuffle(begin(algorithms), end(algorithms));
    }

    std::vector< result_t<integer_t> > res;
    for(auto& algo : algorithms)
        res.push_back( algo() );

    auto reference_solution
        = *std::find_if(begin(res), end(res),
                        [](result_t<integer_t> const& p)
                        { return 0 == std::strcmp(lightness_name, p.name); });
    std::cout << "reference solution: "<<reference_solution.name<<"\n\n";

    for(auto const& e : res)
    {
        std::cout << "solution \""<<e.name<<"\":\n";
        auto ms =
            std::chrono::duration_cast<std::chrono::microseconds>(e.duration);
        std::cout << "\tduration: "<<ms.count()/1000<<" ms and "
                                   <<ms.count()%1000<<" microseconds\n";

        std::cout << "\tcomparison to reference solution: ";
        if(e.numbers.size() != reference_solution.numbers.size())
        {
            std::cout << "ouput count mismatch\n";
            break;
        }

        auto mismatch = std::mismatch(begin(e.numbers), end(e.numbers),
                                      begin(reference_solution.numbers)).first;
        if(end(e.numbers) == mismatch)
        {
            std::cout << "exact match\n";
        }else
        {
            std::cout << "mismatch found\n";
        }
    }
}

текущие алгоритмы; обратите внимание, что Я заменил счетчики цифр и pow-of-10 глобальной функцией, поэтому мы все выиграем, если кто-то оптимизирует.

struct lightness
{
    template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        using T = typename std::iterator_traits<RaIt>::value_type;

        /**
         * Sorts the array lexicographically.
         *
         * The trick is that we have to compare digits left-to-right
         * (considering typical Latin decimal notation) and that each of
         * two numbers to compare may have a different number of digits.
         *
         * This is very efficient in storage space, but inefficient in
         * execution time; an approach that pre-visits each element and
         * stores a translated representation will at least double your
         * storage requirements (possibly a problem with large inputs)
         * but require only a single translation of each element.
         */
        std::sort(
            b,
            e,
            [](T lhs, T rhs) -> bool {
                // Returns true if lhs < rhs
                // Returns false otherwise
                const auto BASE      = 10;
                const bool LHS_FIRST = true;
                const bool RHS_FIRST = false;
                const bool EQUAL     = false;


                // There's no point in doing anything at all
                // if both inputs are the same; strict-weak
                // ordering requires that we return `false`
                // in this case.
                if (lhs == rhs) {
                    return EQUAL;
                }

                // Compensate for sign
                if (lhs < 0 && rhs < 0) {
                    // When both are negative, sign on its own yields
                    // no clear ordering between the two arguments.
                    //
                    // Remove the sign and continue as for positive
                    // numbers.
                    lhs *= -1;
                    rhs *= -1;
                }
                else if (lhs < 0) {
                    // When the LHS is negative but the RHS is not,
                    // consider the LHS "first" always as we wish to
                    // prioritise the leading '-'.
                    return LHS_FIRST;
                }
                else if (rhs < 0) {
                    // When the RHS is negative but the LHS is not,
                    // consider the RHS "first" always as we wish to
                    // prioritise the leading '-'.
                    return RHS_FIRST;
                }

                // Counting the number of digits in both the LHS and RHS
                // arguments is *almost* trivial.
                const auto lhs_digits = (
                    lhs == 0
                    ? 1
                    : std::ceil(std::log(lhs+1)/std::log(BASE))
                );

                const auto rhs_digits = (
                    rhs == 0
                    ? 1
                    : std::ceil(std::log(rhs+1)/std::log(BASE))
                );

                // Now we loop through the positions, left-to-right,
                // calculating the digit at these positions for each
                // input, and comparing them numerically. The
                // lexicographic nature of the sorting comes from the
                // fact that we are doing this per-digit comparison
                // rather than considering the input value as a whole.
                const auto max_pos = std::max(lhs_digits, rhs_digits);
                for (auto pos = 0; pos < max_pos; pos++) {
                    if (lhs_digits - pos == 0) {
                        // Ran out of digits on the LHS;
                        // prioritise the shorter input
                        return LHS_FIRST;
                    }
                    else if (rhs_digits - pos == 0) {
                        // Ran out of digits on the RHS;
                        // prioritise the shorter input
                        return RHS_FIRST;
                    }
                    else {
                        const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
                        const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;

                        if (lhs_x < rhs_x)
                            return LHS_FIRST;
                        else if (rhs_x < lhs_x)
                            return RHS_FIRST;
                    }
                }

                // If we reached the end and everything still
                // matches up, then something probably went wrong
                // as I'd have expected to catch this in the tests
                // for equality.
                assert("Unknown case encountered");

                // dyp: suppress warning and throw
                throw "up";
            }
        );
    }
};

namespace ndyp
{
    // helper to provide integers with the same number of digits
    template<class T, class U>
    std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits)
    {
        auto const digits = count_digits(p);
        // append zeros so that `l` has `maxDigits` digits
        auto const l = static_cast<T>( p  * my_pow10(maxDigits-digits) );
        return {l, p};
    }

    template<class RaIt>
    using pair_vec
        = std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type,
                                typename std::iterator_traits<RaIt>::value_type>>;

    template<class RaIt>
    pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end)
    {
        if(p_beg == p_end) return pair_vec<RaIt>{};

        auto max = *std::max_element(p_beg, p_end);
        auto maxDigits = count_digits(max);

        pair_vec<RaIt> result;
        result.reserve( std::distance(p_beg, p_end) );

        for(auto i = p_beg; i != p_end; ++i)
            result.push_back( lexicographic_pair_helper(*i, maxDigits) );

        using value_type = typename pair_vec<RaIt>::value_type;

        std::sort(begin(result), end(result),
                  [](value_type const& l, value_type const& r)
                  {
                      if(l.first < r.first) return true;
                      if(l.first > r.first) return false;
                      return l.second < r.second; }
                 );

        return result;
    }
}

struct dyp
{
    template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        auto pairvec = ndyp::lexicographic_sort(b, e);
        std::transform(begin(pairvec), end(pairvec), b,
                       [](typename decltype(pairvec)::value_type const& e) { return e.second; });
    }
};


namespace nnim
{
    bool comp(int l, int r)
    {
      int lv[10] = {}; // probably possible to get this from numeric_limits
      int rv[10] = {};

      int lc = 10; // ditto
      int rc = 10;
      while (l || r)
      {
        if (l)
        {
          auto t = l / 10;
          lv[--lc] = l - (t * 10);
          l = t;
        }
        if (r)
        {
          auto t = r / 10;
          rv[--rc] = r - (t * 10);
          r = t;
        }
      }
      while (lc < 10 && rc < 10)
      {
        if (lv[lc] == rv[rc])
        {
          lc++;
          rc++;
        }
        else
          return lv[lc] < rv[rc];
      }
      return lc > rc;
    }
}

struct nim
{
    template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, nnim::comp);
    }
};

struct pts
{
        template<class T> static bool lex_less(T a, T b) {
          unsigned la = 1, lb = 1;
          for (T t = a; t > 9; t /= 10) ++la;
          for (T t = b; t > 9; t /= 10) ++lb;
          const bool ll = la < lb;
          while (la > lb) { b *= 10; ++lb; }
          while (lb > la) { a *= 10; ++la; }
          return a == b ? ll : a < b;
        }

        template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, lex_less<typename std::iterator_traits<RaIt>::value_type>);
    }
};

struct epost
{
        static bool compare(int x, int y)
        {
                static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));

                double lx = log10(x);
                double ly = log10(y);
                double fx = lx - floor(lx);  // Get the mantissa of lx.
                double fy = ly - floor(ly);  // Get the mantissa of ly.
                return fabs(fx - fy) < limit ? lx < ly : fx < fy;
        }

        template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, compare);
    }
};

struct nyar
{
        static bool lexiSmaller(int i1, int i2)
        {
            int digits1 = count_digits(i1);
            int digits2 = count_digits(i2);

            double val1 = i1/pow(10.0, digits1-1);
            double val2 = i2/pow(10.0, digits2-1);

            while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
            {
                digits1--;
                digits2--;
                val1 = (val1 - (int)val1)*10;
                val2 = (val2 - (int)val2)*10;
            }
            if (digits1 > 0 && digits2 > 0)
            {
                return (int)val1 < (int)val2;
            }
            return (digits2 > 0);
        }

        template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, lexiSmaller);
    }
};

struct notbad
{
        static int up_10pow(int n) {
          int ans = 1;
          while (ans < n) ans *= 10;
          return ans;
        }

        static bool compare(int v1, int v2) {
           int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
           while ( ceil1 != 0 && ceil2 != 0) {
              if (v1 / ceil1  < v2 / ceil2) return true;
              else if (v1 / ceil1 > v2 / ceil2) return false;
              ceil1 /= 10;
              ceil2 /= 10;
           }
           if (v1 < v2) return true;
           return false;
        }

        template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, compare);
    }
};

автор: Eric Postpischil · Accepted Answer · 2013-10-26 17:37:59

я считаю, что следующие работы в качестве функции сравнения сортировки для натуральных чисел при условии, что используемый целочисленный тип существенно уже, чем double тип (например, 32-разрядные int и 64-разрядных double) и log10 используемая процедура возвращает точно правильные результаты для точных степеней 10 (что делает хорошая реализация):

static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));

double lx = log10(x);
double ly = log10(y);
double fx = lx - floor(lx);  // Get the mantissa of lx.
double fy = ly - floor(ly);  // Get the mantissa of ly.
return fabs(fx - fy) < limit ? lx < ly : fx < fy;

он работает, сравнивая мантиссы логарифмов. Мантиссы-это дробные части логарифма, и они указывают значение значащих цифр числа без величины (например, логарифмы 31, 3.1 и 310 имеют точно такую же мантиссу).

цель fabs(fx - fy) < limit должен учитывать ошибки в принятии логарифма, которые происходят как из-за реализации log10 несовершенны и потому, что формат с плавающей запятой вызывает некоторую ошибку. (Целочисленные части логарифмов 31 и 310 используют разное количество битов, поэтому для significand, поэтому они в конечном итоге округляются до немного разных значений.) До тех пор, пока целочисленный тип существенно уже, чем double тип, из расчета limit будет намного больше, чем ошибка в log10. Таким образом, тест fabs(fx - fy) < limit по существу говорит нам, будут ли две вычисленные мантиссы равны, если вычислены точно.

если мантиссы отличаются, они указывают лексикографический порядок, поэтому мы возвращаемся fx < fy. Если они равны, то целая часть логарифм сообщает нам порядок, поэтому мы возвращаемся lx < ly.

это просто проверить, является ли log10 возвращает правильные результаты для каждой степени десяти, так как их так мало. Если это не так, корректировки могут быть сделаны легко: Insert if (1-fx < limit) fx = 0; if (1-fu < limit) fy = 0;. Это позволяет когда log10 возвращает что-то вроде 4.99999... когда он должен был вернуть 5.

этот метод имеет то преимущество, что не использует циклы или разделение (что занимает много времени на многих процессорах).

автор: Tino Didriksen · Accepted Answer · 2013-10-25 13:14:01

задача звучит как естественная пригонка для варианта MSD сортировки Radix с заполнением (http://en.wikipedia.org/wiki/Radix_sort).

зависит от того, сколько кода Вы хотите бросить на нее. Простой код, как показывают другие, является o(log n) сложностью, в то время как полностью оптимизированная сортировка radix будет O (kn).

автор: pts · Accepted Answer · 2013-10-25 16:57:52

компактное решение, если все ваши числа неотрицательны и они достаточно малы, чтобы умножение их на 10 не вызывало переполнения:

template<class T> bool lex_less(T a, T b) {
  unsigned la = 1, lb = 1;
  for (T t = a; t > 9; t /= 10) ++la;
  for (T t = b; t > 9; t /= 10) ++lb;
  const bool ll = la < lb;
  while (la > lb) { b *= 10; ++lb; }
  while (lb > la) { a *= 10; ++la; }
  return a == b ? ll : a < b;
}

запустите его так:

#include <iostream>
#include <algorithm>
int main(int, char **) {
  unsigned short input[] = { 100, 21 , 22 , 99 , 1  , 927 };
  unsigned input_size = sizeof(input) / sizeof(input[0]);
  std::sort(input, input + input_size, lex_less<unsigned short>);
  for (unsigned i = 0; i < input_size; ++i) {
    std::cout << ' ' << input[i];
  }
  std::cout << std::endl;
  return 0;
}

автор: Conor Linehan · Accepted Answer · 2013-10-25 13:00:34

Вы можете попробовать использовать оператор%, чтобы дать вам доступ к каждой отдельной цифре, например, 121% 100 даст вам первую цифру и проверить таким образом, но вам придется найти способ обойти тот факт, что они имеют разные размеры.

чтобы найти максимальное значение в массиве. Я не знаю, есть ли функция для этого в встроенном вы можете попробовать.

int Max (int* pdata,int size)
 {
int temp_max =0 ;

for (int i =0 ; i < size ; i++)
 {
    if (*(pdata+i) > temp_max)
    {
        temp_max = *(pdata+i);

    }
 }
 return temp_max;
 }

эта функция вернет количество цифр в число

 int Digit_checker(int n)
{
 int num_digits = 1;

while (true)
{
    if ((n % 10) == n)
        return num_digits;
    num_digits++;
    n = n/10;
}
return num_digits;
}

пусть количество цифр в макс. равна n. После этого откройте цикл for в формате for (int i = 1; i

затем вы можете пройти через свой и использовать "data[i] % (10^(n-i))", чтобы получить доступ к первой цифре рода, что и тогда на следующей итерации вы получите доступ ко второй цифре. Но я не знаю, как вы их рассортируете.

он не будет работать для отрицательных чисел, и вам придется обойти данные[i] % (10^(n-i)), возвращая себя для чисел с меньшим количеством цифр, чем max

автор: Zhiwen Fang · Accepted Answer · 2013-10-25 19:07:04

перегрузить оператор

class CmpIntLex {
int up_10pow(int n) {
  int ans = 1;
  while (ans < n) ans *= 10;
  return ans;
}
public: 
bool operator ()(int v1, int v2) {
   int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
   while ( ceil1 != 0 && ceil2 != 0) {
      if (v1 / ceil1  < v2 / ceil2) return true;
      else if (v1 / ceil1 > v2 / ceil2) return false;
      ceil1 /= 10; 
      ceil2 /= 10;
   }
   if (v1 < v2) return true;
   return false;
}
int main() {
vector<int> vi = {12,45,12134,85};
sort(vi.begin(), vi.end(), CmpIntLex());
}

автор: codeling · Accepted Answer · 2013-10-25 12:20:30

в то время как некоторые другие ответы здесь (легковесность, notbad) уже показывают довольно хороший код, я считаю, что могу добавить одно решение, которое может быть более эффективным (поскольку оно не требует ни деления, ни мощности в каждом цикле; но оно требует арифметики с плавающей запятой, которая снова может сделать его медленным и, возможно, неточным для больших чисел):

#include <algorithm>
#include <iostream>
#include <assert.h>

// method taken from http://stackoverflow.com/a/1489873/671366
template <class T>
int numDigits(T number)
{
    int digits = 0;
    if (number < 0) digits = 1; // remove this line if '-' counts as a digit
    while (number) {
        number /= 10;
        digits++;
    }
    return digits;
}

bool lexiSmaller(int i1, int i2)
{
    int digits1 = numDigits(i1);
    int digits2 = numDigits(i2);

    double val1 = i1/pow(10.0, digits1-1);
    double val2 = i2/pow(10.0, digits2-1);

    while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
    {
        digits1--;
        digits2--;
        val1 = (val1 - (int)val1)*10;
        val2 = (val2 - (int)val2)*10;
    }
    if (digits1 > 0 && digits2 > 0)
    {
        return (int)val1 < (int)val2;
    }
    return (digits2 > 0);
}


int main(int argc, char* argv[])
{
    // just testing whether the comparison function works as expected:
    assert (lexiSmaller(1, 100));
    assert (!lexiSmaller(100, 1));
    assert (lexiSmaller(100, 22));
    assert (!lexiSmaller(22, 100));
    assert (lexiSmaller(927, 99));
    assert (!lexiSmaller(99, 927));
    assert (lexiSmaller(1, 927));
    assert (!lexiSmaller(927, 1));
    assert (lexiSmaller(21, 22));
    assert (!lexiSmaller(22, 21));
    assert (lexiSmaller(22, 99));
    assert (!lexiSmaller(99, 22));

    // use the comparison function for the actual sorting:
    int input[] = { 100 , 21 , 22 , 99 , 1 ,927 };
    std::sort(&input[0], &input[5], lexiSmaller);
    std::cout << "sorted: ";
    for (int i=0; i<6; ++i)
    {
        std::cout << input[i];
        if (i<5)
        {
            std::cout << ", ";
        }
    }
    std::cout << std::endl;
    return 0;
}

хотя я должен признать,что я еще не тестировал производительность.

автор: Nim · Accepted Answer · 2013-10-25 15:55:47

вот глупое решение, которое не использует никаких трюков с плавающей запятой. Это почти то же самое, что сравнение строк, но не использует строку, скажем, также не обрабатывает отрицательные числа, чтобы добавить раздел вверху...

bool comp(int l, int r)
{
  int lv[10] = {}; // probably possible to get this from numeric_limits
  int rv[10] = {};

  int lc = 10; // ditto
  int rc = 10;
  while (l || r)
  {
    if (l)
    {
      auto t = l / 10;
      lv[--lc] = l - (t * 10);
      l = t;
    }
    if (r)
    {
      auto t = r / 10;
      rv[--rc] = r - (t * 10);
      r = t;
    }
  }
  while (lc < 10 && rc < 10)
  {
    if (lv[lc] == rv[rc])
    {
      lc++;
      rc++;
    }
    else
      return lv[lc] < rv[rc];
  }
  return lc > rc;
}

это быстро, и я уверен, что это можно сделать еще быстрее, но это работает, и это достаточно глупо, чтобы понять...

EDIT: я съел, чтобы сбросить много кода, но вот сравнение всех решений так далеко..

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <cassert>
#include <chrono>

std::pair<int, int> lexicographic_pair_helper(int p, int maxDigits)
{
  int digits = std::log10(p);
  int l = p*std::pow(10, maxDigits-digits);
  return {l, p};
}

bool l_comp(int l, int r)
{
  int lv[10] = {}; // probably possible to get this from numeric_limits
  int rv[10] = {};

  int lc = 10; // ditto
  int rc = 10;
  while (l || r)
  {
    if (l)
    {
      auto t = l / 10;
      lv[--lc] = l - (t * 10);
      l = t;
    }
    if (r)
    {
      auto t = r / 10;
      rv[--rc] = r - (t * 10);
      r = t;
    }
  }
  while (lc < 10 && rc < 10)
  {
    if (lv[lc] == rv[rc])
    {
      lc++;
      rc++;
    }
    else
      return lv[lc] < rv[rc];
  }
  return lc > rc;
}

int up_10pow(int n) {
  int ans = 1;
  while (ans < n) ans *= 10;
  return ans;
}
bool l_comp2(int v1, int v2) {
  int n1 = up_10pow(v1), n2 = up_10pow(v2);
  while ( v1 != 0 && v2 != 0) {
    if (v1 / n1  < v2 / n2) return true;
    else if (v1 / n1 > v2 / n2) return false;
    v1 /= 10;
    v2 /= 10;
    n1 /= 10;
    n2 /= 10;
  }
  if (v1 == 0 && v2 != 0) return true;
  return false;
}

int main()
{
  std::vector<int> numbers;
  {
    constexpr int number_of_elements = 1E6;
    std::random_device rd;
    std::mt19937 gen( rd() );
    std::uniform_int_distribution<> dist;
    for(int i = 0; i < number_of_elements; ++i) numbers.push_back( dist(gen) );
  }

  std::vector<int> lo(numbers);
  std::vector<int> dyp(numbers);
  std::vector<int> nim(numbers);
  std::vector<int> nb(numbers);

  std::cout << "starting..." << std::endl;

  {

    auto start = std::chrono::high_resolution_clock::now();
    /**
    * Sorts the array lexicographically.
    *
    * The trick is that we have to compare digits left-to-right
    * (considering typical Latin decimal notation) and that each of
    * two numbers to compare may have a different number of digits.
    *
    * This probably isn't very efficient, so I wouldn't do it on
    * "millions" of numbers. But, it works...
    */
    std::sort(
    std::begin(lo),
              std::end(lo),
              [](int lhs, int rhs) -> bool {
                // Returns true if lhs < rhs
                // Returns false otherwise
                const auto BASE      = 10;
                const bool LHS_FIRST = true;
                const bool RHS_FIRST = false;
                const bool EQUAL     = false;


                // There's no point in doing anything at all
                // if both inputs are the same; strict-weak
                // ordering requires that we return `false`
                // in this case.
                if (lhs == rhs) {
                  return EQUAL;
                }

                // Compensate for sign
                if (lhs < 0 && rhs < 0) {
                  // When both are negative, sign on its own yields
                  // no clear ordering between the two arguments.
                  //
                  // Remove the sign and continue as for positive
                  // numbers.
                  lhs *= -1;
                  rhs *= -1;
                }
                else if (lhs < 0) {
                  // When the LHS is negative but the RHS is not,
              // consider the LHS "first" always as we wish to
              // prioritise the leading '-'.
              return LHS_FIRST;
                }
                else if (rhs < 0) {
                  // When the RHS is negative but the LHS is not,
              // consider the RHS "first" always as we wish to
              // prioritise the leading '-'.
              return RHS_FIRST;
                }

                // Counting the number of digits in both the LHS and RHS
                // arguments is *almost* trivial.
                const auto lhs_digits = (
                lhs == 0
                ? 1
                : std::ceil(std::log(lhs+1)/std::log(BASE))
                );

                const auto rhs_digits = (
                rhs == 0
                ? 1
                : std::ceil(std::log(rhs+1)/std::log(BASE))
                );

                // Now we loop through the positions, left-to-right,
              // calculating the digit at these positions for each
              // input, and comparing them numerically. The
              // lexicographic nature of the sorting comes from the
              // fact that we are doing this per-digit comparison
              // rather than considering the input value as a whole.
              const auto max_pos = std::max(lhs_digits, rhs_digits);
              for (auto pos = 0; pos < max_pos; pos++) {
                if (lhs_digits - pos == 0) {
                  // Ran out of digits on the LHS;
                  // prioritise the shorter input
                  return LHS_FIRST;
                }
                else if (rhs_digits - pos == 0) {
                  // Ran out of digits on the RHS;
                  // prioritise the shorter input
                  return RHS_FIRST;
                }
                else {
                  const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
                  const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;

                  if (lhs_x < rhs_x)
                    return LHS_FIRST;
                  else if (rhs_x < lhs_x)
                    return RHS_FIRST;
                }
              }

              // If we reached the end and everything still
              // matches up, then something probably went wrong
              // as I'd have expected to catch this in the tests
              // for equality.
              assert("Unknown case encountered");
              }
              );

    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = end - start;
    std::cout << "Lightness: " << elapsed.count() << '\n';
  }

  {
    auto start = std::chrono::high_resolution_clock::now();

    auto max = *std::max_element(begin(dyp), end(dyp));
    int maxDigits = std::log10(max);

    std::vector<std::pair<int,int>> temp;
    temp.reserve(dyp.size());
    for(auto const& e : dyp) temp.push_back( lexicographic_pair_helper(e, maxDigits) );

    std::sort(begin(temp), end(temp), [](std::pair<int, int> const& l, std::pair<int, int> const& r)
    { if(l.first < r.first) return true; if(l.first > r.first) return false; return l.second < r.second; });

    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = end - start;
    std::cout << "Dyp: " << elapsed.count() << '\n';
  }

  {
    auto start = std::chrono::high_resolution_clock::now();
    std::sort (nim.begin(), nim.end(), l_comp);
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = end - start;
    std::cout << "Nim: " << elapsed.count() << '\n';
  }

//   {
//     auto start = std::chrono::high_resolution_clock::now();
//     std::sort (nb.begin(), nb.end(), l_comp2);
//     auto end = std::chrono::high_resolution_clock::now();
//     auto elapsed = end - start;
//     std::cout << "notbad: " << elapsed.count() << '\n';
//   }

  std::cout << (nim == lo) << std::endl;
  std::cout << (nim == dyp) << std::endl;
  std::cout << (lo == dyp) << std::endl;
//   std::cout << (lo == nb) << std::endl;
}