Re: hope we have the distributed computing perl6
From:
William Michels via perl6-users
Date:
November 29, 2021 17:39
Subject:
Re: hope we have the distributed computing perl6
Message ID:
CAA99HCxdVvsCJ0Uu-AiYQh21Db6t24mfTkTiFto39NG32fs8Aw@mail.gmail.com
Hi Piper,
RE:
https://github.com/perl-spark
Thank you for the reply. There seem to be two issues here: 1) 'What
is going on with Perl-Spark?' and 2) 'Can we make an effort to
produce Raku-Spark?'. Below I only address the former question.
The "perl-spark" Github project appears to contain 13 repos (you
graciously provided the link to the "Spark" repo). The only person I
see associated with the "perl-spark" Github project is Kent Fredric,
who sadly passed away earlier this year:
http://blogs.perl.org/users/neilb/2021/04/kent-fredrics-cpan-distributions.html
https://forums.gentoo.org/viewtopic-t-1130094-postdays-0-postorder-asc-start-0.html
https://lwn.net/Articles/846054/
https://givealittle.co.nz/cause/kent-fredrics-funeral-costs
I recall Kent was active in the "Raku" Github-renaming discussion
(e.g. https://github.com/Raku/problem-solving/issues/81#issuecomment-528756303),
and he wanted Raku (née Perl6) to have a fresh start. While I see
efforts are underway to have Kent's CPAN distributions adopted, I
don't know of a similar process on Github. While each of the 13
repositories under the "perl-spark" umbrella can easily be forked,
it's unclear to me if the entire Github project can similarly be
forked.
I have copied some active members of the Perl community on this email,
in the hopes that they can help transfer the "perl-spark" Github
project.
Best Regards, Bill.
On Sun, Nov 28, 2021 at 10:34 PM Piper H <potthua@gmail.com> wrote:
>
> William, I didn't use SparkR. I use R primarily for plotting.
>
> Spark's basic API is quite simple: it does the distributed computing of map, filter, group, reduce, etc., which are all covered by perl's map, sort, and grep functions IMO.
>
> for instance, this common statistics on Spark:
>
> >>> fruit.take(5)
> [('peach', 1), ('apricot', 2), ('apple', 3), ('haw', 1), ('persimmon', 9)]
> >>>
> >>>
> >>> fruit.filter(lambda x:x[0] == 'apple').reduceByKey(lambda x,y:x+y).collect()
> [('apple', 86)]
>
> Which is easily implemented by perl's grep and map functions.
> But we need a distributed computing framework of perl6.
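
The Spark example quoted above can be approximated on a single machine with ordinary list filtering and a per-key fold. What follows is only a minimal sketch in plain Python, not PySpark and not the perl-spark API: the helper name `reduce_by_key` and the sixth data pair are assumptions added for illustration (only the first five pairs come from the quoted output). It runs in one process and distributes nothing, which is the part a framework like Spark would add.

```python
# Hypothetical sample data: the first five pairs are from the quoted
# fruit.take(5) output; the last pair is an assumption for illustration.
fruit = [
    ('peach', 1), ('apricot', 2), ('apple', 3),
    ('haw', 1), ('persimmon', 9), ('apple', 4),
]

def reduce_by_key(pairs, func):
    """Combine the values of pairs that share a key using func.

    Local, single-process stand-in for the idea behind reduceByKey;
    the name and signature are hypothetical.
    """
    merged = {}
    for key, value in pairs:
        # First value for a key is stored as-is; later ones are folded in.
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())

# Equivalent of .filter(lambda x: x[0] == 'apple')
apples = [pair for pair in fruit if pair[0] == 'apple']

# Equivalent of .reduceByKey(lambda x, y: x + y).collect()
result = reduce_by_key(apples, lambda x, y: x + y)
print(result)
```

The filter step corresponds to perl's grep, and the per-key fold is the kind of loop usually written with a hash in perl; neither step is parallel here.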
>
> Yes there is already the perl-spark project:
> https://github.com/perl-spark/Spark
> It hasn't been updated for many years, and I don't think it's still in active development.
>
> So I asked the original question.
>
> Thank you.
> Piper
>
>
> On Mon, Nov 29, 2021 at 1:44 PM William Michels <wjm1@caa.columbia.edu> wrote:
>>
>> Hi Piper!
>>
>> Have you used SparkR (R on Spark)?
>>
>> https://spark.apache.org/docs/latest/sparkr.html
>>
>> I'm encouraged by the data-type mapping between R and Spark. It
>> suggests to me that with a reasonable Spark API, mapping data types
>> between Raku and Spark should be straightforward:
>>
>> https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark
>>
>> Best Regards,
>>
>> Bill.
>>
>>
>> On Sat, Nov 27, 2021 at 12:16 AM Piper H <potthua@gmail.com> wrote:
>> >
>> > I use perl5 everyday for data statistics.
>> > The scripts are running on a single server for the computing tasks.
>> > I also use R, which has the similar usage.
>> > When we face very large data, we change to Apache Spark for distributed computing.
>> > Spark's interface languages (python, scala, even ruby) are not flexible, but its computing capability is amazing, because the whole cluster contributes computing power.
>> > Yes I know perl5 is somewhat old, but in perl6 why don't we build a distributed computing framework like Spark? It would help a lot of data programmers who already know perl.
>> > I expect a lot from this project.
>> >
>> > Thanks.
>> > Piper